Also, for the developer types: the Aquaria team offers an API that lets you add your own features of interest to the Aquaria software. Maybe you have found new mutations in a sequence you've obtained in your lab, for example. They offer guidance on that, and they touch on it in the longer video (~27 min) if you want a bit more explanation. I suspect from the high-quality support they are offering that they'd be interested to hear from you about what features you'd like to see applied to these proteins as well.

So kudos to this team for a nifty tool and really serious multimedia outreach efforts. I think it was well done on all counts. I'll bet Reddit reached more of the right folks than a press release ever will. PIOs, take note: get your scientists on Reddit.

OpenHelix will be exhibiting at the International Molecular Medicine Tri-Conference (MMTC or TRICON) at booth 129, February 16-18. While onsite at the Tri-Conference, we invite you to demo OpenHelix: you will not only learn how to enable more effective research, but also receive a Starbucks gift card for completing the demo.

The 22nd International Molecular Medicine Tri-Conference is the industry's preeminent event on molecular medicine, focusing on drug discovery, genomics, diagnostics and information technology. Spanning six days this year, the Tri-Conference offers an expanded program with 6 symposia, over 20 short courses, and 17 conference programs.

OpenHelix provides over 100 tutorial suites on popular and powerful bioinformatics and genomics tools. Each tutorial suite includes 30-60 minute tutorials that highlight and explain the features and functionality needed to start using a resource effectively. The tutorial suites also include PowerPoint slides, handouts and exercises to save time and money when teaching others.

An OpenHelix subscription to the tutorials enables quicker and more effective research at your institution through more efficient use of the publicly available tools to access biological data. Join some of the best universities, research institutions, and biotech companies in training scientists on how to use these critical tools.

To schedule your 5-minute demonstration of the OpenHelix site and tutorial suites (and receive a Starbucks gift card), email Scott Lathe or call (425) 442-0322. We will be at booth 129 in the Tri-Conference Exhibit Hall.

This week's tip of the week is on GEMINI, short for "GEnome MINIng." Unlike most of the tips we give each week, this one is a software package. But it does use and integrate many internet databases, such as dbSNP, ENCODE, UCSC, ClinVar and KEGG. It's also a freely available, open source tool that lets researchers create quite complex queries based on genotypes, inheritance patterns, and more. The above 12-minute clip is a conference talk that gives an introduction to the science behind the tool.

Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI’s utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics.
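The core idea in that abstract, annotations and genotypes living in one database so that a single query can combine them, can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under loudly stated assumptions: the tables (`variants`, `genotypes`, `clinvar`), the sample names, and the toy data are all hypothetical, and this is not GEMINI's actual schema, query syntax, or command-line interface. The query looks for high-impact variants that fit an autosomal recessive pattern in a trio (child homozygous for the alternate allele, both parents heterozygous) and attaches a ClinVar annotation where one exists.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; NOT GEMINI's real schema.
SCHEMA = """
CREATE TABLE variants (
    variant_id INTEGER PRIMARY KEY,
    chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
    gene TEXT, impact TEXT
);
CREATE TABLE genotypes (
    variant_id INTEGER, sample TEXT,
    gt INTEGER  -- number of alt alleles: 0 = hom-ref, 1 = het, 2 = hom-alt
);
CREATE TABLE clinvar (
    chrom TEXT, pos INTEGER, alt TEXT, significance TEXT
);
"""

SAMPLES = ("child", "mother", "father")  # hypothetical trio


def build_demo_db():
    """Create an in-memory database holding toy variants, genotypes and annotations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "chr1", 1000, "A", "G", "GENE_A", "HIGH"),
            (2, "chr2", 2000, "C", "T", "GENE_B", "LOW"),
            (3, "chr3", 3000, "G", "A", "GENE_C", "HIGH"),
            (4, "chr7", 4000, "T", "C", "GENE_D", "HIGH"),
        ],
    )
    # Genotypes per variant, in SAMPLES order (child, mother, father).
    trio_gts = {1: (2, 1, 1), 2: (2, 1, 1), 3: (1, 1, 0), 4: (2, 1, 1)}
    conn.executemany(
        "INSERT INTO genotypes VALUES (?, ?, ?)",
        [
            (vid, sample, gt)
            for vid, gts in trio_gts.items()
            for sample, gt in zip(SAMPLES, gts)
        ],
    )
    conn.executemany(
        "INSERT INTO clinvar VALUES (?, ?, ?, ?)",
        [("chr1", 1000, "G", "Pathogenic")],
    )
    conn.commit()
    return conn


# Autosomal recessive pattern: child hom-alt, both parents het.
# The ClinVar table is LEFT JOINed, so it annotates rows but never filters them.
RECESSIVE_QUERY = """
SELECT v.chrom, v.pos, v.gene, v.impact, c.significance
FROM variants AS v
JOIN genotypes AS kid ON kid.variant_id = v.variant_id AND kid.sample = :kid
JOIN genotypes AS mom ON mom.variant_id = v.variant_id AND mom.sample = :mom
JOIN genotypes AS dad ON dad.variant_id = v.variant_id AND dad.sample = :dad
LEFT JOIN clinvar AS c
       ON c.chrom = v.chrom AND c.pos = v.pos AND c.alt = v.alt
WHERE kid.gt = 2 AND mom.gt = 1 AND dad.gt = 1
  AND v.impact = 'HIGH'
ORDER BY v.chrom, v.pos
"""


def find_recessive_candidates(conn, kid, mom, dad):
    """Return (chrom, pos, gene, impact, clinvar_significance) rows for the trio."""
    params = {"kid": kid, "mom": mom, "dad": dad}
    return conn.execute(RECESSIVE_QUERY, params).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in find_recessive_candidates(db, *SAMPLES):
        print(row)
```

With this toy data, variant 2 fits the recessive pattern but is dropped by the impact filter, variant 3 fails the genotype pattern, and variant 4 is kept with no ClinVar entry (`None`), because the annotation is joined rather than filtered on.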


This week’s highlighted discussion tackles the topic of small projects for folks who are just beginning their training in bioinformatics, or possibly a career transition into a new area. It’s an issue that has come up a number of times, and this new idea for connecting students and projects is a good one, I think.

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

I’m a Computer Scientist/experienced developer looking to get into the field, and contributing to Open Source seems to be one of the first suggestions people make for starting out in bioinformatics.

I was wondering if anyone had any recommendations for open source software projects worth contributing to, particularly ones that might have some low-hanging fruit or are in real need of help. Are there any tools you folks are using right now that really need feature X, or are you a project maintainer who needs a dig out? The difficulty I'm having is that because I'm not working with these tools day to day, I don't have the best view of the commonly used tools and their associated problems.

So if anyone has any suggestions, I'm going to try to fit in some OSS contributions around my own contracting jobs and spare-time bio studying. My programming background is Java, JavaScript, Python and PHP, and I'm learning some R at the moment while I do some Coursera specialisations. I've also done quite a bit of systems administration, in case the work involves server-side clustering/distributed systems, etc.

The answer proposing a "Pick me up!" tag struck me as a good system for this sort of thing. Maybe others could implement this kind of tag on their projects too, if they have suitable small tasks. So I thought I'd raise awareness of it a little, in case someone comes to us searching for "small academic projects in bioinformatics" again. I hope they find some. I still think there's a need on both sides.

This week's Video Tip of the Week is Aquaria, a new resource for exploring protein structures, mutations, and similarities to other proteins. It's a very well-designed and interactive experience for end users. It is aimed largely at biologists who could benefit from exploring the structural details of their proteins of interest but are daunted by tools aimed at structural biologists. But tool developers should also look at how this rollout went. It's one of the best examples of a tool launch I've seen in this field, and I've seen a lot.

So first, the tool. Aquaria offers users a streamlined way to access and explore protein structures. It combines the kinds of information you get from the PDB structure resources with additional details such as UniProt mutations. Currently you start with a basic search, asking for a protein by name or by PDB or UniProt ID. They have pre-calculated the relationships of proteins in PDB and Swiss-Prot so they can quickly offer you a structure and related proteins. The paper notes: "Currently, Aquaria contains 46 million precalculated sequence-to-structure alignments, resulting in at least one matching structure for 87% of Swiss-Prot proteins and a median of 35 structures per protein…." In addition, it lets you explore other important biological features, such as InterPro domains and post-translational modifications, so you can think about how mutations + structures + functions affect a given protein you are interested in. As they describe it:

“We have loaded SNP data from Uniprot and Interpro so you can see where the mutations lie on your 3D model. And we have found that you may be pleasantly surprised to find your mutations clustering in 3D space!”
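To make "clustering in 3D space" concrete: residues that are far apart in the sequence can end up next to each other once the chain folds. A minimal sketch, assuming you already have one 3D coordinate per mutated residue (for example a C-alpha atom taken from a structure model). The residue numbers and coordinates below are invented, the 8 Å cutoff and 10-residue minimum sequence separation are arbitrary illustrative thresholds, and this is not how Aquaria itself computes or displays anything.

```python
import math
from itertools import combinations

# Invented C-alpha coordinates (in angstroms) for four hypothetical mutated
# residues, keyed by residue number. Real values would come from a structure model.
MUTATED_CA = {
    45: (10.0, 12.0, 5.0),
    46: (13.8, 12.0, 5.0),   # sequence neighbour of 45
    210: (14.0, 16.0, 7.0),  # far from 45/46 in sequence
    330: (40.0, 2.0, 18.0),
}


def close_pairs(coords, cutoff=8.0, min_seq_sep=10):
    """Return (res_i, res_j, distance) for residue pairs that lie within
    `cutoff` angstroms in 3D but at least `min_seq_sep` positions apart in sequence."""
    pairs = []
    # Sorting by residue number guarantees j > i for every pair.
    for (i, xyz_i), (j, xyz_j) in combinations(sorted(coords.items()), 2):
        d = math.dist(xyz_i, xyz_j)  # Euclidean distance (Python 3.8+)
        if d <= cutoff and j - i >= min_seq_sep:
            pairs.append((i, j, round(d, 2)))
    return pairs


if __name__ == "__main__":
    for i, j, d in close_pairs(MUTATED_CA):
        print(f"residues {i} and {j}: {d} Å apart, {j - i} positions apart in sequence")
```

With these made-up numbers, residue 210 is reported as sitting within about 6 Å of residue 45 and about 4.5 Å of residue 46 despite being more than 160 positions away in sequence, while the adjacent pair 45/46 is skipped by the sequence-separation filter.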

Another handy feature they provided is a Quick Reference Card with shortcuts to the functions [PDF]. In addition to this intro, they have a longer video as well. This is more like a typical lecture with the background, the framework, the goals of the project, and more about the underlying database.

Now, about the rollout of this software project. I found it when I was looking over the talks at the upcoming VIZBI conference (Visualizing Biological Data). Every year awesome ideas come out of VIZBI, along with tools I want to explore, and this year Aquaria is among them. So I went looking for more detail and found the traditional stuff: the paper (below), the press release, etc. And then I found the Reddit discussion: the Aquaria team did a Science AMA on this tool. It engaged a range of folks, some of them just fans of science who had probably never seen protein structures before. That's fine with me: the more folks who appreciate research and learn how researchers explore proteins, the better. But others had good technical questions for the team, such as other ways to find proteins of interest with sequence searches, or integration with other tools like the UCSC Genome Browser. All the answers are over there. I enjoyed the question about the name of the tool:

It seems you get the ideas we had in mind: using Aquaria lets us observe these fascinating creatures (proteins) from the natural world. Aquaria creates an artificial environment and lighting where we can observe isolated proteins; like aquarium fish, proteins are often beautiful and (usually) live in water.

I asked them how this played out, and they told me ~1000 folks visited their site as a result of this Reddit event. That was really interesting to me, and a very neat route to driving awareness.

As often happens, last week's tip on visualizing structures led me to some more reading and thinking about creating protein structures. Although it's important for biologists to be able to use more of the information about protein structures and variations in their work, from tools like Aquaria or PDB, it's also important for some researchers to be at the other end of the pipeline, actually making the protein structures. This also opens the possibility of better designs of novel proteins as therapeutics, for example antibodies like the ones that could possibly battle Ebola.

As I looked around for protein design software to highlight for a tip, it was clear to me that the level of complexity of the problems in designing proteins didn’t really lend itself to short videos. There are some introductory seminars and tutorials on the Rosetta tools, but these certainly require a bit of time to explore. Instead, I’ve decided to highlight this really nice overview on the aspects of protein design that you would have to tackle to make customized proteins.

This iBiology “Introduction to Protein Design” by David Baker is really well done. There’s also a second seminar that is more detailed about designing proteins with new functions to solve many problems in biomedical research and environmental challenges.

This seems incredibly important and useful, but certainly daunting to get started with. One way to get a head start would be to take an intro workshop. I was recently notified about an opportunity to learn from a couple of researchers who are very skilled with the Rosetta tools: Daisuke Kuroda and Jared Adolf-Bryfogle.

I'm including in the references a nice review of the basics of computational antibody design by Kuroda et al., and also a paper by Adolf-Bryfogle and team covering the component parts of antibodies you would need in order to predict structures and design new ones; these parts are stored in the database they've created. Together these should give you a sense of the challenges and opportunities, and a good foundation in the concepts.

Rosetta software has been a powerhouse of protein design for many years. It's been a leader in the CASP competitions (Critical Assessment of protein Structure Prediction), and it has a strong user community: Rosetta Commons. You can obtain and use the software in a variety of ways, including servers for academic use. One important stop would be the ROSIE servers: "The Rosetta Online Server that Includes Everyone hosts several servers for combined computer power as a free resource for academic users."

Quick update: the recent webinar we delivered, “World Tour of Genomics Resources II”, is now available as a downloadable recording. Access it here. There’s a short video preview there, but the whole thing is about an hour long.

If you want the slides and the handout with the list of resources, those are available in our previous post: World Tour of Genomics Resources II, webinar follow-up post. We are going to convert this into a regular tutorial suite with a professional recording soon, and it will be available in our catalog then.

This week's tip isn't about a specific tool, but a really interesting look at how a tool was used in some general public outreach messaging. Recently I posted about Aquaria, a new tool that lets biologists explore protein structures, mutations, and domains in user-friendly ways. An interesting example of how information about protein structures can be used to drive understanding came from a video animation of protein accumulation in Alzheimer's disease. Just have a look at the video first and enjoy it. How cool is that clathrin basket pulling the vesicle in?

I found out about it as I was looking at the upcoming VIZBI talks and exploring their site for other features. In the VizbiPlus section there are a number of excellent animations of molecular processes, and this video was one of them. Be sure to watch for tweets with the #vizbi hashtag over the next few days. I bet you'll see some amazing tools and visualizations, as always.

Recently I mentioned the longer, more comprehensive video from the Aquaria team, but I didn't use it for my tip; I used the short overview instead. The longer version, though, had an extra bonus segment on how their software had been used by this animator. Here is Christopher Hammang, creator of this video, describing how he used the Aquaria information to generate the structural model for his animation:

It often helps people to see how someone else used a tool for a project to get a better grasp of it. This seemed like such a compelling and unusual example that I wanted to highlight it.

So again, I'll point you to the Aquaria tool tip from earlier this month so you can explore more, now with an example of its use in mind. I would also encourage you to look at the other animations coming out of VIZBI on the VizbiPlus page. I swear, the animated intestine is way cooler than you might expect. The diabetes and insulin receptor videos are really informative and helpful, and a cancer video illustrates a misbehaving p53. Go look.

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

In fact, many people around the world are working in this domain. Some studied bioinformatics and some did not (I even see physicians doing bioinformatics). I have been reading papers from all the well-known journals that publish biology-related bioinformatics or pure bioinformatics papers, and from what I can tell, they pretty much revolve around the same topics all the time. I know it is a very general question and we cannot give a great, direct answer to it. However, I would like to know which topics you think are the hot spots in bioinformatics these days.

For example, many people are doing sequencing (of course, we cannot have a gold standard, because "all models are wrong, but some are useful"). Are these types of studies going to continue forever?

We all know that bioinformatics is only a tool and not a pure science in itself. So should we think of it as a dead field, since the mathematics/statistics have already been worked out, or is there still much left to do? If there is much left to do, what might those topics be?

I am very eager to hear your opinions.