This week’s highlighted discussion tackles small projects for folks who are just beginning their training in bioinformatics, or possibly making a career transition into a new area. It’s an issue that has come up a number of times, and this new idea for connecting students and projects is a good one, I think.

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

I’m a Computer Scientist/experienced developer looking to get into the field, and contributing to Open Source seems to be one of the first suggestions people make for starting out in bioinformatics.

I was wondering if anyone had any recommendations for open source software projects worth contributing to, particularly ones that might have some low hanging fruit or are in real need of help. Is there any tools you folk are using right now that really needs feature X or are you a project maintainer that needs a dig out? The difficulty I’m having is that because I’m not working with these tools day to day, I don’t have the best view of the commonly used tools and their associated problems.

So if anyone has any suggestions I’m going to try fit in some OSS contributions with my own contracting jobs/spare time bio studying. My programming background is Java, Javascript, Python, PHP and I’m learning some R at the moment while I do some coursera specialisations. I’ve done quite a bit of systems admin if the work involved server-side clustering/distributed systems etc.

The answer with the idea for the “Pick me up!” tag struck me as a good system for this sort of thing. Maybe others could implement this kind of tag on their projects too, if they have suitable small tasks. So I thought I’d raise awareness of it a little bit, in case someone comes to us on a search for “small academic projects in bioinformatics” again. I hope they find some. I still think there’s a need on both sides.

This week’s Video Tip of the Week is Aquaria, a new resource for exploring protein structures, mutations, and similarities to other proteins. It’s a very well-designed and interactive experience for end users. It is aimed largely at biologists who could benefit from exploring the structural details of their proteins of interest, but are daunted by tools aimed at structural biologists. Tool developers, though, should also look at how this rollout went. It’s one of the best examples of a tool launch I’ve seen in this field. And I’ve seen a lot.

So first, the tool. Aquaria offers users a streamlined way to access and explore protein structures. It combines the kinds of information you get from the PDB structure resources with additional details like the UniProt mutations. Currently you start with a basic search by asking for a protein by name, or by PDB or UniProt ID. They have pre-calculated the relationships of proteins in PDB and Swiss-Prot to quickly offer you a structure and related proteins. The paper notes: “Currently, Aquaria contains 46 million precalculated sequence-to-structure alignments, resulting in at least one matching structure for 87% of Swiss-Prot proteins and a median of 35 structures per protein….” In addition, it lets you explore other important biological features such as InterPro domains and post-translational modifications, so you can think about how mutations, structures, and functions together impact a given protein that you are interested in. As they describe it:

“We have loaded SNP data from Uniprot and Interpro so you can see where the mutations lie on your 3D model. And we have found that you may be pleasantly surprised to find your mutations clustering in 3D space!”

Another handy feature they provided is a Quick Reference Card with shortcuts to the functions [PDF]. In addition to this intro, they have a longer video as well. This is more like a typical lecture with the background, the framework, the goals of the project, and more about the underlying database.

Now, about the rollout of this software project. I found it when I was looking over the talks at the upcoming VIZBI conference (Visualizing Biological Data). Every year I find there are awesome ideas that come out of VIZBI, and tools I want to explore. Among them this year is Aquaria. So I went looking for more detail, and found some of the traditional stuff: the paper (below), the press release, etc. And then I found the Reddit discussion. The Aquaria team did a Science AMA on this tool. It engaged a range of folks. Some were just fans of science who had probably never seen protein structures before. That’s fine with me: the more folks who appreciate research and learn how researchers explore proteins, the better. But others had good technical questions for the team, such as other ways to find proteins of interest with sequence searches, or integration with other tools like the UCSC Genome Browser. All the answers are over there. I enjoyed the answer to the question about the name of the tool:

It seems you get the ideas we had in mind: using Aquaria lets us observe these fascinating creatures (proteins) from the natural world. Aquaria creates an artificial environment and lighting where we can observe isolated proteins; like aquarium fish, proteins are often beautiful and (usually) live in water.

I asked them about how this played out, and they had ~1000 folks visit their site as a result of this Reddit event. That was really interesting to me, and a very neat route to drive awareness.