How the Human Proteome Folding Project Continues to Contribute to Science

Rich Bonneau, technical lead for the Human Proteome Folding project, recently spoke with us about how the project has contributed to advances in his work, as well as to advances in the understanding of the structure and function of proteins. This update includes a slide presentation and an audio recording of our interview as well as written excerpts.

The Human Proteome Folding project (HPF) was the first study to run on World Community Grid. Dr. Rich Bonneau, who was involved in both phases of the project, recently gave us an update on how the data and code from this project continue to help advance knowledge in the field of systems biology, including his current involvement in healthcare-related projects.

You can view slides along with the full audio of the interview below. You can also learn more about the project's background and read excerpts from the interview, starting below the slides.



Everything that goes on in cells and in the body is controlled by the shapes of proteins, which determine whether or not they can interlock with other proteins. Proteins can perform positive functions, such as helping maintain healthy cells. In some cases, diseases can prevent proteins from performing the functions needed to maintain healthy cells.

Knowing the shapes of proteins helps researchers understand how the proteins perform their desired functions. For example, the proteins of a virus or bacterium may have particular shapes that enable them to break through the cell membrane, allowing them to infect cells.

For more information about protein functions, you can review this detailed description from Scitable by Nature Education.

The Human Proteome Folding project (HPF) ran in two phases. The goal of the first phase was to determine protein structures in order to predict the proteins' functions, and the goal of the second phase was to increase the resolution or accuracy of the predictions for a select subset of human proteins.

Interview Excerpts

Question 1: Can you give us an overview of how the data or tools that were developed during or as a result of the project have influenced your later work?

[The project] gave us the ability to think about every protein in the proteome. It led us in a lot of unexpected directions. One of them is that we need to have a set of proteins that we think don't have functions. There are many thousands of [known] protein functions, such as enzymes that carry out a certain reaction. But if we don't have negative examples--if we don't have examples of what something isn't--then it's harder to classify what something is. Having this comprehensive resource gave us the tools to use structure to identify reliable cases where proteins that didn't have functions could serve as negative training examples. If you can estimate negative examples, then it puts you in a better position to make finely discriminating classifiers for proteins.

Another thing that the Human Proteome Folding project inspired us to think about was what we could do if we had high-quality structures for many, many proteins. That led us to a recent project, using the code from the HPF project and also some of the data, where we're trying to interpret human genetic variation. We take genetic mutations that are seen in clinics, or from people who were sequenced as part of other studies, and we make structural models of what these mutations look like.

The idea that we could reliably produce models for 70 percent of the proteins that have clinical mutations would not have been thought possible ten years ago. But now...we recently published a paper in the journal Nucleic Acids Research describing a tool called VIPUR (Variant Interpretation and Prediction Using Rosetta), which takes mutations and structures and outputs whether or not these mutations will be damaging.

It turns out that most of the mutations in most of the proteins in the genome have little or no effect, but some [mutations] break critical proteins. We want to find these critical mutations and give them to clinicians and biologists so they can sort the damaging mutations from the non-damaging ones. This is one of the rare cases where, as soon as you solve that basic science problem, it's instantly useful, because mutations are often directly clinically relevant: they are often the direct cause of disease.

Question 2: How have the data from the project been shared with and used by other researchers?

We're one of many, many groups that work on this sort of thing. In fact, we're co-sponsoring a conference soon, which is devoted to making sense of mutations in proteins. There will be many people attending with different approaches, and it will be a free exchange of ideas. Our approach, which uses structure prediction, is one of five or six approaches. Our hope is that we can put all these different approaches together.

A larger-scale approach is that often we know that diseases have some heritability, but we don't know exactly what genes are involved. We think that, by automatically adding structural approaches to existing studies, we can separate mutations that are unfolding the proteins. By sorting out these mutations, we can contribute to decoding in existing large-scale studies. For example, we're collaborating on a project to understand mutations in people with and without autism.

The HPF data have been shared through a few different websites since the beginning of the project. One of the key things about the project was that, before it finished, we were getting feedback that it was useful to other scientists. We had many citations from people using our function predictions. Collectively, I think the set of papers [about the HPF data] has around 40 or 50 citations.

The thing we did with the VIPUR code that was different was that we aimed it at clinical labels of proteins--null, neutral, pathogenic, benign and so on. That work is so brand new that we don't know the impact yet. We've had a lot of people interested in the code, we've helped a lot of people set up the code, and we've also distributed mutation predictions based on our work to others.

Question 3: You have an additional position now. Can you tell us a little bit about that?

When I first started working with IBM [on the HPF project], I was a professor at NYU (New York University). Since then, I've become a group leader at the Simons Foundation Center for Computational Biology. This is a new, not-for-profit research institute funded by the Simons Foundation. Luckily for me, it's in New York City, so I was able to keep my affiliation with NYU. So my time is split somewhat evenly between the two places.

This new institution is nicely interdisciplinary. There are people here studying fields such as applied math, computer science, molecular biology, and computational neuroscience. The institute is only a year and a half old, so it's the very exciting beginning phase.

Question 4: What are some areas in your field that might benefit from using World Community Grid's computational power?

[Protein] structure will remain a tricky problem that could use World Community Grid's power. There have been a lot of advancements in physics instrumentation that are giving us better structural biology data. The bottom line is that there's a lot more information out there from new experimental and computational techniques that we could use to design experiments.

New technology in the laboratory drives the need for new computation, which gives you answers that only open up new questions and drive even more amazing technologies. Right now, in structural biology, I think there are cool computations that would be World Community Grid-relevant in the areas of genomics.

When you put all of these new genomic technologies together, the need for putting [the data] together in a model starts to get a little beyond what a small computer can do. I think the need for World Community Grid is going to be there for a while.
