What is a protein?
What do proteins do?
What does protein structure tell us?
What do proteins look like?
Why do we need to predict protein structure?
How is Protinfo different from other approaches?
Protein structure prediction is an active area of research, and no one method or methodology is "best" for all situations. The public success of projects like Folding@Home, POEM@Home, Human Proteome Folding, and Rosetta@Home is evidence of the interest in solving this computationally challenging problem. We wish to offer another approach, one that differs in subtle but significant ways and can provide complementary and competitive results.
Some approaches (like Folding@Home and POEM@Home) simulate the protein folding process as we believe it occurs in real life, where physical energies are minimized. Protinfo (like Human Proteome Folding and Rosetta@Home) instead minimizes "statistical energies" to identify likely protein structures, but with a slightly different approach. Rather than relying on a single complex energy function, Protinfo uses a simple, easily evaluated function to generate candidates and then chooses the best structures using a set of more sophisticated functions. Another difference is that Protinfo uses a novel continuous sampling methodology that lets us explore good structures more finely. The continuous sampling methodology incurs little memory overhead, and our compact energy function is very fast to evaluate. Together, these allow Protinfo to run on almost any computer.
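The two-stage idea described above (a cheap function to screen many candidates, then richer functions to pick among the survivors) can be sketched as follows. This is a minimal illustration under stated assumptions, not Protinfo's code: conformations are reduced to hypothetical lists of (phi, psi) torsion angles, and `cheap_energy`, `smoothness_score`, the rank-sum combination in `rerank`, and `predict` are all invented stand-ins. Uniform sampling of continuous angles stands in for continuous sampling, and a bounded heap keeps memory proportional to the number of survivors rather than the number of samples.

```python
import heapq
import random

# Hypothetical model: a conformation is a list of (phi, psi) backbone
# torsion angles in degrees, one pair per residue. Not Protinfo's model.

def angle_diff(a, b):
    """Smallest signed difference a - b between two angles, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def sample_conformation(n_residues, rng):
    """Draw every angle from a continuous range instead of a discrete library."""
    return [(rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))
            for _ in range(n_residues)]

def cheap_energy(conf):
    """Hypothetical fast score: squared distance from an ideal helix (-60, -45)."""
    return sum(angle_diff(phi, -60.0) ** 2 + angle_diff(psi, -45.0) ** 2
               for phi, psi in conf)

def smoothness_score(conf):
    """Hypothetical slower score: penalise abrupt changes between neighbours."""
    return sum(angle_diff(a[0], b[0]) ** 2 + angle_diff(a[1], b[1]) ** 2
               for a, b in zip(conf, conf[1:]))

def rerank(candidates, scorers):
    """Order candidates by the sum of their ranks under every scorer."""
    totals = [0] * len(candidates)
    for score in scorers:
        order = sorted(range(len(candidates)), key=lambda i: score(candidates[i]))
        for rank, i in enumerate(order):
            totals[i] += rank
    best_first = sorted(range(len(candidates)), key=totals.__getitem__)
    return [candidates[i] for i in best_first]

def predict(n_residues, n_samples=10_000, keep=100, seed=0):
    rng = random.Random(seed)
    # Stage 1: screen every sample with the cheap function.
    heap = []  # max-heap on energy via negation; holds only the `keep` best
    for k in range(n_samples):
        conf = sample_conformation(n_residues, rng)
        entry = (-cheap_energy(conf), k, conf)  # k breaks ties without comparing lists
        if len(heap) < keep:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:  # lower energy than the worst survivor
            heapq.heapreplace(heap, entry)
    survivors = [conf for _, _, conf in heap]
    # Stage 2: rerank the short list with the (hypothetical) richer scorers.
    return rerank(survivors, [cheap_energy, smoothness_score])
```

For example, `predict(5, n_samples=500, keep=20)` returns 20 five-residue conformations, best first. The design choice being illustrated is that the slower scorers only ever see `keep` structures rather than all `n_samples`.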
Protinfo structure predictions have been ranked among the best in the Critical Assessment of Structure Prediction (CASP) competition since 1994. You can read more about Protinfo on the researchers' page about this project.
Why is protein structure prediction so difficult?
Two factors make protein structure prediction challenging: the nature of the energy functions and the vast search space.
The environment of a protein is populated with many other atoms and molecules. If the program were simulating a process that happened in vacuo, or even in a non-polar solvent (instead of the aqueous environment of the cytoplasm), the problem would be much easier. The presence of polar and polarizable solvent molecules makes accurate calculation of electrostatic forces extremely difficult. In addition, the main "force" in protein folding is the hydrophobic effect, which arises from the interactions between atoms within the protein, their interactions with the solvent atoms, and the interactions among the solvent atoms themselves. In simulations such as Protinfo, Human Proteome Folding, and Rosetta@Home, the effect of these solvent-dependent interactions is approximated in the statistical energies. The development of better solvent models and simulations is another active area of research that will eventually address these problems.
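One common way to build statistical (knowledge-based) energies is the inverse Boltzmann relation, E = -kT ln(P_obs / P_ref): a contact seen more often in known structures than a reference model predicts gets a favourable (negative) energy. Because the observed frequencies come from proteins that folded in water, solvent effects are absorbed into the numbers implicitly, which is the sense in which statistical energies approximate them. The sketch below is a generic illustration, not a description of how Protinfo's functions are derived; the function name `statistical_potential`, the per-bin count inputs, the pseudocount, and the use of reduced units (kT = 1) are all assumptions.

```python
import math

def statistical_potential(observed, reference, kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann energy for each distance bin of one contact type.

    observed:  counts of the contact in each bin, taken from known
               structures (hypothetical input).
    reference: counts expected in each bin with no specific interaction.
    Returns E_i = -kT * ln(P_obs_i / P_ref_i); a pseudocount avoids log(0).
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same bins")
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    energies = []
    for o, r in zip(observed, reference):
        p_obs = (o + pseudocount) / obs_total
        p_ref = (r + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

# Bin 1 is over-represented (favourable), bin 0 under-represented (unfavourable).
energies = statistical_potential([1, 40, 20, 19], [20, 20, 20, 20])
```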
The other limiting factor is the number of possible structures, or conformations, that need to be sampled for a protein. Even with a completely accurate energy function, the possible conformations must still be sampled finely enough to find the right one. Not only is the number of possible conformations huge (see the Levinthal paradox), but the search is made even more difficult by the extremely complicated energy landscape. Most of the usual global optimization techniques that work on a well-behaved function will fail when applied to protein folding. Luckily, of the two problems, this is probably the lesser: with increased CPU power and improved sampling techniques, some accurate structures are usually generated, but without a completely accurate energy function we are not always able to identify them.
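The scale behind the Levinthal paradox can be shown with back-of-the-envelope arithmetic. The inputs here are illustrative assumptions, not Protinfo parameters: a 100-residue chain, three backbone states per residue, and a (generous) rate of 10^13 conformations examined per second. The function name `levinthal_estimate` is hypothetical.

```python
def levinthal_estimate(n_residues=100, states_per_residue=3, rate_per_second=1e13):
    """Return (conformation count, years needed to enumerate them all)."""
    n_conformations = states_per_residue ** n_residues  # exact integer
    seconds = n_conformations / rate_per_second
    years = seconds / (365.25 * 24 * 3600)
    return n_conformations, years

count, years = levinthal_estimate()
print(f"{count:.2e} conformations, about {years:.1e} years")
# prints: 5.15e+47 conformations, about 1.6e+27 years
```

Exhaustive enumeration under these assumptions would take on the order of 10^27 years, vastly longer than the age of the universe (roughly 1.4 × 10^10 years), which is why sampling has to be selective rather than exhaustive.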