Nutritious Rice for the World

Rice, maize and wheat are the three main cereal grains in the world, accounting for 43% of the world's food calories. The rice genome is the only cereal genome that has been sequenced. While the rice genome is different from the human and other mammalian genomes, it is a good model for the other cereal grains. Lessons learned about the functions and interactions of rice genes are likely to be useful in understanding the genetics and biology of other major crops.
Proteins are large biomolecules consisting of one or more chains of amino acids. The sequence and identity of the amino acids making up the chain determine the structure and the properties of the proteins. Proteins are made by transcribing and translating the DNA sequence of the corresponding gene. So, while DNA may be thought of as the blueprint of life, proteins carry out the instructions contained in the blueprint.
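The path from gene to protein described above can be sketched in a few lines of Python. This is only an illustration: the function names, the tiny codon table (five entries taken from the standard genetic code) and the example sequence are all made up for this sketch, and real genes involve many details that are ignored here.

```python
# Small subset of the standard genetic code, just enough for the example.
# None marks a stop codon.
CODON_TABLE = {
    "AUG": "M",   # methionine, also the usual start codon
    "GGC": "G",   # glycine
    "UUU": "F",   # phenylalanine
    "AAA": "K",   # lysine
    "UAA": None,  # stop
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three letters at a time and build the amino acid chain."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue is None:  # stop codon: the chain is finished
            break
        chain.append(residue)
    return "".join(chain)


# Hypothetical 15-letter "gene" used only for illustration.
print(translate(transcribe("ATGGGCTTTAAATAA")))
```

The output is a string of one-letter amino acid codes. It is this sequence that determines how the chain folds.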
What don't they do? Some proteins are structural, such as collagen and keratin, which make up the hair, skin and nails. Some are enzymes that catalyze the chemical reactions necessary for activities such as metabolism. Others have important signaling and feedback functions that ensure that cells do what they are meant to and don't grow out of control.
Proteins are governed by the same rules as any other molecule. The structure of a protein, or how it folds, determines its function. For example, the precise arrangement of active chemical groups from different amino acids in the protein chain at the active site of an enzyme accounts for its catalytic activity. Another example is the location of positively charged groups on the surface of DNA-binding proteins, which allows them to bind to the negatively charged phosphate backbone of DNA. In addition, one can often identify the role of a protein of unknown function by comparing its structure to the structures of known proteins.
Proteins are too small to be seen by ordinary visible-light microscopy. It is possible to see larger proteins and protein arrays using transmission electron microscopy or atomic force microscopy. The protein structures that you usually "see" are depictions based on high-resolution structures determined by X-ray crystallography or Nuclear Magnetic Resonance (NMR).
Prediction is the only viable alternative to experimental techniques, which can be extremely labor intensive and require many months or years of effort.

Protein structure prediction is an active area of research, and no one method or methodology is "best" for all situations. The public success of projects like Folding@Home, POEM@Home, Human Proteome Folding, and Rosetta@Home is evidence of the interest in solving this computationally challenging problem. We wish to offer another approach that differs in certain subtle but significant ways and can provide complementary and competitive results.

Some approaches (like Folding@Home and POEM@Home) simulate the protein folding process as we believe it occurs in real life, where physical energies are minimized. Protinfo (like Human Proteome Folding and Rosetta@Home) uses a minimization of "statistical energies" to identify likely protein structures, but with a slightly different approach. Rather than relying on a single complex energy function, Protinfo uses a simple, easily evaluated function and chooses the best structures by following up with a set of more sophisticated functions. Another difference is that Protinfo uses a novel continuous sampling methodology that enables us to explore good structures more finely. The continuous sampling methodology incurs little memory overhead and evaluating our compact energy function is very fast. This allows Protinfo to run on almost any computer.
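The two-stage idea described above (a cheap function during sampling, then a set of more detailed functions to choose among the best candidates) can be illustrated with a toy sketch. Nothing here is Protinfo's actual code: the conformation representation (a list of torsion angles), the scoring functions, the greedy random walk and all names are assumptions made up for this example.

```python
import random
from typing import Callable, List, Sequence

Conformation = List[float]  # hypothetical: one torsion angle (degrees) per residue


def cheap_score(conf: Conformation) -> float:
    """Toy stand-in for a simple, fast energy function (lower is better)."""
    return sum((angle + 60.0) ** 2 for angle in conf)


def perturb(conf: Conformation, rng: random.Random, step: float = 10.0) -> Conformation:
    """Continuous move: nudge every angle by a real-valued amount."""
    return [angle + rng.uniform(-step, step) for angle in conf]


def sample(start: Conformation, n_steps: int, rng: random.Random) -> List[Conformation]:
    """Greedy walk (an assumption of this sketch) that keeps only improving moves."""
    current = start
    visited = [current]
    for _ in range(n_steps):
        candidate = perturb(current, rng)
        if cheap_score(candidate) < cheap_score(current):
            current = candidate
            visited.append(current)
    return visited


# Toy stand-ins for the "more sophisticated" follow-up functions.
FILTERS: Sequence[Callable[[Conformation], float]] = (
    lambda c: max(abs(angle + 60.0) for angle in c),       # worst single angle
    lambda c: sum(abs(a - b) for a, b in zip(c, c[1:])),   # neighbor smoothness
)


def select_best(candidates: List[Conformation],
                filters: Sequence[Callable[[Conformation], float]],
                keep: int = 5) -> Conformation:
    """Shortlist by the cheap score, then re-rank the shortlist with the filters."""
    shortlist = sorted(candidates, key=cheap_score)[:keep]
    return min(shortlist, key=lambda c: sum(f(c) for f in filters))


rng = random.Random(0)
start = [180.0] * 8
best = select_best(sample(start, 200, rng), FILTERS)
print(cheap_score(start), cheap_score(best))
```

Only the current conformation and a short list of visited ones are stored, and the cheap function is a single pass over the angles. That mirrors, in miniature, why this kind of scheme can run on modest hardware.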

The Protinfo structure predictions have been ranked as some of the best by the Critical Assessment of Structure Prediction (CASP) competition since 1994. You can read more about Protinfo on the researchers' page about this project.

Two factors that make protein structure prediction challenging are the nature of the energy functions and the vast search space.

The environment of a protein is populated with many other atoms and molecules. If the program were simulating a process that happened in vacuo or even in a non-polar solvent (instead of the aqueous environment of the cytoplasm) it would be much easier. The presence of polar and polarizable solvent molecules makes accurate calculation of electrostatic forces extremely difficult. In addition, the main "force" in protein folding is the hydrophobic effect. This arises from the interactions between atoms within the protein, their interactions with the solvent atoms and the interactions between the solvent atoms. In simulations such as Protinfo, Human Proteome Folding, and Rosetta@Home, the effect of these solvent-dependent interactions is approximated in the statistical energies. The development of better solvent models and simulations is another active area of research that will eventually address these problems.
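To give a feel for how much the solvent changes electrostatics, here is a rough sketch using Coulomb's law with a single uniform dielectric constant. This is the crudest possible solvent model and is not what any of the projects named above use. The function name, the 4 Å separation and the value of about 80 for water are assumptions chosen for illustration.

```python
# Coulomb constant in units convenient for molecular work:
# kcal * angstrom / (mol * e^2)
COULOMB_KCAL = 332.06


def coulomb_energy(q1: float, q2: float, distance_angstrom: float,
                   dielectric: float = 1.0) -> float:
    """Interaction energy (kcal/mol) of two point charges given in units of e."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * distance_angstrom)


# A +1/-1 charge pair 4 angstroms apart (hypothetical example values).
in_vacuum = coulomb_energy(+1.0, -1.0, 4.0)                  # dielectric = 1
in_water = coulomb_energy(+1.0, -1.0, 4.0, dielectric=80.0)  # roughly water
print(in_vacuum, in_water)
```

Even this simple model shows the attraction shrinking by a factor of about 80 in water. In a real protein the effective screening varies from place to place, which is part of what makes the calculation hard.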

The other limiting factor is the number of possible structures, or conformations, that need to be sampled for a protein. Even with a completely accurate energy function, there is still a need to sample the possible conformations finely enough to find the right one. Not only is the number of possible conformations huge (see Levinthal's paradox), but the search is made even more difficult by the extremely complicated energy landscape. Most of the usual global optimization techniques that could be used with a well-behaved function will fail when applied to protein folding. Luckily, of the two problems, this is probably the lesser. With increased CPU power and improved sampling techniques, some accurate structures are usually generated, but without a completely accurate energy function we are not always able to identify them.
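The size of the search space mentioned above can be made concrete with a back-of-the-envelope calculation in the spirit of Levinthal's argument. The numbers are illustrative assumptions, not figures from this project: a 100-residue chain, three possible states per residue, and a sampling rate of ten trillion conformations per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(n_residues: int,
                            states_per_residue: int = 3,
                            rate_per_second: float = 1e13) -> float:
    """Years needed to visit every conformation once at the given rate."""
    conformations = states_per_residue ** n_residues
    return conformations / rate_per_second / SECONDS_PER_YEAR


# Hypothetical 100-residue protein.
print(f"{3 ** 100:.2e} conformations")
print(f"{exhaustive_search_years(100):.2e} years to enumerate them all")
```

Under these assumptions an exhaustive search would take vastly longer than the age of the universe, which is why sampling methods must be selective rather than exhaustive.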