Research: Human Proteome Folding - Phase 2: Project FAQs
 



Balancing resolution with computational efficiency:
A protein structure prediction procedure must strike a delicate balance between its computational efficiency and the level of physical detail it uses to model protein structure. Low-resolution models can be used to predict protein topology/folds and sometimes suggest function (Bonneau et al. 2001b). Low-resolution models have also been remarkably successful at predicting features of the folding process such as folding rates and phi values (Alm and Baker 1999a; Alm and Baker 1999b). It is clear, however, that modeling proteins (and possibly bound water and other cofactors) at atomic detail, and scoring these higher-resolution models with detailed, physically derived potentials, is a needed development if higher-resolution structure prediction is to be achieved. Recent progress has focused on using low-resolution approaches to find the fold, followed by a refinement step in which atomic detail is added (side chains are added to the backbone) and physical scoring functions are used to select and/or generate higher-resolution structures. Several recent studies have illustrated the usefulness of de novo structure prediction methods as part of a two-stage process in which low-resolution methods are used for fragment assembly and the resulting models are refined using a more physical potential and atomic detail (e.g. rotamers) to represent side chains (Bradley et al. 2003; Misura and Baker 2005; Tsai et al. 2003). In the first step, Rosetta is used to search the space of possible backbone conformations with all side chains represented as centroids. This process is well described and has well-characterized error rates and behavior. High-confidence (low-scoring) models are then refined using potentials that account for atomic detail such as hydrogen bonding, van der Waals forces and electrostatics.
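The two-stage protocol described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch and not Rosetta's code: the four callables (`generate_low_res`, `low_res_score`, `refine_high_res`, `high_res_score`), the decoy count and the kept fraction are all assumptions standing in for centroid fragment assembly, the centroid score, all-atom refinement and the all-atom score.

```python
import random
from typing import Callable, List, TypeVar

Model = TypeVar("Model")


def two_stage_predict(
    generate_low_res: Callable[[random.Random], Model],
    low_res_score: Callable[[Model], float],
    refine_high_res: Callable[[Model], Model],
    high_res_score: Callable[[Model], float],
    n_decoys: int = 1000,
    keep_fraction: float = 0.1,
    seed: int = 0,
) -> List[Model]:
    """Coarse search, filter, then refine; returns refined models, best first.

    All four callables are hypothetical placeholders; lower scores are better.
    """
    rng = random.Random(seed)
    # Stage 1: many cheap low-resolution decoys (stand-in for centroid
    # fragment assembly).
    decoys = [generate_low_res(rng) for _ in range(n_decoys)]
    # Keep only the lowest-scoring fraction; the expensive stage sees no others.
    decoys.sort(key=low_res_score)
    n_keep = max(1, int(n_decoys * keep_fraction))
    survivors = decoys[:n_keep]
    # Stage 2: atomic-detail refinement and rescoring of the survivors only.
    refined = [refine_high_res(model) for model in survivors]
    refined.sort(key=high_res_score)
    return refined
```

Filtering on the cheap score is what makes the expensive second stage affordable, but it also means refinement can only polish folds that the first stage already found.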
One major challenge facing methods that attempt to refine de novo models is that adding side-chain degrees of freedom, combined with the shorter length scale (smaller radius of convergence) of the potentials employed, requires sampling a much larger space of possible conformations. Thus, one has to correctly determine roughly twice as many angles (backbone plus side-chain torsions), and to a tighter tolerance, to succeed.
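A rough count makes the "roughly twice" figure concrete. The sketch below assumes phi and psi are the only free backbone torsions per residue (omega held fixed) and uses standard heavy-atom side-chain chi counts, ignoring proline's ring-constrained torsions and hydroxyl-proton torsions. These counting conventions are simplifying assumptions, not necessarily the ones Rosetta uses.

```python
from typing import Dict, Tuple

# Free side-chain torsions (chi angles) per residue, heavy atoms only.
# Assumptions: proline's ring-constrained torsions and hydroxyl-proton
# torsions (Ser, Thr, Tyr) are not counted.
CHI_COUNT: Dict[str, int] = {
    "G": 0, "A": 0, "P": 0,
    "S": 1, "C": 1, "T": 1, "V": 1,
    "L": 2, "I": 2, "D": 2, "N": 2, "H": 2, "F": 2, "Y": 2, "W": 2,
    "M": 3, "E": 3, "Q": 3,
    "K": 4, "R": 4,
}


def torsion_dofs(sequence: str) -> Tuple[int, int]:
    """Return (backbone, side-chain) torsion counts for a one-letter sequence.

    Backbone: phi and psi per residue, with omega assumed fixed (planar peptide).
    """
    seq = sequence.upper()
    backbone = 2 * len(seq)
    side_chain = sum(CHI_COUNT[aa] for aa in seq)
    return backbone, side_chain


def conformations(n_angles: int, states_per_angle: int) -> int:
    """Search-space size if every angle is discretized into the same number of states."""
    return states_per_angle ** n_angles
```

Averaged over the 20 amino acid types, `CHI_COUNT` gives 1.85 side-chain torsions per residue against 2 backbone torsions, so adding side chains nearly doubles the number of angles. If each angle is discretized into k states, `conformations` shows the search space growing by a factor of k for every added angle.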

An illustrative example of the difference between HPF1 and HPF2 is the difference between low-resolution and high-resolution methods for scoring hydrogen bonds. In HPF1 we used the strand packing score; for HPF2 we use the hydrogen bond score, which you can see in the client window.

In the HPF1 procedure, backbone hydrogen bonding is scored indirectly by a term designed to pack strands into sheets, which simply checks that strands are aligned. Hydrogen bonding in helices is not modeled; it is assumed that hydrogen bonds in helices are satisfied. This low-resolution method first reduces strands to vectors (ignoring helical secondary structure fragments) and then scores the strand arrangement (and the correct hydrogen bonding implicit in this arrangement) with functions that depend on the angle and distance between the two vectors. The scoring function is therefore robust to a rather large amount of error in the coordinates of individual atoms participating in backbone hydrogen bonds, because large numbers of residues are reduced to the angle and distance between the two vectors representing the strands.

In the high-resolution (refinement) mode of Rosetta, an empirical hydrogen bond term with angle and distance dependence between individual electropositive and electronegative atoms is used (Rohl, 2005). This more detailed hydrogen bond term has higher fidelity and a more straightforward connection to the calculation of physically realistic energies (meaningful units; physicists won’t make as much fun of us for using this one), but it requires more sampling, as small changes in the backbone can cause large fluctuations in the computed energy.

The series of pictures below shows hydrogen bonds in proteins. Here is a small protein with the chain colored from the N-terminus (start, blue) to the C-terminus (red).



Now I'll show just the two strands in this protein that are hydrogen-bonded to each other (through a few hydrogen bonds):



Here is the protein colored by atom type (C = green, N = blue, O = red, S = yellow, H = white):


Here I've removed the fancy trace of the backbone everywhere but over the two strands:



And lastly, I show the hydrogen bonds as black zig-zags between the nitrogens on one strand and the oxygens on the other.
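The HPF1-style idea of reducing each strand to a vector can be illustrated with a toy score. This is a hypothetical sketch, not the HPF1 strand-packing term: each strand is reduced to the vector between its first and last C-alpha atoms, and the score penalizes a midpoint separation away from an assumed ideal of 4.8 Å and any deviation from parallel or antiparallel alignment, with an assumed weight `w_angle`.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a: Vec) -> float:
    return math.sqrt(_dot(a, a))


def strand_vector(ca_coords: Sequence[Vec]) -> Tuple[Vec, Vec]:
    """Reduce a strand to (midpoint, direction) using only its end C-alpha atoms."""
    start, end = ca_coords[0], ca_coords[-1]
    mid = ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2, (start[2] + end[2]) / 2)
    return mid, _sub(end, start)


def strand_pair_score(
    strand_a: Sequence[Vec],
    strand_b: Sequence[Vec],
    ideal_dist: float = 4.8,
    w_angle: float = 1.0,
) -> float:
    """Toy strand-packing score (lower is better); hypothetical functional form.

    Penalizes midpoint separation away from ideal_dist and any deviation from
    parallel or antiparallel alignment of the two strand vectors.
    """
    mid_a, dir_a = strand_vector(strand_a)
    mid_b, dir_b = strand_vector(strand_b)
    dist = _norm(_sub(mid_a, mid_b))
    cos_angle = _dot(dir_a, dir_b) / (_norm(dir_a) * _norm(dir_b))
    return (dist - ideal_dist) ** 2 + w_angle * (1.0 - abs(cos_angle))
```

Because only the strand endpoints enter this toy score, moving an interior atom leaves it unchanged, and moving an endpoint by 0.5 Å changes it only slightly. That is the kind of robustness to coordinate error described for the HPF1 term.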


Here is another small protein that has no strands. Hydrogen bonds help hold together its alpha helices.



Here it is with the helices drawn (same orientation, and colored by atom type):


And again, here is the protein with a few hydrogen bonds drawn as black zig-zags keeping the helix together:
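By contrast, the high-resolution term scores each hydrogen bond, in strands and helices alike, from individual atom positions. The sketch below is a hypothetical toy, not the functional form used in Rosetta: it multiplies a Gaussian well in the hydrogen-to-oxygen distance (assumed optimum `d0` = 1.9 Å, width `sigma` = 0.3 Å) by an angular factor that is 1 for a linear N-H...O geometry and 0 at 90° or less.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a: Vec) -> float:
    return math.sqrt(_dot(a, a))


def hbond_energy(
    n: Vec, h: Vec, o: Vec, d0: float = 1.9, sigma: float = 0.3, e0: float = 1.0
) -> float:
    """Toy backbone N-H...O hydrogen-bond energy; negative is favorable.

    Hypothetical form: a Gaussian well in the H...O distance times an angular
    factor equal to 1 for a linear N-H...O geometry and 0 at 90 degrees or less.
    """
    h_to_o = _sub(o, h)
    h_to_n = _sub(n, h)
    d = _norm(h_to_o)
    # theta is the N-H...O angle at the hydrogen; cos(theta) = -1 when linear.
    cos_theta = _dot(h_to_n, h_to_o) / (_norm(h_to_n) * d)
    angular = max(0.0, -cos_theta) ** 2
    radial = math.exp(-(((d - d0) / sigma) ** 2))
    return -e0 * radial * angular
```

With a linear geometry at 1.9 Å this toy energy is -1; moving the oxygen just 0.5 Å farther away raises it to about -0.06, and bending the N-H...O angle to 90° removes it entirely. This sensitivity to small coordinate changes is why an atom-level term needs much more sampling.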



Another major challenge for high-resolution methods is the difficulty of computing accurate potentials for atomic-detail protein modeling in solvent, with electrostatic and solvation terms being among the most difficult to model accurately. Full treatment of the free energy of a protein conformation (with correct treatment of dielectric screening) is not a problem with a known efficient solution, and the computational cost of fully treating the electrostatic free energy (by solving the Poisson-Boltzmann or linearized Poisson-Boltzmann equations for large numbers of conformations) is high. In spite of these difficulties, several studies have shown that refining de novo structures with atomic-detail potentials can increase our ability to select and/or generate near-native structures. These methods can correctly select near-native conformations from ensembles of low-resolution models and improve near-native structures, but they still rely heavily on the initial low-resolution search to produce an ensemble containing good starting structures (HPF2-like methods rely on an initial search with HPF1-like methods) (Lee et al. 2001; Misura and Baker 2005; Tsai et al. 2003). Some recent examples of high-resolution predictions are quite encouraging, and an emerging consensus in the field is that higher-resolution de novo structure prediction (structure prediction with atomic-detail representations of side chains) will begin to work if sampling is dramatically increased (thus the grid!). The solvation score is shown in one of the three score panels in the HPF2 client.
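The argument for dramatically increased sampling can be made concrete with a simple independence model. Assume, purely for illustration, that each independent low-resolution trajectory has a small probability `p_success` of producing a near-native starting model. The chance that at least one of n trajectories does so is 1 - (1 - p)^n, and the sketch below solves for the smallest n that reaches a target probability. The independence assumption and any value of `p_success` are hypothetical.

```python
import math


def p_at_least_one(p_success: float, n_trials: int) -> float:
    """Probability that at least one of n independent trials succeeds."""
    return 1.0 - (1.0 - p_success) ** n_trials


def trials_needed(p_success: float, target: float = 0.95) -> int:
    """Smallest n with p_at_least_one(p_success, n) >= target."""
    if not (0.0 < p_success < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    # Closed form from 1 - (1 - p)^n >= target.
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))
    # Correct for floating-point rounding at the boundary in either direction.
    while p_at_least_one(p_success, n) < target:
        n += 1
    while n > 1 and p_at_least_one(p_success, n - 1) >= target:
        n -= 1
    return n
```

For example, with an assumed `p_success` of 0.001, roughly 3,000 independent trajectories are needed for a 95% chance of at least one near-native model, and halving `p_success` roughly doubles that number.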

The pair score in HPF2 is like the pair score in HPF1, but the HPF2 pair score uses the positions of rotamers (a way of efficiently representing all side-chain atoms) instead of centroid positions (which represent the amino acid as a single blurred-out point). So think of the HPF2 pair score as an all-atom version of the HPF1 pair score (appropriately reparameterized, of course).
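A toy calculation shows why the representation matters. In the hypothetical sketch below, each side chain is a short list of made-up atom coordinates; the centroid view collapses it to one point, while the all-atom view keeps every atom pair. The 4.0 Å contact cutoff is an assumption, and this is not the HPF2 pair potential.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def centroid(atoms: Sequence[Vec]) -> Vec:
    """Collapse a side chain to the mean position of its atoms (centroid view)."""
    n = len(atoms)
    return (
        sum(a[0] for a in atoms) / n,
        sum(a[1] for a in atoms) / n,
        sum(a[2] for a in atoms) / n,
    )


def centroid_distance(res_a: Sequence[Vec], res_b: Sequence[Vec]) -> float:
    """Distance seen by a centroid-based pair score: one number per residue pair."""
    return math.dist(centroid(res_a), centroid(res_b))


def atom_pair_distances(res_a: Sequence[Vec], res_b: Sequence[Vec]) -> List[float]:
    """Distances seen by an all-atom pair score: one per atom pair."""
    return [math.dist(a, b) for a in res_a for b in res_b]


def atom_contacts(res_a: Sequence[Vec], res_b: Sequence[Vec], cutoff: float = 4.0) -> int:
    """Count atom pairs closer than a (hypothetical) contact cutoff in angstroms."""
    return sum(1 for d in atom_pair_distances(res_a, res_b) if d < cutoff)
```

For two side chains with atoms at x = 0 and 2 Å and at x = 5 and 7 Å, the centroids are 5 Å apart, yet the closest atom pair is only 3 Å apart: a contact the all-atom view detects but the centroid view cannot see.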

Progress in high-resolution structure prediction will invariably be carried out in parallel with methods for, among other things, predicting protein-protein interactions, designing proteins and distilling structures from partially assigned experimental data sets. Indeed, many of the scoring and search strategies that high-resolution de novo structure refinement methods employ were initially developed in the context of homology modeling and protein design (Kuhlman et al. 2002; Rohl 2004a). The Rosetta Commons is currently developing Rosetta for all these methods and more. The part of Rosetta we use for HPF2 is less than half the code.