Human Proteome Folding - Phase 2

Protein structure prediction procedures must strike a delicate balance between computational efficiency and the level of physical detail used to model protein structure. Low-resolution models can be used to predict protein topology/folds and sometimes suggest function (Bonneau et al. 2001b). Low-resolution models have also been remarkably successful at predicting features of the folding process such as folding rates and phi values (Alm and Baker 1999a; Alm and Baker 1999b). It is clear, however, that modeling proteins (and possibly bound water and other cofactors) at atomic detail, and scoring these higher-resolution models with detailed, physically derived potentials, is necessary if higher-resolution structure prediction is to be achieved.
Recent progress has focused on the use of low-resolution approaches for finding the fold, followed by a refinement step where atomic detail is added (side chains added to the backbone) and physical scoring functions are used to select and/or generate higher-resolution structures. Several recent studies have illustrated the usefulness of de novo structure prediction methods as part of a two-stage process in which low-resolution methods are used for fragment assembly and the resulting models are refined using a more physical potential and atomic detail (e.g. rotamers) to represent side chains (Bradley et al. 2003; Misura and Baker 2005; Tsai et al. 2003). In the first step, Rosetta is used to search the space of possible backbone conformations with all side chains represented as centroids. This process is well described and has well-characterized error rates and behavior. High-confidence or low-scoring models are then refined using potentials that account for atomic detail such as hydrogen bonding, van der Waals forces and electrostatics.
One major challenge facing methods that attempt to refine de novo models is that the addition of side-chain degrees of freedom, combined with the reduced length scale (reduced radius of convergence) of the potentials employed, requires sampling a much larger space of possible conformations. Thus, one has to correctly determine roughly twice the number of bond angles, to a tighter tolerance, if one hopes to succeed.
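The "roughly twice" estimate above can be checked with a quick count of rotatable torsions. The following is a minimal sketch, not part of the HPF2 procedure: it assumes two backbone torsions (phi and psi) per residue with omega held fixed, and uses a common rotamer-library convention for the number of side-chain chi angles per amino acid (some conventions count proline or arginine differently). The function name and the example sequence are hypothetical.

```python
# Side-chain chi torsions per residue type, following a common
# rotamer-library convention (an assumption; conventions vary slightly).
CHI_COUNT = {
    "G": 0, "A": 0,
    "S": 1, "C": 1, "V": 1, "T": 1,
    "P": 2, "I": 2, "L": 2, "D": 2, "N": 2, "F": 2, "Y": 2, "H": 2, "W": 2,
    "M": 3, "E": 3, "Q": 3,
    "K": 4, "R": 4,
}


def torsion_counts(sequence):
    """Return (backbone, side_chain) torsion counts for a one-letter sequence."""
    backbone = 2 * len(sequence)  # phi and psi per residue; omega treated as fixed
    side_chain = sum(CHI_COUNT[aa] for aa in sequence)
    return backbone, side_chain


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical example sequence
    backbone, side_chain = torsion_counts(seq)
    total = backbone + side_chain
    print("backbone torsions:  ", backbone)
    print("side-chain torsions:", side_chain)
    print("all-atom / backbone ratio:", total / backbone)
```

Under this table, a uniform amino-acid composition averages 1.95 chi angles per residue (39 chi angles over 20 residue types), which is close to the two backbone torsions per residue and so consistent with the doubling described above.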
An illustrative example of the difference between HPF1 and HPF2 is the contrast between low-resolution and high-resolution methods for scoring hydrogen bonds. In HPF1 we used the strand packing score; for HPF2 we use the hydrogen bond score. In the HPF1 procedure, backbone hydrogen bonding is scored indirectly by a term designed to pack strands into sheets, which simply checks that strands are aligned. Hydrogen bonding in helices is not modeled; hydrogen bonds in helices are assumed to be satisfied. This low-resolution method first reduces strands to vectors (ignoring helical secondary structure fragments) and then scores strand arrangement (and the correct hydrogen bonding implicit in this arrangement) via functions dependent on the angular and distance relationships between the two vectors. Thus, the scoring function is robust to a rather large amount of error in the coordinates of individual atoms participating in backbone hydrogen bonds (as large numbers of residues are reduced to the angle and distance between the two vectors representing the strands). In the high-resolution refinement mode of Rosetta, an empirical hydrogen bond term with angle and distance dependence between individual electropositive and electronegative atoms is used (Rohl, 2005). This more detailed hydrogen bond term has higher fidelity and a more straightforward connection to the calculation of physically realistic energies (meaningful units; physicists won't make as much fun of us for using this one) but requires more sampling, as small changes in the backbone can cause large fluctuations in computed energy.
Here is a small protein with the chain colored from the N-terminus (start, blue) to the C-terminus (red).
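The contrast between the two hydrogen-bond treatments can be made concrete with a small geometric sketch. This is not Rosetta's scoring function. It only computes the kind of inputs each style of term depends on: a distance and angle between two strand vectors for the low-resolution case, and a distance and angle between individual atoms for the high-resolution case. All function names and the toy coordinates are hypothetical, and the strand vector is assumed here to run from the first to the last C-alpha atom.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _angle_deg(u, v):
    """Angle between two vectors in degrees, clamped for numerical safety."""
    cos = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def strand_vector(ca_coords):
    """Reduce a strand to its midpoint and a first-to-last C-alpha direction."""
    n = len(ca_coords)
    midpoint = tuple(sum(c[i] for c in ca_coords) / n for i in range(3))
    direction = _sub(ca_coords[-1], ca_coords[0])
    return midpoint, direction


def strand_pair_geometry(ca_a, ca_b):
    """Low-resolution descriptors: midpoint distance and inter-vector angle."""
    mid_a, dir_a = strand_vector(ca_a)
    mid_b, dir_b = strand_vector(ca_b)
    return _norm(_sub(mid_a, mid_b)), _angle_deg(dir_a, dir_b)


def hbond_geometry(donor, hydrogen, acceptor):
    """High-resolution descriptors: H...acceptor distance and donor-H...acceptor angle."""
    distance = _norm(_sub(acceptor, hydrogen))
    angle = _angle_deg(_sub(donor, hydrogen), _sub(acceptor, hydrogen))
    return distance, angle


if __name__ == "__main__":
    # Toy antiparallel strand pair: five C-alpha atoms each, 4.8 units apart.
    strand_a = [(3.3 * i, 0.0, 0.0) for i in range(5)]
    strand_b = [(3.3 * (4 - i), 4.8, 0.0) for i in range(5)]
    print("strand pair (distance, angle):", strand_pair_geometry(strand_a, strand_b))

    # Toy hydrogen bond, then the same bond with the acceptor moved by 0.5 units.
    donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print("ideal H-bond:    ", hbond_geometry(donor, hydrogen, (3.0, 0.0, 0.0)))
    print("perturbed H-bond:", hbond_geometry(donor, hydrogen, (3.0, 0.5, 0.0)))
```

In this toy setup, moving a single interior C-alpha atom shifts the strand midpoint only by the displacement divided by the strand length and leaves the strand direction unchanged, whereas moving a single acceptor atom changes the hydrogen-bond distance and angle directly. That is the sense in which the vector-based term tolerates coordinate error and the atom-based term demands finer sampling.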