Human Proteome Folding - Phase 2



HPF1 vs. HPF2: Scoring different structures at higher resolutions

Balancing resolution with computational efficiency:
A protein structure prediction procedure must strike a delicate balance between computational efficiency and the level of physical detail used to model protein structure. Low-resolution models can be used to predict protein topology/folds and sometimes suggest function (Bonneau et al. 2001b). Low-resolution models have also been remarkably successful at predicting features of the folding process such as folding rates and phi values (Alm and Baker 1999a; Alm and Baker 1999b). It is clear, however, that higher resolution structure prediction will require modeling proteins (and possibly bound water and other cofactors) at atomic detail and scoring these higher resolution models with detailed, physically derived potentials.
Recent progress has focused on using low-resolution approaches to find the fold, followed by a refinement step in which atomic detail is added (side chains are added to the backbone) and physical scoring functions are used to select and/or generate higher resolution structures. Several recent studies have illustrated the usefulness of de novo structure prediction methods as part of such a two-stage process, in which low-resolution methods are used for fragment assembly and the resulting models are refined using a more physical potential and atomic detail (e.g. rotamers) to represent side chains (Bradley et al. 2003; Misura and Baker 2005; Tsai et al. 2003). In the first step, Rosetta is used to search the space of possible backbone conformations with all side chains represented as centroids. This process is well described and has well-characterized error rates and behavior. High-confidence or low-scoring (i.e. low-energy) models are then refined using potentials that account for atomic detail such as hydrogen bonding, van der Waals forces and electrostatics.
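The two-stage idea can be sketched as a generic search loop. The following is a toy illustration under stated assumptions, not Rosetta code: every name in it (two_stage_predict, sample_backbone, low_res_score, refine, high_res_score) is hypothetical, and a "conformation" is reduced to a single number whose native value is 3.0.

```python
import math
import random

def two_stage_predict(sample_backbone, low_res_score, refine, high_res_score,
                      n_decoys=1000, n_keep=10, seed=0):
    """Cheap broad search first, expensive refinement of the best few second."""
    rng = random.Random(seed)
    # Stage 1: sample many conformations and rank them with the cheap score.
    decoys = [sample_backbone(rng) for _ in range(n_decoys)]
    decoys.sort(key=low_res_score)
    # Stage 2: refine only the lowest-scoring decoys with the detailed score.
    refined = [refine(d, high_res_score, rng) for d in decoys[:n_keep]]
    return min(refined, key=high_res_score)

# Toy stand-ins (all hypothetical): a conformation is one float, native = 3.0.
NATIVE = 3.0

def sample_backbone(rng):
    return rng.uniform(-10.0, 10.0)

def low_res_score(x):
    # Coarse: only reports which 1-unit-wide shell around the native x is in.
    return math.floor(abs(x - NATIVE))

def high_res_score(x):
    # Detailed: resolves small differences between nearby conformations.
    return (x - NATIVE) ** 2

def refine(x, score, rng, n_steps=200, step=0.1):
    # Greedy local search: accept a small random move only if the score drops.
    for _ in range(n_steps):
        trial = x + rng.gauss(0.0, step)
        if score(trial) < score(x):
            x = trial
    return x

best = two_stage_predict(sample_backbone, low_res_score, refine, high_res_score)
print(f"best conformation: {best:.3f} (native is {NATIVE})")
```

The design point is that the expensive score is only ever evaluated in the neighborhood of decoys the cheap score has already ranked highly, so the refinement stage depends on the quality of the first-stage ensemble.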
One major challenge facing methods that attempt to refine de novo models is that the addition of side-chain degrees of freedom, combined with the reduced length scale (reduced radius of convergence) of the potentials employed, requires sampling a much larger space of possible conformations. Thus, one has to correctly determine roughly twice the number of bond angles, to a tighter tolerance, if one hopes to succeed.
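To make the "roughly twice" estimate concrete, one can count rotatable (torsion) angles for a sequence. This is a minimal sketch, assuming two backbone torsions (phi and psi) per residue and a commonly tabulated count of heavy-atom side-chain chi angles; the table, the function name torsion_counts and the example sequence are illustrative, and some software counts proline or polar-hydrogen chi angles differently.

```python
# Heavy-atom side-chain chi angles per residue (a common textbook convention;
# proline and polar-hydrogen chi angles are counted differently in some tools).
CHI_COUNT = {
    "G": 0, "A": 0, "S": 1, "C": 1, "V": 1, "T": 1, "P": 2,
    "I": 2, "L": 2, "D": 2, "N": 2, "H": 2, "F": 2, "Y": 2, "W": 2,
    "M": 3, "E": 3, "Q": 3, "K": 4, "R": 4,
}

def torsion_counts(sequence):
    """Return (backbone, side_chain) torsion counts for a one-letter sequence."""
    backbone = 2 * len(sequence)  # phi and psi; omega is treated as fixed
    side_chain = sum(CHI_COUNT[aa] for aa in sequence)
    return backbone, side_chain

backbone, side_chain = torsion_counts("MKVLAEDRFG")  # hypothetical sequence
print(backbone, side_chain, (backbone + side_chain) / backbone)
```

For this made-up ten-residue sequence the count is 20 backbone torsions plus 21 side-chain torsions, so adding side chains about doubles the number of angles to be determined.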



HPF1 vs. HPF2: Hydrogen Bonds

An illustrative example of the difference between HPF1 and HPF2 is the way low-resolution and high-resolution methods score hydrogen bonds. In HPF1 we used the strand packing score; for HPF2 we use the hydrogen bond score. In the HPF1 procedure, backbone hydrogen bonding is scored indirectly by a term designed to pack strands into sheets, which simply checks that strands are aligned. Hydrogen bonding in helices is not modeled; it is assumed that hydrogen bonds are satisfied in helices. This low-resolution method first reduces strands to vectors (ignoring helical secondary structure fragments) and then scores strand arrangement (and the correct hydrogen bonding implicit in this arrangement) via functions that depend on the angular and distance relationships between the two vectors. Thus, the scoring function is robust to a rather large amount of error in the coordinates of individual atoms participating in backbone hydrogen bonds (as large numbers of residues are reduced to the angle and distance between the two vectors representing the strands). In the high-resolution refinement mode of Rosetta, an empirical hydrogen bond term with angle and distance dependence between individual electropositive and electronegative atoms is used (Rohl 2005). This more detailed hydrogen bond term has higher fidelity and a more straightforward connection to the calculation of physically realistic energies (meaningful units; physicists won't make as much fun of us for using this one), but it requires more sampling, as small changes in the backbone can cause large fluctuations in computed energy. Here is a small protein with the chain colored from the N-terminus (start, blue) to the C-terminus (end, red).
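The contrast between the two hydrogen bond treatments can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration: the function names, the idealized coordinates, and in particular the toy functional form of toy_hbond_energy (a Gaussian well in distance times a cosine-squared angular factor, with made-up parameters). It is not the Rosetta strand packing score or the Rosetta hydrogen bond term.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _angle_deg(u, v):
    cos_t = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def strand_vector(ca_coords):
    """Reduce a strand to (midpoint, direction) using its first and last CA."""
    first, last = ca_coords[0], ca_coords[-1]
    midpoint = tuple((a + b) / 2.0 for a, b in zip(first, last))
    return midpoint, _sub(last, first)

def strand_pair_geometry(strand_a, strand_b):
    """Low-resolution view: distance between midpoints, angle between axes."""
    mid_a, dir_a = strand_vector(strand_a)
    mid_b, dir_b = strand_vector(strand_b)
    return _norm(_sub(mid_a, mid_b)), _angle_deg(dir_a, dir_b)

def toy_hbond_energy(n, h, o, ideal_dist=1.9, width=0.3):
    """High-resolution view: per-atom term with distance and angle dependence.

    Toy functional form (an assumption): a Gaussian well in the H...O distance
    multiplied by cos^2 of the deviation from a linear N-H...O arrangement.
    """
    dist = _norm(_sub(o, h))
    deviation = _angle_deg(_sub(h, n), _sub(o, h))  # 0 degrees when linear
    if deviation >= 90.0:
        return 0.0
    radial = math.exp(-((dist - ideal_dist) / width) ** 2)
    return -radial * math.cos(math.radians(deviation)) ** 2

# Two idealized antiparallel strands: 3.3 angstrom rise, 4.8 angstrom apart.
strand_a = [(3.3 * i, 0.0, 0.0) for i in range(5)]
strand_b = [(3.3 * (4 - i), 4.8, 0.0) for i in range(5)]
shifted_b = [(x + 0.5, y, z) for x, y, z in strand_b]  # 0.5 angstrom error

print(strand_pair_geometry(strand_a, strand_b))
print(strand_pair_geometry(strand_a, shifted_b))

# One N-H...O hydrogen bond: ideal, then acceptor displaced by 0.5 angstrom.
n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(toy_hbond_energy(n, h, (2.9, 0.0, 0.0)))
print(toy_hbond_energy(n, h, (3.4, 0.0, 0.0)))
```

In this toy setting a 0.5 angstrom shift barely changes the strand-level distance and angle, while the same shift applied to a single acceptor atom removes most of the atomic-detail hydrogen bond energy. That is the sense in which the detailed term demands much finer sampling.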



HPF1 vs. HPF2: Solvation - modeling the protein in water at higher resolution

Another major challenge with high-resolution methods is the difficulty of computing accurate potentials for atomic-detail protein modeling in solvent, with electrostatic and solvation terms being among the most difficult to model accurately. Full treatment of the free energy of a protein conformation (with correct treatment of dielectric screening) has no efficient solution, and the computational cost of a full treatment of electrostatic free energy (by solving the Poisson-Boltzmann or linearized Poisson-Boltzmann equations for large numbers of conformations) is high. In spite of these difficulties, several studies have shown that refinement of de novo structures with atomic-detail potentials can increase our ability to select and/or generate near-native structures. These methods can correctly select near-native conformations from ensembles of candidate structures and improve near-native structures, but they still rely heavily on the initial low-resolution search to produce an ensemble containing good starting structures (HPF2-like methods rely on an initial search with HPF1-like methods) (Lee et al. 2001; Misura and Baker 2005; Tsai et al. 2003). Some recent examples of high-resolution predictions are quite encouraging, and an emerging consensus in the field is that higher resolution de novo structure prediction (structure prediction with atomic-detail representations of side chains) will begin to work if sampling is dramatically increased (thus the grid!). The solvation score is depicted in one of the three score panels in the HPF2 client.
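For reference, the Poisson-Boltzmann equations mentioned above can be written out in one common textbook form. The notation is an assumption of this sketch and is not used elsewhere in this article: \phi is the electrostatic potential, \varepsilon the position-dependent dielectric, \rho_f the fixed charge density of the protein, c_i^\infty and z_i the bulk concentration and valence of mobile ion species i, q the elementary charge, and \lambda an accessibility factor that is 1 where ions can reach and 0 elsewhere. The first line is the full equation; the second is the linearized form, obtained by expanding the exponential to first order and using bulk electroneutrality.

```latex
\begin{align}
\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right]
  &= -\rho_f(\mathbf{r})
     - \lambda(\mathbf{r}) \sum_i c_i^{\infty} z_i q \,
       \exp\!\left( -\frac{z_i q \, \phi(\mathbf{r})}{k_B T} \right) \\
\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right]
  - \bar{\kappa}^2(\mathbf{r}) \, \phi(\mathbf{r})
  &= -\rho_f(\mathbf{r}),
\qquad
\bar{\kappa}^2(\mathbf{r})
  = \lambda(\mathbf{r}) \sum_i \frac{c_i^{\infty} z_i^2 q^2}{k_B T}
\end{align}
```

Either form is a partial differential equation over all of space that is typically solved numerically, and it must be solved again for every candidate conformation, which is why full electrostatic treatment is costly when large numbers of conformations are scored.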



HPF1 vs. HPF2: Residue-residue pair score

The pair score in HPF2 is like the pair score in HPF1, but the HPF2 pair score uses the positions of rotamers (a way of efficiently representing all side-chain atoms) instead of centroid positions (which represent the amino acid as a single blurred-out point). So think of the HPF2 pair score as an all-atom version of the HPF1 pair score (appropriately reparameterized, of course).
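The change of representation can be illustrated with a short sketch. It shows only the geometric inputs a pair term could see in each representation, not the pair score itself; the function names and the coordinates are hypothetical.

```python
import math

def centroid(atoms):
    """Collapse a side chain to one point: the mean of its atom coordinates."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))

def centroid_distance(side_a, side_b):
    """Centroid (HPF1-like) view: one distance per residue pair."""
    return math.dist(centroid(side_a), centroid(side_b))

def closest_atom_distance(side_a, side_b):
    """All-atom (HPF2-like) view: individual atom positions matter."""
    return min(math.dist(a, b) for a in side_a for b in side_b)

# Hypothetical coordinates in angstroms. The two rotamers of the second side
# chain share the same centroid but point their atoms in different directions.
side_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
rotamer_a = [(7.5, 0.0, 0.0), (9.0, 0.0, 0.0), (10.5, 0.0, 0.0)]
rotamer_b = [(9.0, -1.5, 0.0), (9.0, 0.0, 0.0), (9.0, 1.5, 0.0)]

for name, rotamer in (("rotamer_a", rotamer_a), ("rotamer_b", rotamer_b)):
    print(name,
          centroid_distance(side_1, rotamer),
          closest_atom_distance(side_1, rotamer))
```

Both rotamers look identical in the centroid representation, whereas the all-atom representation distinguishes them (closest contacts of 3.0 and 4.5 angstroms here). An all-atom pair score can therefore reward or penalize specific side-chain placements that a centroid score cannot see.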



Higher resolution is important for other methods as well

Progress in high-resolution structure prediction will invariably be made in parallel with progress in related methods, including but not limited to predicting protein-protein interactions, designing proteins and distilling structures from partially assigned experimental data sets. Indeed, many of the scoring and search strategies that high-resolution de novo structure refinement methods employ were initially developed in the context of homology modeling and protein design (Kuhlman et al. 2002; Rohl 2004a). The Rosetta Commons is currently developing Rosetta for all these methods and more. The part of Rosetta we use for HPF2 is less than half the code.