About the Project

X-ray Crystallography

One of the favored methods for protein-structure determination is X-ray crystallography. Through this method, scientists use the high-throughput crystallization pipeline to help annotate unknown parts of the human proteome, which in turn will help to improve their understanding of cancer initiation, progression and treatment.*

There are two main steps involved in X-ray crystallography:

  1. Crystallizing the protein: Although a lot more complex, this is similar to putting sugar into a cup of water and letting it sit for a while. Once the water evaporates, tiny sugar crystals appear.
  2. Sending X-rays through the crystal: A mathematical model uses the way the X-rays diffract to determine the protein's structure.

Crystallizing the protein is not a straightforward procedure. Many thousands of possible conditions affect the process (concentration of the protein and solution, temperature, pH, chemical additives, etc.), and scientists must find the appropriate combination of these conditions for a protein to crystallize. For example, with sugar, if you change the water to another liquid, or change the temperature or concentrations, you may not get a crystal. Similarly, for a given protein, the challenge is to know which conditions will lead to a crystal: which solution, temperature, pH, etc.

The resultant protein crystal must also be well-formed and large enough for X-rays to reveal the protein's structure at high resolution. If the conditions are not right for crystallizing the protein, the process can result in a micro-crystal, which is too small for the protein's structure to be determined; a precipitate, which shows some change but does not lead directly to a crystallization event; or no change at all.

As yet another barrier to progress, the more important a protein is to cancer research, the harder it usually is to crystallize. Many proteins involved in cancer are long chains, or they require additional proteins to fold properly and cannot be crystallized by themselves.

To run the millions of combinations necessary to successfully crystallize a protein, scientists use robots to perform the work. Robots can set up the various crystallization conditions faster and more accurately than people can. To further facilitate the process, the result of each of the millions of crystallization experiments is photographed.

Currently, scientists at the Hauptman-Woodward Medical Research Institute (HWI) in Buffalo have run more than 86 million crystallography experiments for more than 9,400 proteins. As a result, they have 86 million pictures of these proteins that have gone through the X-ray crystallography high-throughput screening pipeline. Each of these pictures needs to be analyzed to determine what the result of the experiment is — i.e., crystal, precipitate, phase separation, skin effect, no change.

One of the challenges is the tremendous size of these datasets, which require over 25 TB of storage (equivalent to more than 9,000 DVDs). IBM's Blue Gene supercomputer has assisted in this phase of the work by running a special image compression algorithm that reduces the size of these images without losing content. The other challenge is to comprehensively analyze each image to determine the crystallization outcome, a task that takes approximately 10 hours per image on a single computer. At that rate, 86 million images amount to about 860 million hours of computing, so a single computer would need almost 100,000 years to analyze the existing pictures.
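The "almost 100,000 years" estimate can be checked with a short calculation. The sketch below uses only the figures quoted above (86 million images, roughly 10 hours per image); the function names and the machine count in the usage lines are illustrative assumptions, not figures from the project.

```python
# Figures quoted in the text above.
IMAGE_COUNT = 86_000_000
HOURS_PER_IMAGE = 10
HOURS_PER_YEAR = 24 * 365  # ignores leap years; fine for a rough estimate


def single_machine_years(images: int, hours_per_image: float) -> float:
    """Years one computer would need, working around the clock."""
    return images * hours_per_image / HOURS_PER_YEAR


def shared_years(images: int, hours_per_image: float, machines: int) -> float:
    """Years needed if the work splits evenly across `machines` computers.

    Assumes perfect parallelism, which a real volunteer grid will not reach.
    """
    return single_machine_years(images, hours_per_image) / machines


years = single_machine_years(IMAGE_COUNT, HOURS_PER_IMAGE)
print(f"One computer: about {years:,.0f} years")

# Hypothetical machine count, chosen only to show how the estimate scales.
print(f"100,000 computers: about {shared_years(IMAGE_COUNT, HOURS_PER_IMAGE, 100_000):.2f} years")
```

Because each image can be analyzed independently, the total time shrinks roughly in proportion to the number of computers contributing, which is why the work suits a volunteer grid.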

World Community Grid and "Help Conquer Cancer"

Using the power of World Community Grid, scientists at the Ontario Cancer Institute (OCI), Princess Margaret Hospital, and the University Health Network will process the existing 86 million images of proteins that have been screened in the high-throughput crystallization pipeline at HWI. World Community Grid will run a CrystalVision program that the researchers at OCI have developed to analyze the features of individual images to determine the outcome of the crystallization screen — crystal, micro crystal, phase separation, skin, precipitate, or no change.

If a crystal occurs, crystallographers can put the protein through the optimization process to determine the optimal conditions for the crystallization, and in turn perform a diffraction experiment to determine the structure of the protein. What's more, scientists can compare proteins that have successfully crystallized against proteins of unknown structure that have similar characteristics, based on the results from the crystallization screen. This can be the starting point for crystallization for these proteins so that their structure can be determined.

If the crystal produced is not well-formed or large enough, scientists can still use the information to help them determine the conditions necessary to create a well-formed crystal. For example, they may learn that Protein X and Condition A resulted in a micro crystal, and Protein A and Condition Z resulted in a micro crystal as well. Based on this information, they can then run additional experiments to deduce which conditions need to be optimized to create a larger, better-formed crystal.
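One way to picture this kind of follow-up is as a simple tally of screen outcomes. The sketch below is not the researchers' software: the records are made-up placeholders that reuse the protein and condition names from the example above, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical screen results: (protein, condition, outcome). Illustrative only.
results = [
    ("Protein X", "Condition A", "micro crystal"),
    ("Protein A", "Condition Z", "micro crystal"),
    ("Protein X", "Condition Z", "precipitate"),
    ("Protein A", "Condition A", "no change"),
]


def outcomes_by_condition(records):
    """Count how often each outcome was seen under each condition."""
    table = defaultdict(Counter)
    for _protein, condition, outcome in records:
        table[condition][outcome] += 1
    return table


def near_misses(records, outcome="micro crystal"):
    """List the (protein, condition) pairs that produced the given outcome."""
    return sorted((p, c) for p, c, o in records if o == outcome)


for protein, condition in near_misses(results):
    print(f"{protein} + {condition}: candidate for optimization")
```

Pairs that produced a micro crystal are natural starting points for further experiments, since they came close to a usable crystal.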

Analyzing the results from this experiment will also lead to a better understanding of the underlying principles of protein crystallography. For the first time, a comprehensive crystallography image analysis will be done, something that was previously impossible because of its computational complexity. In turn, CrystalVision will be improved to provide faster and more accurate image classification.

Improving the protein crystallography pipeline will enable researchers to determine the structure of many cancer-related proteins faster. This will improve our understanding of the function of these proteins and enable potential pharmaceutical interventions to treat this deadly disease.

* There are other approaches to understanding the structure and function of proteins, including the method used in the Human Proteome Folding Project also running on World Community Grid. Given the essential nature of this work, it's important to advance every research technique to complete our understanding of the human organism and disease.