The Help Conquer Cancer project researchers have developed an image-analysis and classification system for automatically scoring images from high-throughput protein crystallization trials.
"Protein crystallization analysis on the World Community Grid"
Non-Technical Abstract:
The structure of cancer-related proteins is important to know because their shape determines their function and role in the disease process. These proteins are usually large, so the only way to determine their structure is X-ray crystallography. What makes this exceedingly time-consuming is the necessary first step of getting the protein to crystallize. To do this, scientists mix the protein with a variety of compounds in the hope that one of them will spur crystallization. Using robots, many thousands of crystallization attempts are made. Determining whether a crystal actually formed requires human observation, which is also very time-consuming.
By using World Community Grid, the scientists were able to develop an automated system that analyzes images of the crystallization attempts and recognizes whether crystallization occurred. They have already trained the system to successfully recognize 80% of crystal-bearing images and eliminate 98% of clear drops. This significantly reduces the time required for human inspection, which should lead to much faster structure determination of the proteins under study.
Eventually this should lead to a better understanding of the role of certain proteins in cancers and other diseases, which in turn should lead to identifying better treatments for these diseases.
Technical Abstract:
We have developed an image-analysis and classification system for automatically scoring images from high-throughput protein crystallization trials. Image analysis for this system is performed by the Help Conquer Cancer (HCC) project on the World Community Grid. HCC calculates 12,375 distinct image features on microbatch-under-oil images from the Hauptman-Woodward Medical Research Institute’s High-Throughput Screening Laboratory. Using HCC-computed image features and a massive training set of 165,351 hand-scored images, we have trained multiple Random Forest classifiers that accurately recognize multiple crystallization outcomes, including crystals, clear drops, precipitate, and others. The system successfully recognizes 80% of crystal-bearing images, 89% of precipitate images, and 98% of clear drops.
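The recognition rates quoted above can be read as per-class rates: of all images that human scorers assigned to a given outcome, the fraction the classifier assigned to the same outcome. The following is a minimal sketch of that calculation, assuming this interpretation; the function name `per_class_recall` and the toy labels are hypothetical and are not taken from the paper or from the HCC software.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Return {outcome: fraction of images with that hand-scored outcome
    that the classifier labelled with the same outcome}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of hand-scored images per outcome.
    totals = Counter(true_labels)
    # Number of images per outcome where the classifier agreed with the hand score.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {outcome: correct[outcome] / totals[outcome] for outcome in totals}


# Toy, made-up labels for illustration only (not data from the study).
hand_scored = ["crystal"] * 5 + ["precipitate"] * 4 + ["clear"] * 10
classifier_output = (
    ["crystal"] * 4 + ["precipitate"]      # 4 of 5 crystal images recognized
    + ["precipitate"] * 3 + ["clear"]      # 3 of 4 precipitate images recognized
    + ["clear"] * 10                       # all 10 clear drops recognized
)

recall = per_class_recall(hand_scored, classifier_output)
for outcome, value in recall.items():
    print(f"{outcome}: {value:.0%}")
```

Reporting one rate per outcome, rather than a single overall accuracy, matters when outcomes are unevenly represented: a classifier could score well overall by labelling common outcomes correctly while still missing many of the rarer ones.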
Journal in which the paper appeared: Journal of Structural and Functional Genomics.