Active Research
The Clean Energy Project
In December 2008, members of the Aspuru-Guzik group from Harvard University's Chemistry and Chemical Biology Department launched the Clean Energy Project (CEP) on World Community Grid. This project aims to accelerate breakthrough discoveries in solar-cell technologies. Current solar cells are predominantly based on crystalline silicon. However, other semiconducting materials are under active investigation in order to further reduce the cost of producing electricity from sunlight. Among these "new" materials, organic semiconductors have emerged as a promising technology for the deployment of a new generation of low-cost, thin, and flexible solar-powered products. These materials are termed organic because they are made mostly of carbon, and are typically polymers: thousands of copies of the same molecule linked end-to-end. This technology has the potential to someday enable fabrication of solar cells in high-throughput roll-to-roll coating machines similar to those used to print newspapers. A first step towards such an ambitious development requires the integration of massive amounts of computational power to perform rational searches for the ideal chemical compound(s) with the physical and chemical properties needed to generate clean and sustainable energy.
Up to now, the CEP has been able to deploy the computational resources of World Community Grid to attack the design of molecular materials for solar cell applications from a completely new angle. Instead of molecular design by intuition, we have used powerful theoretical techniques (i.e., molecular dynamics) to simulate the positions and orientations of molecules at certain time and temperature intervals. So far, we have used 3 million hours of computer time, which is equivalent to 15,000 molecular configurations, each evaluated at 20 different temperatures. This has led us to generate on the order of one terabyte of data. As a side note, it would have taken us approximately 5 months to evaluate only 1% of the 15,000 molecular configurations on an 8-node computational cluster. In principle, we should be able to use this entire amount of data to compute more detailed molecular/quantum-level properties of the molecular systems investigated during CEP's phase 1. We are now working to analyze the results of these hundreds of thousands of runs. The second round of calculations (or CEP's phase 2) will be carried out using the Q-Chem quantum chemistry program, scheduled to be fully integrated into the CEP by about July of 2009. We expect that this extensive computational search will rationally filter out the outstanding molecular candidates for solar cell applications. Finally, that information will be handed over to our collaborators for device fabrication and integration into real organic solar cell devices to assess their capacity to transform sunlight into electricity.
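As a rough check on the phase 1 figures quoted above, the short Python sketch below multiplies out the number of runs and derives an average cost per run. The per-run average is an assumption made only for illustration (it treats every run as equally expensive); the text itself gives only the totals.

```python
# Figures quoted in the text above.
configurations = 15_000        # molecular configurations
temperatures = 20              # temperatures evaluated per configuration
total_cpu_hours = 3_000_000    # computer time used so far

# One run per (configuration, temperature) pair.
runs = configurations * temperatures

# Average cost per run, ASSUMING all runs cost about the same.
hours_per_run = total_cpu_hours / runs

print(runs)           # 300000
print(hours_per_run)  # 10.0
```

The 300,000 runs are consistent with the "hundreds of thousands of runs" mentioned in the text.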
Nutritious Rice for the World, Computational Biology Research Group, University of Washington, Seattle, Washington, USA
We have begun to analyze the protein models generated by Nutritious Rice for the World volunteers. The next step is to use sophisticated methods to select the top protein models for each gene. This will let us focus on a more manageable number of protein structures from the billions generated so far. Rice proteins are very different from what has been previously studied and only 1% of the proteins we're working on have segments which are significantly similar to proteins of known structure. That is why computer modeling is necessary and why this project is important. It also means that we have a lot of hard work ahead of us still!
In general, when proteins have similar amino acid sequences, they also have similar structures. The small number of cases where at least part of the protein sequence is similar to one where the structure is known is thus very useful. We have a good idea what those regions of the protein structure model should look like and this allows us to optimize and validate the tools that we use to pick the best models. That is what we are currently doing. Once we finish this, we will start processing the data and publish the best structures for each gene online.
Help Conquer Cancer, Ontario Cancer Institute, Princess Margaret Hospital, University Health Network, Toronto, Canada
This month marks a milestone in Help Conquer Cancer: 25% of our work units have been processed on the Grid, representing 3 million crystallization trials on over 2,000 proteins. Analysis of these results is proceeding on multiple fronts. Our latest crystal-finding classifier, generated from HCC results, can identify 4 out of 5 images containing protein crystals, greatly reducing the effort of manually searching through images for crystals, and thereby increasing the rate of protein structure solution. Using this classifier on a set of 5.7 million images, we recently identified 11 proteins with favorable crystallization conditions, all of which are homologous to human proteins on our cancer target list (spanning lung, ovarian, prostate and head & neck cancers). Finding crystallization conditions clears one hurdle on the path to determining the structure, and ultimately the function, of these cancer targets.
Discovering Dengue Drugs – Together, University of Texas Medical Branch, Galveston, Texas, USA and the University of Chicago, Chicago, Illinois, USA
We are back on track with our Discovering Dengue Drugs – Together project, and life is returning to some semblance of normalcy in Galveston and at UTMB. We thank World Community Grid members for their support and patience during the past 6 months following Hurricane Ike.
Phase 1 of our project is now complete! We have screened (i.e. computationally tested) more than 3 million potential drug candidates against each of 10 different proteases from dengue, West Nile, and hepatitis C viruses. Compounds that inhibit these target proteases will prevent virus replication. We continue to analyze the 30 million Phase 1 results, although several compounds tested in the laboratory have already shown excellent activity in our biochemical and cell-based assays. These compounds will be further characterized and tested for in vivo antiviral activity.
Phase 2 of our project is being optimized at the Texas Advanced Computing Center (TACC; www.tacc.utexas.edu) in Austin, Texas and ported by IBM to the grid for testing. This phase will perform extensive molecular dynamics calculations to accurately calculate binding free energies for the best potential drug candidates predicted in Aim 1. These calculations will significantly reduce the number of false positive predictions produced in Phase 1, thereby increasing our success rate for identifying experimentally active compounds from ~10% to >80%. Launch of Phase 2 is anticipated for early summer 2009.
As we prepare to launch Phase 2 of our quest to discover drugs for dengue, West Nile, and hepatitis C infections, we will continue Phase 1 calculations, this time in support of a collaborative drug discovery effort to treat the disease leishmaniasis. This work is done in partnership with researchers from Universidad de Antioquia, Colombia (http://www.pecet-colombia.org). Leishmaniasis affects about 12 million people throughout the tropics, subtropics, and southern Europe, with about 2 million new cases each year. The disease is spread by the bite of sand flies infected with the Leishmania parasite. Although a handful of antimicrobials exist to treat some forms of leishmaniasis, concerns about their modes of delivery, effectiveness, resistance, and cost spur our efforts to discover novel anti-Leishmania drugs. For this project, scientists in Colombia identified a set of enzymes critical for survival of the Leishmania parasite. Atomic structures exist for each of these enzymes, allowing us to computationally examine our drug candidate library for compounds that prevent the Leishmania enzymes from functioning.
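To illustrate what the improved success rate would mean in practice, the sketch below estimates how many predicted compounds would need laboratory testing to find a given number of active ones. The function name, the target of 8 actives, and the use of exactly 10% and 80% (the text says "~10%" and ">80%") are assumptions chosen for illustration.

```python
def compounds_to_test(target_actives, hit_rate_percent):
    """Expected number of compounds to test to find `target_actives` actives.

    Uses integer ceiling division so whole-percent hit rates stay exact.
    """
    return -((-target_actives * 100) // hit_rate_percent)

# Illustrative values only: roughly the Phase 1 and Phase 2 success rates.
print(compounds_to_test(8, 10))  # 80
print(compounds_to_test(8, 80))  # 10
```

Under these assumptions, the same number of active compounds would be found with about one eighth of the laboratory tests.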
As reported earlier, this project is now running better than ever before. We have moved our main computational tasks (those required to prepare work units for World Community Grid and analyze results returned from the grid) to the supercomputers housed at TACC. This establishes a nice synergy between one of the world's most powerful supercomputers and the world's largest computer grid. Our storage capabilities have increased in size and robustness with redundant storage occurring at TACC and locally on our 12 TByte IBM DS3200 disk system.
As stated before, we greatly appreciate the computer time that has been unselfishly provided by the members of World Community Grid!
Human Proteome Folding – Phase 2, Bonneau Laboratory, New York University, New York, New York, USA
The Human Proteome Folding project is an ongoing effort to automatically annotate the genomes of organisms that have importance to the human race with predictions about protein structure and function. Over the course of our project World Community Grid has helped to produce protein structural annotations for the human genome as well as the genomes of over 80 disease-causing bacteria and viruses, model organisms and plants studied heavily by biologists, and other proteins and organisms with an interesting story to tell. One such example is a collection of entirely unique proteins from J. Craig Venter's Global Ocean Sampling Expedition that has attracted a lot of interest from the evolutionary biologists we collaborate with (we're examining these predictions and hope to publish our results soon). We're also using World Community Grid structural predictions to make genome-wide predictions about proteins' molecular function, which gives researchers important clues about what tasks each gene might be performing. While our resource for these results is open to researchers and the public, we've spent the past few months developing an intuitive, visual interface that we plan to submit for publication and release by the end of this summer. We appreciate the great contribution World Community Grid and its crunchers are making toward basic science, and we hope to make the community proud with the positive results and massive scale of our undertaking by publishing a myriad of results this year.
FightAIDS@Home – Phase 2, Olson Laboratory, The Scripps Research Institute, La Jolla, California, USA
The FightAIDS@Home Project uses the volunteered computer power of World Community Grid to test candidate drug compounds against the variations of HIV that can arise because of drug resistance. FightAIDS@Home has identified new HIV protease active site inhibitors that have been shown to work in the test tube and are now being further developed by our chemist collaborators; in addition, several compounds were recently discovered to be potential candidates for a novel binding site on one of the most multi-drug-resistant mutant "super bugs."
Completed Research
Help Defeat Cancer
Based upon the experimental results gathered during the course of the "Help Defeat Cancer" project, our team has been awarded competitive extramural funding from the National Institutes of Health to build a deployable, grid-enabled clinical decision support system to enable researchers and physicians to automatically analyze and classify imaged cancer specimens with improved accuracy. Working closely with Dr. Joel Saltz and his team at the Center for Comprehensive Informatics, Emory University School of Medicine, we have developed and grid-enabled the first generation of imaging and pattern recognition algorithms and we are evaluating their capacity to characterize biomarker expression in a variety of different tumors. The software enables investigators to compare the expression profiles that are generated for a given specimen with the signatures of patients with known outcomes. The long-term goal of this research is to utilize these tools to predict the most probable path of disease progression and response to treatment. While the original experiments focused on cancers of the breast, head and neck, we have expanded the scope of our studies to include prostate cancer, primary melanoma and disease which is metastatic to the sentinel lymph node.
AfricanClimate@Home
Global climate change is one of the main concerns of the world science community, and the University of Cape Town's Climate System Analysis Group (CSAG) is no exception. Since Africa is a region particularly vulnerable to this problem, CSAG scientists pay special attention to climate change issues on the continent. Regional Climate Models (RCMs) are among the tools scientists use to understand the interactions among climate components (atmosphere, ocean, land, cryosphere, biosphere, etc.).
CSAG and World Community Grid teamed up and launched the "African Climate @ Home" project in 2007. The idea behind this project is to use the grid's large computational power to perform the large number of RCM simulations needed to fully span the combinations of parameters found in a climate model. Within an RCM, there are many parameters that must be tuned to correctly simulate the climate of a particular region. In the case of land-biosphere-atmosphere interactions, these parameters include the Leaf Area Index (LAI), the minimum stomatal resistance (Rs) and some soil moisture parameters.
The first phase of AfricanClimate@Home is finished. A large number of RCM simulations were performed using different sets of the parameters above. By perturbing these parameters we could analyze the sensitivity of the RCM to some of the land-surface properties. The complete analysis of these simulations is still in progress, but some preliminary results show that the RCM is most sensitive in the moisture exchange between canopy and atmosphere. The diurnal cycles of latent heat flux and water vapor mixing ratio are affected by the variation of both LAI and Rs. Future analysis will try to address the sensitivity of the model in the seasonal cycle as well.
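The idea of spanning the parameter combinations can be sketched as a simple grid of perturbed values, with one simulation per combination. The specific values, units and dictionary keys below are hypothetical placeholders, since the ranges actually used by CSAG are not given here.

```python
from itertools import product

# HYPOTHETICAL perturbation values, for illustration only.
lai_values = [1.0, 3.0, 5.0]         # Leaf Area Index (LAI)
rs_values = [50.0, 100.0, 200.0]     # minimum stomatal resistance (Rs)
soil_moisture_values = [0.2, 0.4]    # placeholder soil moisture parameter

# One simulation (work unit) per combination of parameter values.
work_units = [
    {"lai": lai, "rs_min": rs, "soil_moisture": sm}
    for lai, rs, sm in product(lai_values, rs_values, soil_moisture_values)
]

print(len(work_units))   # 18
print(work_units[0])     # {'lai': 1.0, 'rs_min': 50.0, 'soil_moisture': 0.2}
```

The number of simulations is the product of the number of values tried for each parameter, which is why a sweep of this kind benefits from a large grid.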
Genome Comparison
Reliable information for scientists
Using IBM's World Community Grid technology that integrates more than 1 million computers in a grid and reduces 3748 years of analysis to less than 12 months, Fiocruz created a database of protein sequence comparisons from nearly 4 thousand living beings.
The planning and development of new vaccines, drugs and diagnostic kits increasingly depend on detailed and reliable information about proteins and their functions, both of micro-organisms causing diseases and of the human host. The Genome Comparison project was created to meet this need, and its first product, a new database of protein sequence comparison data, is now available to scientists around the world. The Proteinworld Database, accessible at www.proteinworlddb.org, is a reliable source of such information, and it is the result of detailed analysis of millions of sequences from thousands of organisms. The analysis covered more than 3.5 million protein sequences from about 3,800 organisms, from the simplest bacteria to humans, including viruses. This comparison allows scientists to correct errors and fill gaps in the functional annotations in international databases that contain information on the DNA and proteins of living beings, yielding more refined data and assisting research projects that aim to promote health in the world. The algorithm used, called SSearch, is an implementation of Smith-Waterman and produces accurate, complete and consistent numerical data on the pairwise similarities of all the protein sequences analyzed. The resulting enormous matrix contains reliable data that can be used to map evolutionary distances between organisms or groups of proteins, to find similarities and differences between metabolic pathways (very useful in developing new drugs against pathogens), or to reveal features and characteristics of enzymes, antigens and structural proteins of living beings.
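For readers unfamiliar with Smith-Waterman, the following is a minimal sketch of the local alignment score it computes. It is a simplification and not SSearch itself: the match, mismatch and gap values are assumptions chosen for illustration, whereas protein comparison tools typically use an amino acid substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b.

    Simplified scoring (ASSUMED for illustration): fixed match/mismatch
    scores and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("MKTAYIAK", "MKTAYIAK"))  # 16
print(smith_waterman_score("AAAA", "TTTT"))          # 0
```

Identical sequences of length 8 score 8 matches times 2, and sequences with nothing in common score 0. Running a function like this for every pair of sequences yields the kind of similarity matrix described above.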
This analysis, if performed by a single computer, would take an eternity or, more precisely, 3748 years. It was, however, completed in less than 12 months, thanks to World Community Grid through the Genome Comparison Project of Fiocruz - the first Brazilian project approved to participate in World Community Grid, which generated over 800 gigabytes of results.
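The scale of the task can be checked with a little arithmetic. The sketch below assumes exactly 3.5 million sequences (the text says "more than"), counts each unordered pair once, and treats "less than 12 months" as one year, so both results are approximate bounds, not project figures.

```python
sequences = 3_500_000   # "more than 3.5 million protein sequences"

# Number of distinct pairs in an all-against-all comparison: n * (n - 1) / 2.
unordered_pairs = sequences * (sequences - 1) // 2

single_computer_years = 3748
grid_years = 1          # "less than 12 months", taken as an upper bound
min_speedup = single_computer_years / grid_years

print(f"{unordered_pairs:,}")  # 6,124,998,250,000
print(min_speedup)             # 3748.0
```

In other words, the grid processed on the order of six trillion sequence pairs, at a rate equivalent to at least several thousand computers working continuously.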
Comparison of Genomes
Most of the protein sequences included in the project were predicted from computational analysis of genomes, which contain the genetic information for the synthesis of the cellular proteins. The more scientists study and decipher genomes, the more protein sequences are predicted. All this information is recorded and stored in a large international database of reference sequences, called RefSeq, maintained by the National Center for Biotechnology Information (NCBI) of the United States.
The RefSeq sequences served as the primary source of the protein sequences included in the project. The stringent all-against-all pairwise comparison, a task that was not feasible without grid technology, generated a huge amount of output: for each pair of protein sequences, an index indicating their degree of similarity. With the Proteinworld Database, we can now easily compare complete genomes and investigate the evolutionary origin of organisms, or, for example, analyze whether a given protein is present in microorganisms that cause diseases but absent in humans. Such a protein would be a possible target for developing drugs with few side effects.
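The drug-target idea in the paragraph above can be sketched as a filter over pairwise similarity records. Everything in this example is hypothetical: the record layout, the protein and organism names, the scores and the threshold are invented for illustration and do not reflect the actual Proteinworld Database schema or data.

```python
# HYPOTHETICAL records: (query_protein, query_organism, hit_organism, score).
hits = [
    ("protA", "pathogen_x", "human", 310),
    ("protA", "pathogen_x", "pathogen_y", 450),
    ("protB", "pathogen_x", "pathogen_y", 120),
    ("protC", "pathogen_x", "human", 35),
]

def proteins_without_match(hits, organism, other, min_score=50):
    """Proteins of `organism` with no hit in `other` scoring >= `min_score`."""
    proteins = {q for q, org, _, _ in hits if org == organism}
    matched = {
        q for q, org, hit_org, score in hits
        if org == organism and hit_org == other and score >= min_score
    }
    return sorted(proteins - matched)

# Pathogen proteins with no sufficiently similar human protein.
print(proteins_without_match(hits, "pathogen_x", "human"))  # ['protB', 'protC']
```

Here protA is excluded because it has a strong human match, while protB (no human hit) and protC (only a weak human hit) remain as candidates.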
About the Proteinworld Database
Using the Proteinworld Database, a scientist can check which other organisms have sequences similar to a protein of interest, and which functions have been attributed to such a protein in other living beings. But this kind of search, comparing a given sequence against all others in the databank, is only one of the possible ways of using the information. Scientists can also search by families of proteins, compare all protein sequences of two complete genomes, download full details for an organism and see all the similarities it has with other living beings or, conversely, identify all its unique protein sequences.
The idea is that, in future stages of the project, users will have more and more data at their disposal, with even more reliable information and options for more complex searches. The Proteinworld Database is the result of a partnership involving Fiocruz, IBM and PUC-Rio, with the latter contributing to the modeling of the database and the interfaces for access.
Participating for the Oswaldo Cruz Foundation is the Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute - Fiocruz (Wim Degrave, Antonio B. de Miranda, Marcos Catanho, Thomas Otto, Ana Carolina Guimarães). Participating for the Pontifical Catholic University of Rio de Janeiro (PUC-Rio) is the Laboratory of Bioinformatics, Department of Informatics (Sergio Lifschitz, Cristian Tristan and Márcia Bezerra).