Seven quadrillion comparisons later, Uncovering Genome Mysteries is just getting started

The Uncovering Genome Mysteries research team has started analyzing results from their massive ongoing project, which is comparing proteins between diverse organisms from around the world. Better understanding of similarities between proteomes should help scientists develop sustainable technologies, renewable materials, productive crops, and new treatments for stubborn diseases.

The Uncovering Genome Mysteries (UGM) project started running on World Community Grid on October 16, 2014, with the daunting task of comparing all currently predicted protein sequences encoded in the genomes of a wide variety of living organisms, with special emphasis on microorganisms. The project expects to examine more than 200 million proteins. Most of them were predicted from sequence data generated in environmental and ecological studies, ranging from bacteria in Australian marine ecosystems to samples from the Amazon River in Brazil. Similarity data from these comparisons will improve our understanding of the metabolic and structural functions of the predicted proteins in databases, and will uncover many new features and cellular processes in microorganisms. Of the expected 20 quadrillion (20,000,000,000,000,000) comparisons in the project, about 36% have been completed so far, equivalent to almost 8,000 CPU-years of computation.
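As a rough, back-of-the-envelope consistency check (not a figure reported by the project), these numbers fit an all-against-all design. That design assumes each of about N = 2×10^8 proteins is compared once with every other protein, and that one "comparison" means one protein pair. The last line converts the completed work into a throughput, taking one year as about 3.15×10^7 seconds:

$$
\binom{N}{2} = \frac{N(N-1)}{2} \approx \frac{(2\times10^{8})^{2}}{2} = 2\times10^{16} \quad \text{(20 quadrillion comparisons)}
$$

$$
0.36 \times 2\times10^{16} = 7.2\times10^{15} \quad \text{(about seven quadrillion completed)}
$$

$$
\frac{7.2\times10^{15}}{8\times10^{3}\ \text{CPU-years}} \approx 9\times10^{11}\ \tfrac{\text{comparisons}}{\text{CPU-year}} \approx \frac{9\times10^{11}}{3.15\times10^{7}\ \text{s}} \approx 2.9\times10^{4}\ \tfrac{\text{comparisons}}{\text{CPU-second}}
$$

Because the project reports "more than" 200 million proteins and "almost" 8,000 CPU-years, the true totals and rate are probably slightly higher than these estimates.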

This project involves cooperation between World Community Grid; the laboratory of Dr. Torsten Thomas and his team in the School of Biotechnology and Biomolecular Sciences & Centre for Marine Bio-Innovation at the University of New South Wales, Sydney, Australia; and the laboratory for Functional Genomics and Bioinformatics of Dr. Wim Degrave and his team at the Oswaldo Cruz Foundation – Fiocruz, in Brazil.

Volunteers participating in the UGM project process work units, each containing a set of protein sequences predicted from a variety of organisms, and compare those sequences against each other. Every time a significant similarity between two sequences is detected, a line of output is written that records the coordinates of the similar regions and the statistical significance of the similarity. Taken together, the output data let us propose functions for uncharacterized sequences that resemble sequences of known function. They also show how organisms and their biochemistry, metabolic functions, and other cellular processes relate to one another.
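The article does not say which comparison algorithm UGM work units actually run. As an illustration only, the sketch below substitutes a textbook Smith-Waterman local alignment with a hypothetical toy scoring scheme. Where the real project reports statistical significance, this sketch uses a raw-score cutoff instead. The names `local_align` and `compare_all`, the scoring values, and the tab-separated output layout are all hypothetical. The sketch shows the shape of the work described above: every pair of sequences in a work unit is compared once, and each pair that clears the threshold produces one output line with the coordinates of the similar region in each sequence.

```python
from itertools import combinations
from typing import Dict, Iterator, Tuple

# Hypothetical toy scoring scheme (not the project's real parameters):
# +2 for identical residues, -1 for a mismatch, -2 for each gap position.
MATCH, MISMATCH, GAP = 2, -1, -2


def local_align(a: str, b: str) -> Tuple[int, int, int, int, int]:
    """Best Smith-Waterman local alignment of a and b with linear gaps.

    Returns (score, a_start, a_end, b_start, b_end), 1-based and inclusive.
    All coordinates are 0 when no positive-scoring alignment exists.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # start[i][j] holds the cell where the alignment ending at (i, j) began.
    start = [[(0, 0)] * cols for _ in range(rows)]
    best = (0, 0, 0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            up = H[i - 1][j] + GAP    # residue of a aligned to a gap
            left = H[i][j - 1] + GAP  # residue of b aligned to a gap
            score = max(0, diag, up, left)
            H[i][j] = score
            if score == 0:
                continue  # local alignment: scores are floored at zero
            if score == diag:
                # A diagonal step out of a zero cell opens a new alignment here.
                start[i][j] = start[i - 1][j - 1] if H[i - 1][j - 1] > 0 else (i, j)
            elif score == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
            if score > best[0]:
                si, sj = start[i][j]
                best = (score, si, i, sj, j)
    return best


def compare_all(seqs: Dict[str, str], min_score: int) -> Iterator[str]:
    """Compare every pair of sequences once and yield one line per hit.

    Each line is tab-separated: id_a, id_b, a_start, a_end, b_start, b_end,
    score. The raw-score cutoff stands in for a real significance test.
    """
    for (id_a, a), (id_b, b) in combinations(seqs.items(), 2):
        score, a0, a1, b0, b1 = local_align(a, b)
        if score >= min_score:
            yield "\t".join(map(str, (id_a, id_b, a0, a1, b0, b1, score)))


if __name__ == "__main__":
    # A hypothetical three-sequence "work unit".
    work_unit = {"p1": "PPPMKTAYPPP", "p2": "MKTAY", "p3": "GGGG"}
    for line in compare_all(work_unit, min_score=6):
        print(line)
```

For this toy work unit, only the p1/p2 pair passes the cutoff. It scores 10, covering residues 4 to 8 of p1 and residues 1 to 5 of p2. A production pipeline would more likely use a heavily optimized aligner and attach a proper significance statistic, such as an E-value, to each line, rather than rely on a raw score.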

Fiocruz and the University of New South Wales have begun processing the data resulting from these calculations. The data will later be presented in a database that will allow researchers to study the relationships between the proteins of all living things and to better understand organisms in their biodiverse environments. Data of this kind already underpin many applications in health, the environment, and agriculture. For example, they have enabled new strategies to fight pathogens that threaten human and animal health. They have also supported new diagnostics and treatments, as well as prevention through appropriate vaccine design.

Many more applications in agriculture, industry, and the environment remain to be discovered through the study of the wide variety of proteins and enzymes. Some of these molecules might act as insecticides or antibiotics, or as enzymes that degrade and eliminate waste or industrial pollutants such as oil or organic chemicals. Enzymes can also aid the synthesis and production of “green chemicals” and biotransformation systems, as well as the production of renewable energy such as bio-alcohols. In more sophisticated synthetic biology systems, engineered microorganisms can be optimized to produce biopharmaceuticals, green plastics, and biofuels. All of this requires a thorough knowledge of biochemical pathways and their regulation. Projects like UGM help meet that need by making nature's wide variety of enzymatic and biological functions more available to the scientific community.

We deeply thank the World Community Grid volunteers who are contributing to this massive effort.
