It often seems as though humankind is in a state of conflict with the natural world. Pathogens are evolving resistance to many of today's important antibiotics. We are consuming many of Earth's valuable resources at an unsustainable rate, while pollution in the air and water threatens the health and livelihoods of many communities.
Fortunately, we are beginning to understand that nature may have already developed solutions to many of these problems, and they are hidden in plain sight: in forests, oceans and soils. For example, studies of exotic soil samples and plant extracts have revealed substances with the ability to kill particular kinds of disease-causing bacteria. We have found exotic tropical plants that show promise as efficient, sustainable fuel sources. Microorganisms have been used to clean water in sewage treatment plants and even to help consume oil spills. Most of these discoveries were made through time-consuming trial and error. If we could better understand the amazing range of natural powers, we might be able to speed up the development of practical technologies and solutions.
One approach to identifying nature's hidden "superpowers" is to analyze the genetic makeup of different organisms to help us understand how they function. Traditionally, this has been a very expensive and time-consuming process, but in recent years scientists have developed more affordable and effective methods to decode DNA. The result is an explosion of genomic data from animals, plants and particularly microorganisms. After DNA has been decoded, scientists must conduct further studies to discover the function of each gene and its corresponding protein. Each gene specifies the order in which amino acids are assembled into a molecular chain, which is then folded into a protein molecule. This order of amino acids is known as the protein sequence.
Genes and their corresponding proteins play important roles in many life processes, and as a result, are often valuable in medicine and industrial applications. Some proteins are chemical factories, called enzymes, which can break down molecules into simpler components or help construct more complex molecules. Other proteins form the building blocks of all kinds of structures in plants and animals. Still other proteins play roles in controlling all kinds of activity in cells in response to various stimuli.
It is clear that there is a wealth of useful knowledge to be found by understanding what unknown genes and their corresponding proteins do. This knowledge might even help scientists solve many of the world's most pressing problems. However, there are two important challenges to this effort:
First, we are rapidly losing many valuable potential sources of DNA from diverse life forms. This is because many acres of unexplored pristine forests and water habitats are disappearing due to human development, climate change and other factors. We are losing the rich resources in nature that harbor valuable, yet hidden, solutions to the world's problems. We need more efficient and effective ways to discover what nature still has to tell us before it is too late.
Second, if we want to search for useful genes in unknown organisms, the scale of the task is staggering. Each organism may have thousands of genes, and there can be tens of thousands of organisms in even a small sample of water or soil. If we want to unlock nature's hidden powers, we need new methods to deal with the "big data" from the hundreds of millions of genes that are being decoded.
The Proposed Solution and Justification
Uncovering Genome Mysteries expects to examine close to 200 million genes from a wide variety of life forms, such as seaweeds from Australian coastlines and microbes found in Amazon river samples. Those genes are being compared against each other to assess their similarity. When two genes are similar and the function of one is already known, scientists can make educated guesses about the function of the other. Comparing the genes against each other amounts to about 20 quadrillion (2 x 10^16) comparisons. The total computation is projected to take the equivalent of one computer running continuously for 40,000 years: no small feat, but feasible thanks to the computational power of World Community Grid. While DNA sequences from all forms of life will be processed, microorganisms will receive a special focus.
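The scale quoted above can be checked with a little arithmetic. The sketch below assumes that every unordered pair of genes is compared exactly once, which the project description does not state explicitly; the function name and the derived per-second rate are illustrative and are not figures published by the project.

```python
def pairwise_comparisons(n_sequences: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n_sequences * (n_sequences - 1) // 2


# Figures taken from the text above.
N_GENES = 200_000_000   # "close to 200 million genes"
CPU_YEARS = 40_000      # one computer running continuously

# Assumption: a year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

total = pairwise_comparisons(N_GENES)

# Implied average rate if a single computer did all the work.
per_second = total / (CPU_YEARS * SECONDS_PER_YEAR)

print(f"comparisons: {total:.2e}")
print(f"implied rate: {per_second:,.0f} comparisons per second")
```

Under this assumption the pair count comes out at roughly 2 x 10^16, consistent with the "20 quadrillion" figure, and spreading it over 40,000 computer-years implies an average on the order of ten thousand comparisons per second per machine.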
What are microorganisms and why study them?
Microorganisms are microscopically small life forms, mostly single-celled, and comprise bacteria, archaea (tiny organisms that look like bacteria, but are fundamentally different), protozoa (like amoebae and several parasites), yeasts and microscopic algae. Members of these diverse groups are present in almost all environments on Earth: in the air, water, soil and rocks, and even where conditions are very harsh, such as deserts, undersea sulphuric volcanic vents or polar ice. They play a crucial role in maintaining all ecological systems and interact closely with one another and with other life forms. They are present in and around other living systems, such as plants, animals and humans.
Recently, scientists realized that there are far more microorganisms in nature, both in number and in variety, than we previously knew about, because most of them do not show up in laboratory cultures. New methods of studying the genetic material present in environmental samples are giving us insight into this complex hidden world. Scientists do this by taking a sample of nearly anything, including soil, water, saliva, the surface of a leaf, the gut of a cow, or air. They then extract and prepare the total genetic material present from all the organisms in that sample into a large dataset called the "metagenome." Based on that, they can analyze the genetic codes and sort the fragments of data according to the genomes of different organisms. We are learning that there are amazing numbers of these organisms. For example, by one commonly cited estimate, there are about 10 times more microorganisms living in and upon our bodies than human cells of our own. Microorganisms are tiny, but they are numerous. On a global scale they equal or even surpass all plants, not only in terms of diversity, but also in actual biomass. Microorganisms represent the great unseen and underappreciated majority of life on our planet.
Interactions with microorganisms are often beneficial, sometimes neutral, and sometimes pathogenic. Microorganisms are important for a huge variety of natural processes, from human health (e.g. gut bacteria like Escherichia coli and the probiotic bacterium Lactobacillus, which help digest our food), to agriculture (e.g. the nitrogen-fixing root bacterium Rhizobium, or the many types of soil bacteria that break down organic matter and perform chemical transformations), and food production (e.g. the baker's yeast Saccharomyces cerevisiae). For almost any kind of organic molecule, there will be a microorganism that has evolved the capacity to decompose, change, or construct it. This is because life itself is based on such chemical reactions and has evolved many ways to produce and manipulate the compounds that are useful to cells. In a way, microorganisms are ideal chemical factories. They are often simple in the sense that they may evolve to focus on a few particular chemical reactions. For example, they might evolve bioremediation and biodegradation methods: ways to degrade certain toxins or waste into compounds that can be digested as food or are at least non-toxic. They may evolve the means of producing chemicals that fight off other forms of life harmful to their existence, which may become useful to scientists developing antibiotics or other medicines. Microorganisms are also key players in global biogeochemical cycles such as the fixation of nitrogen from the atmosphere (making fertilizer) or the removal and sequestration of CO2 from the atmosphere. For example, it has only recently been recognized that microbial processes are responsible for the majority of CO2 absorption by the world's oceans. These and other microbial functions are clearly essential to support all higher life forms, such as plants, animals and humans, and to keep our global ecosystem in balance.
Without properly functioning microorganisms, the health of our planet would quickly deteriorate and higher organisms, including humans, would cease to exist.
Despite their importance for our planet's health, we know little about the diversity and function of microorganisms in the environment. This is partially due to their microscopic size and partially because the majority of these organisms do not grow in the laboratory, which makes them difficult to isolate and study. However, technical breakthroughs in the last decade have allowed us to study microorganisms at an unprecedented level of detail by determining and deciphering the DNA sequences of their genetic code (their genomes). By reading the genetic code we can better predict and understand microbial function. Our understanding of how soil bacteria transform cellulose, how pathogenic organisms cause disease, how yeasts produce different metabolites depending on their growth medium and conditions, or how bacterial colonies act together and form biofilms on different surfaces is now incomparably more detailed than before the genomics era.
Modern DNA sequencing technologies can now rapidly determine millions of DNA sequences at reasonable cost. New technological breakthroughs are being developed to augment this capacity by several orders of magnitude. This will allow scientists to determine the DNA sequences hidden in the unseen microbial world. They have already been doing this over the last few years for many medically and industrially important unicellular and multicellular organisms, animals, plants and human individuals. Since the 1990s, genome analyses have concentrated on studying three kinds of organisms: "model organisms" in biology, which had already been studied for decades in laboratories (such as E. coli, yeast, helminths and mice); important human, animal and plant pathogens (like the bacteria that cause tuberculosis and leprosy, or crop pathogens); and finally, representative organisms in the "Tree of Life". More recently, many scientists around the world have started sequencing and analyzing metagenome data from many additional biomes, enriching our knowledge about biodiversity in air, land and sea, from the Arctic to tropical forests. From this work, a very complex picture of the diversity of living organisms on our planet is emerging.
Interpreting this huge and exponentially growing amount of DNA sequence data ("big data") is a daunting task. DNA sequence information is only meaningful and useful if it can be decoded and interpreted by comparing it to other gene sequences, whether their function is already known or not, and by mapping the variations between them. This process is called "genome annotation." It requires vast amounts of computational power, and is currently a major bottleneck in making sense of genomes that have already been sequenced.
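The idea of inferring a gene's function from a similar, already characterized sequence can be illustrated with a toy sketch. This is not the project's method: it assumes a deliberately simple similarity score (the fraction of shared three-letter fragments, a Jaccard index) where real annotation pipelines typically rely on sequence alignment. The sequences, function labels, threshold and helper names below are all hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping fragments of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Toy score: shared k-mers divided by all k-mers (Jaccard index)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def annotate(query: str, known: dict[str, str],
             threshold: float = 0.5) -> tuple[str | None, float]:
    """Return the function of the most similar known sequence, or None
    if nothing reaches the (hypothetical) similarity threshold."""
    best_function, best_score = None, 0.0
    for seq, function in known.items():
        score = similarity(query, seq)
        if score > best_score:
            best_function, best_score = function, score
    if best_score >= threshold:
        return best_function, best_score
    return None, best_score


# Hypothetical protein sequences with made-up function labels.
KNOWN = {
    "MKTAYIAKQRQISFVKSHFSRQ": "hypothetical enzyme A",
    "MSLLTEVETPIRNEWGCRCNDSSD": "hypothetical structural protein B",
}

# A query that differs from the first known sequence by one amino acid.
function, score = annotate("MKTAYIAKQRHISFVKSHFSRQ", KNOWN)
print(function, round(score, 2))
```

A query with no resemblance to any known sequence returns None, mirroring the case where a gene's function remains unknown and needs further study.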
The Uncovering Genome Mysteries project aims to harness the computational power of World Community Grid to give biological meaning to gene-sequencing data available for microorganisms and other life forms. This will be done by comparing individual microbial genomes as well as the genetic information of entire microbial communities in the environment (metagenomes). Decoding genomes and metagenomes will provide new information on the diversity of microorganisms and the functional roles they play in the environment. Comparison of this information with known functional data from other organisms already studied in greater detail will be crucial for the interpretation and annotation of the DNA codes.
The specific goals of the Uncovering Genome Mysteries project are:
To create a database of protein sequence comparison information, based on DNA from diverse sources, for all scientists to reference.
To discover new gene functions, augmenting our knowledge about biochemical processes in general.
To find out how organisms interact with each other and with their environment.
To document the current baseline microbial diversity, allowing us to understand how microorganisms change under environmental stresses, such as climate change.
To better understand and model complex microbial systems.
While the immediate computational results of this project are only an early step in achieving the above goals, they will ultimately be useful in many ways. For example, the resulting knowledge should help identify, design and produce new antibiotics and drugs against chronic diseases, as well as new enzymes for industrial applications, such as food processing, chemical synthesis or the production of green plastics or biofuels. In the long term this knowledge should help us manage the important functions that diverse organisms perform in the world's ecosystem, in all environments, in industrial settings, and in human, animal and plant interactions.