Research: Uncovering Genome Mysteries: Project FAQs
 
Uncovering Genome Mysteries



  Project information
The project will compare about 200 million proteins encoded by the genes from a wide variety of known and unknown organisms. These genes came from organisms in samples taken from a range of environments, including water and soil, as well as on and in plants and animals. DNA from all the organisms in those samples (the metagenome) was extracted and analyzed to identify genes that encode proteins, most of which are enzymes. Uncovering Genome Mysteries will compare the proteins encoded by those genes to one another, both individually and in groups, to find genetic similarities. Such similarities can reveal the functions these organisms perform in various natural processes. Scientists can then use that knowledge to design solutions to important environmental, medical and industrial problems.
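To give a sense of scale, the short Python sketch below counts how many pairwise comparisons would be needed if every one of roughly 200 million proteins were compared against every other. The all-against-all scheme is an assumption made here for illustration; the project's actual comparison strategy may group or filter sequences differently.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Approximate protein count quoted above.
# Comparing every protein with every other one is an illustrative assumption.
n_proteins = 200_000_000
total_pairs = pairwise_comparisons(n_proteins)
print(f"{total_pairs:.2e} comparisons")
```

Under that assumption the total is on the order of 2 x 10^16 comparisons, which suggests why the work is split into many small tasks.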
Because of recent advances in DNA sequencing technology, there is now a huge amount of gene information available for a wide variety of organisms, with more being decoded every day. Many of these organisms, particularly microorganisms, have never been studied in detail before. We therefore know little about what they can do, and how they interact with their environment. However, it is likely that many genes from unknown organisms will be similar to genes from organisms that we know more about. When similarities are found, researchers get a head start in understanding previously unknown organisms.
The researchers will publish an open-access database of the protein sequence comparisons computed on World Community Grid.

We expect that this information will help scientists discover new enzymatic functions, find how organisms interact with each other and the environment, document the current baseline microbial diversity, and better understand and model complex microbial systems.
There are two main areas where this research is expected to have a beneficial effect: current scientific research, and future technologies.

On the research side, the results should help improve scientific knowledge about gene and protein functions and biochemical processes in general, as well as helping scientists understand how microbial communities are changing in response to changing conditions in the natural world.

There are also several exciting ways in which this knowledge may help solve pressing world problems. For example, new knowledge about organisms should help identify, design and produce new antibiotics and drugs against diseases, as well as new enzymes for industrial applications, such as food processing, chemical synthesis, or the production of biodegradable plastics or biofuels. In the long term, this knowledge should help us manage the important functions that diverse organisms perform in the world's ecosystems: in all environments, in industrial settings, and in human, animal and plant interactions.

  Scientific background
DNA stands for deoxyribonucleic acid. DNA strands are molecules that act as blueprints for all living things. A single DNA molecule is a helical (coil-shaped) strand, or chain, built from four chemical “letters” that spell out phrases (“genes”) in the genetic code. These letters are A, C, T and G and stand for the four types of compounds (adenine, cytosine, thymine, and guanine), which are assembled to form the DNA molecule’s gene codes.
Genes are “DNA phrases” that encode proteins. Specific three-letter DNA sequences each encode one specific amino acid. Chains of amino acids form proteins, some of which contribute to the structure of a cell (such as a microorganism) while others act as enzymes.
A protein is a chain of amino acids that folds into a particular structure necessary for the function of that protein. The chain can be composed of up to 20 different kinds of amino acids, and the types and order of those amino acids are encoded in the gene sequence (the genetic code). The amino acid sequence is also known as the “protein sequence”. Because most amino acids can be encoded by more than one three-letter DNA sequence, multiple gene sequences can specify the same protein sequence. A cell is made of thousands of proteins (in addition to fatty molecules called lipids, sugars and other chemicals) that can have either a structural function or an enzymatic activity. Enzymes are proteins that help break down other molecules or build new ones. Several enzymes can work in concert to convert molecules into other chemical building blocks for the cell (for example, sugar into lipids), or to extract energy from sugar.
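The relationship between three-letter DNA sequences and amino acids can be sketched in a few lines of Python. This is a minimal illustration, not the project's software: it assumes the standard genetic code (some organisms use variants), reads the DNA from its first letter, and stops at the first stop signal. The function name `translate` and the example sequences are made up for this sketch.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks a stop signal. Assumption: the organism uses the standard code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string into one-letter amino acid codes."""
    dna = dna.upper()
    protein = []
    # Step through the sequence three letters at a time.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop signal: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Two different (made-up) gene sequences that specify the same protein sequence.
print(translate("ATGGCCAAATAA"))
print(translate("ATGGCTAAGTGA"))
```

Both calls print the same three-letter protein sequence, MAK, which illustrates why comparisons are made on protein sequences rather than on the DNA itself.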
Enzymes are proteins that convert chemicals or act as catalysts. Certain enzymes in plants, for example, can assist in the absorption of carbon dioxide molecules and incorporate them into other cellular molecules.
DNA sequencing is a technology that determines the order of the four “letters” (A, C, T, G) that make up genes, by chemically analyzing DNA molecules.
In the first step we convert the DNA sequence into an amino acid sequence. This amino acid sequence then defines the properties of a protein. By comparing the amino acid sequence with other known sequences in databases, we can use the information about previously studied proteins to predict the functions of new proteins being investigated. If we know the function of all the proteins encoded by a genome, then we can ultimately understand how a cell or microorganism works.
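The idea of predicting a function from the most similar known protein can be sketched as follows. This is a toy illustration only: the two database entries and their sequences are invented, the similarity score comes from Python's general-purpose `difflib.SequenceMatcher` rather than a real sequence alignment method, and the 0.5 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical mini-database: made-up names and made-up sequences.
KNOWN_PROTEINS = {
    "toy_enzyme_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "toy_enzyme_B": "GSHMLEDPVDAFWWGGTTNNCC",
}


def closest_known_protein(
    query: str, threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (name, score) of the most similar known protein, or None.

    The score is difflib's similarity ratio between 0 and 1, used here as a
    simple stand-in for a proper alignment score.
    """
    best_name, best_score = None, 0.0
    for name, sequence in KNOWN_PROTEINS.items():
        score = SequenceMatcher(None, query, sequence).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None  # nothing similar enough to suggest a function
    return best_name, best_score


# A new sequence that differs from toy_enzyme_A only in its last letter.
print(closest_known_protein("MKTAYIAKQRQISFVKSHFSRL"))
```

A close match only suggests a possible function; it would still need to be confirmed by further study.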
A genome consists of all the genetic code for an individual organism, while a metagenome describes all genes and elements encoded in a group or community of organisms, for example, all of the microorganisms within a sample of soil or ocean.
Microorganisms are microscopically small life forms, mostly single-celled, and include bacteria, archaea, protozoa, yeasts and microscopic algae. Members of these diverse groups are present in almost all environments on Earth: in the air, water, earth, rocks, and even where conditions are very harsh, such as the deep ocean and polar environments. They play a crucial role in maintaining all ecological systems and interact closely with one another and with other life forms. They are present in and around other living systems, such as plants, animals and humans.
Microorganisms represent the great unseen and underappreciated majority of life on our planet. They are everywhere in the environment and in larger, more complex organisms. They are important for a huge variety of natural processes, including human health, agriculture and food production. For almost any kind of organic molecule, there will be a microorganism that has evolved the capacity to decompose, change, or construct it.
Without properly functioning microorganisms, the health of our planet would quickly deteriorate and higher organisms, including humans, would cease to exist. Despite their importance for our planet’s health, we know little about the diversity and function of microorganisms in the environment. Microorganisms also harbor new and unexpected functions that can be harnessed for biotechnological processes, such as food or drug production.

  Project graphics
The graphics show a portion of a pair of protein sequences, which have been compared on your computer or device in the form of “letter codes”. The letters (A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V) represent the twenty types of amino acid molecules that are assembled in a chain to form the protein molecule. The matching letters in the two protein sequences are highlighted.
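The highlighting of matching letters can be imitated with a short Python sketch. It assumes the two sequences are already lined up and have the same length; the function name `match_line` and the example sequences are invented for illustration and are not taken from the project software.

```python
from typing import Tuple


def match_line(seq_a: str, seq_b: str) -> Tuple[str, int]:
    """Mark identical positions in two already-aligned sequences.

    Returns a marker string ("|" where the letters match, " " elsewhere)
    and the number of matching positions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    markers = "".join("|" if a == b else " " for a, b in zip(seq_a, seq_b))
    return markers, markers.count("|")


# Two made-up protein fragments that differ at two positions.
first = "MKTAYIAK"
second = "MKSAYLAK"
markers, matches = match_line(first, second)
print(first)
print(markers)
print(second)
print(f"{matches} of {len(first)} positions match")
```

Here six of the eight positions match, and the marker line plays the role of the highlighting in the graphics.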

When sequences match, it means that the unknown organism produces a protein that is similar to a better-known protein. This can indicate that the two proteins have similar functions, and give scientists a head start in understanding the unknown organism.
The progress bar shows approximately how far along your device is in calculating the current task. It tells how many proteins have been compared against a set of other proteins. When your device has completed its work and the marker reaches the right end of the bar, the computation is complete and the results will then be sent to World Community Grid before being packaged and sent back to the research teams.
A screenshot of the project graphics is available for download in the following resolutions: 500x373, 1000x745 and 4000x2978.