By: The Mapping Cancer Markers research team
10 Jul 2014
The lead researcher for Mapping Cancer Markers presents a roadmap for the project to analyze signatures for 4 types of cancer (lung, ovarian, prostate and sarcoma), an update on his team’s progress so far, and an invitation to join the research team at an August cancer fundraiser.
On behalf of the Mapping Cancer Markers team, we want to start by saying thank you! In just 7 months, World Community Grid members have donated over 60,000 years of processing time to support our research. As a result, we are nearly done with the “benchmarking” portion of the project, which determines the characteristics of our search space. Over the coming months and years, we will pursue more targeted approaches to discover relevant gene signatures. Today we want to give you both a high-level roadmap and some further detail about what is happening with the project.
The project is anticipated to run for two years, and we plan to analyze signatures for 4 different types of cancer. At the moment, we're enlisting your help to process research tasks for lung cancer; we will then move on to ovarian cancer, prostate cancer and sarcoma.
Currently, the Mapping Cancer Markers project has two phases:
- In the first phase we have been attempting to set a benchmark for further experiments.
- The second phase will be geared towards finding clinically useful molecular signatures, initially focusing on gene signatures that can predict the occurrence of various types of cancer.
You can think of this benchmarking phase as a bit like designing an IQ test. By establishing a standard test and scoring system, we can evaluate any person's intelligence. The results from the first phase of Mapping Cancer Markers will allow us to create such a test for existing and future gene signatures, so that we can tell which ones have the best predictive ability.
Our preliminary analysis of the work units processed so far (roughly 26 billion gene signatures) is focused on the nature of genes in the signatures, measuring their quality by assessing how accurately they contribute to identifying patients with poor prognosis. On the analytics side, we have also been evaluating the use of a software package to aid with post-processing our results.
One of the goals of the first project phase is to understand whether some genes have better predictive ability than others. To do this, we took the top 0.1% of the gene signatures and identified the individual genes that make up each signature. For each gene, we looked at how many times it occurred within top-scoring signatures and plotted the scores of those signatures (see figure below). The blue line shows the average of all of the genes together. The red line highlights the worst-performing single gene, while the green line indicates our best-performing gene. The average of all the genes is very similar to the result for the worst single gene. This is not surprising, because most genes are likely to have poor predictive ability. However, we are looking for the few genes that stand out from the field. In other words, if we have 1 million potential gene signatures and we look at the top 1,000 scoring signatures, we can find genes such as the one shown in green, which have better predictive ability.
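To make the counting step concrete, here is a minimal Python sketch of that kind of analysis. It is an illustration only, not the project's actual code: the function name, the input format (a list of gene-tuple and score pairs) and the assumption that a higher score means a better signature are all hypothetical.

```python
from collections import defaultdict


def genes_in_top_signatures(scored_signatures, top_fraction=0.001):
    """Collect, for each gene, the scores of the top signatures containing it.

    scored_signatures: list of (genes, score) pairs, where genes is a tuple
    of gene names. This input format is an assumption for illustration.
    top_fraction: share of signatures to keep (0.001 corresponds to 0.1%).
    """
    # Assumption: a higher score means a better signature.
    ranked = sorted(scored_signatures, key=lambda pair: pair[1], reverse=True)
    # Keep at least one signature whenever the input is non-empty.
    keep = max(1, int(len(ranked) * top_fraction)) if ranked else 0

    scores_by_gene = defaultdict(list)
    for genes, score in ranked[:keep]:
        # Use a set so a gene is counted once per signature.
        for gene in set(genes):
            scores_by_gene[gene].append(score)
    return dict(scores_by_gene)
```

The number of scores stored for a gene is the number of top signatures it appears in, and the stored scores are what would be plotted per gene.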
This information is important because if we know which genes have the best predictive ability, it may help us and other researchers to evaluate the value of other signatures: if an unknown signature has one of the top genes in it, it is likely to be a useful signature for identifying, assessing, predicting or treating a disease.
As a side note, this benchmarking process is why members may have experienced shorter or longer than usual runtimes over the past several months. The core algorithm of the Mapping Cancer Markers engine, used to evaluate each potential gene signature, has a processing time that is highly dependent on the statistical characteristics of each signature. The search space targeted by a single work unit can sometimes contain many time-consuming signatures, which together lead to a longer total runtime. The size of Mapping Cancer Markers result files varies for a similar reason. A typical work unit will evaluate tens of thousands of potential gene signatures, many of which are of low quality. Signatures below a certain quality threshold are removed from the returned results. However, the search space targeted by a single work unit can sometimes contain a high proportion of high-quality gene signatures. If this happens, the result file is larger than usual.
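The effect of the quality threshold on result size can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Mapping Cancer Markers engine: the function name, the input format and the use of a single numeric score where higher is better are all hypothetical.

```python
def filter_results(scored_signatures, quality_threshold):
    """Return only the signatures whose score meets the threshold.

    scored_signatures: list of (genes, score) pairs (hypothetical format).
    quality_threshold: minimum score to keep; the real threshold and
    scoring scale are not given in this article.
    """
    return [
        (genes, score)
        for genes, score in scored_signatures
        if score >= quality_threshold
    ]
```

Two work units that evaluate the same number of signatures can therefore return results of very different sizes: the one whose search space holds more signatures above the threshold keeps more of them.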
Funding & Fundraising
We’re happy to report that there are several potential sources for further funding. Applications are in progress with the Ontario Research Fund, the Canada Foundation for Innovation, and the US Department of Defense. Of course, the free computing power provided by World Community Grid volunteers is absolutely essential to our research. However, additional funding will help us both to make the most of volunteers' contributions and to make full use of the findings of the Mapping Cancer Markers computations, with a primary focus on lung and ovarian cancer.
Finally, if you will be in Ontario between 15 and 17 August, please consider donating to, or cheering on, the Team Ian Ride from Kingston to Montreal, which raises money for the Ian Lawson Van Toch Cancer Informatics Fund at the Princess Margaret Cancer Centre (if you are interested, please contact us about joining the Team Ian Ride this year or next). If you can join us, you will have the chance to meet some of the research team, raise money for a worthy cause and take part in an outstanding event. For more details visit: http://www.team-ian.org/