New methods and processes help the research team process World Community Grid data more efficiently and provide more accurate docking techniques.
As the volume of data generated by World Community Grid volunteers for our FightAIDS@Home (FAAH) project has increased, so has our need to optimize how we handle and store that data. In this project update, we discuss new improvements in how we process the extremely high rate of result data you generate. These improvements allow us to focus more resources on the analysis of FAAH data. Further, we are developing and applying improved docking techniques based on deeper analysis of the results, coupled with ongoing experimental data from our collaborators.
Processing your results faster
Managing the very large data throughput generated by World Community Grid volunteers for FAAH is a great challenge. Besides the scientific results we have achieved over the years, we have also developed novel software and protocols to process, analyze and store the results you generate quickly and efficiently.
Recently, we exploited the parallel computational resources available at Scripps. In the last few months, we have shifted our processing of the incoming World Community Grid data to our local High Performance Computing cluster, Garibaldi. Since the implementation of the AutoDock Vina software for FAAH last year, you have generated several terabytes of compressed docking results each month, which was putting a strain on our storage system. Until recently, most of our work and resources were focused on processing this data to make it suitable for deeper analysis, and we had to devote most of our local computational power to this processing. With our new methods, we have increased the processing rate by several orders of magnitude through the use of multiple processors and the optimization of our processing scripts. Processing a batch that used to take between 30 minutes and a few hours now takes just a few minutes. Streamlined scripts and parallel processing have yielded 180,000 processed batches in two weeks.
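To illustrate the idea of spreading independent batches over multiple processors, here is a minimal Python sketch. It is not our production pipeline: the batch format (one "compound_id score" pair per line) and the names process_batch and process_all are assumptions made only for this example.

```python
from concurrent.futures import ProcessPoolExecutor


def process_batch(batch_text):
    """Reduce one batch to the best (lowest) score per compound.

    Assumed, hypothetical format: one "compound_id score" pair per line,
    with optional comment lines starting with '#'.
    """
    best = {}
    for line in batch_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        compound_id, score = line.split()
        score = float(score)
        if compound_id not in best or score < best[compound_id]:
            best[compound_id] = score
    return best


def process_all(batches, workers=4):
    """Process independent batches, in parallel when workers > 1."""
    if workers <= 1:
        # Serial fallback, useful for debugging a single batch.
        return [process_batch(b) for b in batches]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize spreads the inter-process overhead over many small batches.
        return list(pool.map(process_batch, batches, chunksize=16))
```

Because each batch is processed independently, the work divides cleanly among processors. When running with more than one worker, call process_all from under an `if __name__ == "__main__":` guard so that worker processes can start safely on every platform.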
We have created new analysis programs using structural and statistical methods to mine more information from the results you generate. Statistical analysis tools will first be used to reduce over 5 million docked compounds to a few thousand top-ranking candidates. Structural information will then be used to cull the list further by filtering for key intermolecular interactions and against unfavorable interactions. A new database structure that will incorporate these programs is being developed to handle this large and fast-growing flood of results. Once optimized, the whole processing and analysis workflow will be fully automated.
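The two-stage reduction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the DockedCompound record, the interaction labels, and the function names are hypothetical, and the "statistical" stage is simplified here to keeping the best-scoring compounds.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class DockedCompound:
    name: str
    score: float  # predicted binding energy; lower is better
    contacts: set = field(default_factory=set)  # hypothetical interaction labels


def top_ranked(compounds, keep):
    """Stage 1 (simplified): keep the `keep` best-scoring compounds."""
    return heapq.nsmallest(keep, compounds, key=lambda c: c.score)


def passes_structure(compound, required, forbidden):
    """Stage 2: require key interactions and reject unfavorable ones."""
    has_all_required = required <= compound.contacts
    has_any_forbidden = bool(forbidden & compound.contacts)
    return has_all_required and not has_any_forbidden


def shortlist(compounds, keep, required, forbidden):
    """Apply the score cut first, then the structural filter."""
    ranked = top_ranked(compounds, keep)
    return [c for c in ranked if passes_structure(c, required, forbidden)]
```

Running the cheap score cut first means the more detailed structural check only has to examine a few thousand compounds instead of millions.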
Importantly, what we have learned and are learning from these refined methods to handle big data will be made available in the AutoDockTools suite, which is utilized by many research labs worldwide.
Improved protein-ligand binding modeling capabilities
Proteins are typically large molecules that can bend or flex in various ways at various points. At normal temperatures, they rapidly move through many or all of their possible configurations (bent shapes). When searching for ligands that might attach to a protein target, a ligand might not match the shape of the protein in one configuration but might match it in another. By considering more configurations of the protein, it becomes more likely that we find a ligand that matches one of them. Since February 2014, we have been running flexible receptor side-chain Vina jobs on FAAH, which we expect to enhance our docking results. While our typical docking methods hold the protein structure rigid, the flexibility feature in AutoDock Vina allows selected residue side-chain conformations to be sampled along with the flexible ligand molecule. This enables the protein pocket to adopt alternate shapes to better model protein-ligand binding and the so-called “induced fit”, minimizing the bias of using a rigid target structure. Currently, we are testing this approach on several sites (LEDGF, FBP, and Y3) in HIV integrase.
The downside of performing flexible receptor calculations is that the search complexity increases, and computing run-times are therefore 5 to 10 times longer. The World Community Grid staff has been adjusting their methods to account for the different Flexible Vina work units. Once these dockings have finished and the analyses have been performed, we will be able to optimize our application of Flexible Vina on World Community Grid and extend it to other targets.
Another way to minimize rigid-protein bias in traditional docking is to dock to an ensemble of protein structures. Two ways to generate these ensembles, both used in FAAH dockings, are molecular dynamics (MD) simulations and simply using multiple available structures for a given protein receptor. The last hundred experiments have included ensembles ranging from tens to sometimes hundreds of receptor structures. Ensembles add another layer of analysis with the goal of achieving a more accurate ranking of compounds from several sources of data.
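One simple way to combine ensemble results into a single ranking is sketched below. It is an assumption-laden illustration, not our actual analysis: the (compound, receptor, score) tuples and the function rank_by_ensemble are hypothetical, and taking the best or the mean score per compound are just two of many possible combination rules.

```python
from collections import defaultdict
from statistics import mean


def rank_by_ensemble(results, combine=min):
    """Rank compounds docked against several receptor structures.

    results: iterable of (compound, receptor, score) tuples, where a
    lower score means stronger predicted binding.
    combine: how to merge one compound's scores across the ensemble,
    e.g. min (best structure) or statistics.mean (average behavior).
    """
    per_compound = defaultdict(list)
    for compound, _receptor, score in results:
        per_compound[compound].append(score)
    combined = {c: combine(scores) for c, scores in per_compound.items()}
    # Most favorable (lowest) combined score first.
    return sorted(combined.items(), key=lambda item: item[1])


example = [
    ("lig1", "structure_1", -7.0),
    ("lig1", "structure_2", -9.0),
    ("lig2", "structure_1", -8.0),
    ("lig2", "structure_2", -8.5),
]
best_first = rank_by_ensemble(example)                # best score per compound
mean_first = rank_by_ensemble(example, combine=mean)  # average over the ensemble
```

The choice of rule matters: using the best score favors a compound that fits one receptor structure very well, while using the mean favors a compound that binds reasonably well across the whole ensemble.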
Despite the encouraging results on the first hits previously reported, we are encountering experimental issues that are making the process of identifying hits very challenging. As often happens in science (and particularly in HIV-related experiments!), it is hard to achieve robust and consistent statistics from biological assays.
Experiment 30 Compounds (October-December 2009), Target: HIV Protease, Exo/1F1 Sites:
Five out of ten compounds had promising results from a differential scanning fluorimetry (DSF) assay performed by the Torbett Lab. Unfortunately, X-ray crystallography by the Stout Lab gave inconclusive data: crystals formed but diffracted poorly, so no binding sites were confirmed. The compounds were sent to our collaborators at Scripps, Florida, but complications with producing enough HIV Protease delayed these efforts. This obstacle was recently resolved with the help of the Elder Lab, and nuclear magnetic resonance (NMR) experiments will be performed soon.
Experiment 33 (June-December 2010), Target: HIV Integrase, Active Site of the Catalytic Core Domain (CCD):
Preliminary results were mixed. The Kvaratskhelia Lab (OSU) recently reported promising results for 2 out of 10 compounds, but these compounds were considered weak candidates because their chemical properties indicated poor specificity: although they may bind to HIV Integrase, they will probably bind just as easily to other proteins, reducing their effectiveness.
We anticipate identifying many more hit compounds for all 3 proteins and their various sites by the end of the year, and we’re grateful to World Community Grid volunteers for giving us the opportunity to learn more about HIV and its interactions.