Approximately 300 million small molecules run for OpenPandemics - COVID-19 as part of system test


The recent stress test run on World Community Grid allowed the researchers to quickly run simulations for 300 million small molecules.



Background

OpenPandemics - COVID-19 was created to help accelerate the search for potential COVID-19 treatments. The project also aims to build a fast-response, open-source toolkit that can help all scientists quickly search for treatments in the event of future pandemics. 

In late 2020, we announced the selection of 70 compounds (from an original group of approximately 20,000) that looked promising enough to investigate as potential inhibitors of the virus that causes COVID-19. Lab testing is currently underway for some of these compounds (see the end of this report for details).

In late April and early May, we provided World Community Grid with approximately 30,000 batches of GPU work units. This was part of a stress test of the World Community Grid infrastructure and our analysis workflow, and it quickly generated an extremely large amount of data for us.

What did we learn from the recent stress test?

The stress test was a great exercise for uncovering bottlenecks in our workflow. The magnitude of results returned was almost unbelievable: in one week, we received the equivalent of about three-quarters of a full year of CPU results. This made it apparent that the major bottleneck was what we internally call "rehydration/analysis." This is the step where we convert the so-called "genome," which describes the location, rotation, and torsion state of a given docking result, into xyz atom coordinates and then perform the analysis.
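To make the rehydration step more concrete, here is a minimal sketch in Python of how a compact pose description can be expanded into xyz coordinates. It is an illustration only, not the AutoDock-GPU implementation: the function names, the dictionary layout of the genome, and the axis-angle encoding of the rotation are all assumptions made for this example.

```python
import math


def rotate_about_axis(point, origin, axis, angle):
    """Rotate `point` by `angle` radians about the line through `origin`
    with direction `axis`, using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                      # unit axis
    v = [p - o for p, o in zip(point, origin)]        # point relative to origin
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    rotated = [
        v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1 - cos_a)
        for i in range(3)
    ]
    return [r + o for r, o in zip(rotated, origin)]


def rehydrate(ref_coords, torsions, genome):
    """Expand a hypothetical genome into xyz coordinates.

    ref_coords: reference atom positions, with the ligand centered at the origin.
    torsions:   list of (atom_a, atom_b, moving_atoms); each entry is a rotatable
                bond a-b and the atom indices that move when it turns.
    genome:     dict with assumed keys "translation", "axis", "angle",
                and "torsion_angles" (angles in radians).
    """
    coords = [list(c) for c in ref_coords]

    # 1. Torsion state: turn each rotatable bond by its angle.
    for (a, b, moving), angle in zip(torsions, genome["torsion_angles"]):
        origin = coords[a]
        axis = [coords[b][i] - coords[a][i] for i in range(3)]
        for idx in moving:
            coords[idx] = rotate_about_axis(coords[idx], origin, axis, angle)

    # 2. Rotation: rigid-body rotation of the whole ligand about the origin.
    coords = [
        rotate_about_axis(c, [0.0, 0.0, 0.0], genome["axis"], genome["angle"])
        for c in coords
    ]

    # 3. Location: translate the ligand to its docked position.
    t = genome["translation"]
    return [[c[i] + t[i] for i in range(3)] for c in coords]


if __name__ == "__main__":
    # Toy three-atom "ligand" with one rotatable bond between atoms 0 and 1.
    reference = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
    bonds = [(0, 1, [2])]
    pose = {
        "translation": [1.0, 2.0, 3.0],
        "axis": [0.0, 0.0, 1.0],
        "angle": 0.0,
        "torsion_angles": [math.pi / 2],
    }
    for atom in rehydrate(reference, bonds, pose):
        print([round(x, 3) for x in atom])
```

The point of the sketch is that every docking result needs this kind of per-atom geometry work before it can be analyzed, which is why the step can dominate the workflow when hundreds of millions of results arrive at once.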

The stress test motivated us to develop considerable optimizations in our code for the GPU version. These optimizations sped up rehydration/analysis more than tenfold, which led to an overall 5x speedup of our workflow. They are lined up to be incorporated into our main source code on the AutoDock-GPU GitHub page and will be available to the whole community, benefiting all researchers who use our code for their simulations.
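The relationship between the two speedup figures can be checked with Amdahl's law, which links the speedup of one step to the speedup of the whole workflow. The short Python sketch below is a back-of-the-envelope illustration, not a measurement from the project. It assumes the step speedup is exactly 10x and the overall speedup exactly 5x, and the function names are hypothetical. Under those assumptions, rehydration/analysis would have accounted for roughly 89% of the original workflow time.

```python
def step_fraction(step_speedup, overall_speedup):
    """Fraction of the original runtime spent in the accelerated step,
    obtained by solving Amdahl's law  1/S = (1 - p) + p/s  for p."""
    return (1 - 1 / overall_speedup) / (1 - 1 / step_speedup)


def overall_speedup(fraction, step_speedup):
    """Amdahl's law: overall speedup when `fraction` of the runtime
    is accelerated by `step_speedup` and the rest is unchanged."""
    return 1 / ((1 - fraction) + fraction / step_speedup)


if __name__ == "__main__":
    # Assumed round numbers taken from the text: 10x for the step, 5x overall.
    p = step_fraction(step_speedup=10, overall_speedup=5)
    print(f"Implied share of runtime in rehydration/analysis: {p:.1%}")
    print(f"Check, overall speedup from that share: {overall_speedup(p, 10):.2f}x")
```

Because the reported step speedup is "more than" tenfold, the true share may be somewhat lower than this estimate.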

Targets currently running on World Community Grid

Currently, all computation is focused on the spike protein of the SARS-CoV-2 virus. The first work units targeting the spike were the "stress test," which docked about 300 million small molecules against one of many possible binding sites. Subsequently, we targeted multiple possible binding pockets with both reactive and non-reactive molecules.

The reactive molecules contain a chemical group capable of reacting selectively with either tyrosine or lysine amino acids (which are common building blocks of proteins) using a particular kind of sulfur chemistry (sulfur(VI) fluoride exchange, SuFEx). If any of these molecules really does bind to the spike protein, it could interfere with viral entry into human cells and, in turn, slow down the replication of the virus.

Ongoing compound testing

In our analysis, we filtered raw docking results to identify the most promising compounds to be synthesized and tested in biological assays. During this process, the number of results was reduced from hundreds of millions of molecules to a few dozen that showed the most interesting interaction patterns with the viral enzymes. 

With our collaborators at Enamine, we identified the compounds most accessible through synthetic chemistry, ultimately selecting molecules that could target two of the main proteases of the SARS-CoV-2 virus: 28 for the protease PLpro, and 47 for the protease Mpro. Enamine rapidly synthesized, purified, and shipped the molecules to our experimental collaborators' laboratories at Scripps Florida (Griffin and Kojetin labs) and Emory University (Sarafianos lab). As soon as biological results are available, we will share them with the community.

Thank you to everyone who is supporting this project!