Help Stop TB techniques

Help Stop TB scientists give us an overview of the algorithms used for the project, and the reasons behind their choice.

Hardware, OS, software and languages being used to #HelpStopTB.

For our project, we’re making use of the World Community Grid (WCG) servers alongside our local machines. Our local machines are little powerhouses (all named after Australian icons!) that run on Linux, and each has between 10 and 20 cores and 64 to 128GB of RAM. Most scientific computing takes place on Linux now, and this is especially important to us as we need it to work with GROMACS, the Molecular Dynamics software we use to simulate the Mycolic Acids on the WCG. A lot of our scripts are written in either Perl or Python, and we’re slowly moving more of our workflows over to Python. Python has proven really useful for recent analysis: because it is open source and widely used, there’s lots of support for developing the bespoke tools and training needed for the Mycolic Acid data.

Some of the techniques we’re using.

From the size of the data, and the efforts of the team so far, it’s clear that many of the techniques you’d traditionally use to study Molecular Dynamics aren’t fit for purpose on a dataset as large as ours (especially with molecules as bendy as our Mycolic Acids!). To analyze the data, we need to employ machine learning algorithms capable of telling us why the Mycolic Acids are folding into certain shapes, but it’s not quite as simple as throwing any old AI at the problem. To extract this crucial folding information from the data, we need explainable AI tools without the black-box limitations of many approaches. This is where our new team member Connor comes in: with experience in AI, scientific computing and method development, Connor has been looking at ways of combining information from machine learning with underutilized tools from mathematics to give a fuller picture of why the acids behave as they do.

We’ve had some local successes too! We had a bit of a hunch that we were looking at nonlinear data, and so we started to employ a competitive learning technique called Self-Organising Maps, which act as a nonlinear alternative to linear methods that are less effective on data like ours. Additionally, we’ve made use of the mathematical work of Lyapunov (whose wife sadly died of tuberculosis in 1918) to begin understanding the hidden properties of the Mycolic Acids in even greater detail. The next step with these approaches is scaling up from our test systems to the full dataset. Unfortunately, this takes time to develop and implement on something of this size, but we continue to make steady progress.
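
For readers curious about what a Self-Organising Map actually does, here is a minimal sketch in Python with NumPy. It is an illustration only, not the code used for the project: the function names (`make_toy_data`, `train_som`, `map_to_grid`), the grid size, the learning-rate and neighbourhood schedules, and the synthetic arc-shaped data are all assumptions chosen to keep the example short. The idea is that each node on a small 2D grid holds a weight vector; for every sample, the closest node "wins" (the competitive part) and it and its grid neighbours are nudged towards that sample, so the grid gradually bends to follow nonlinear structure in the data.

```python
import numpy as np


def make_toy_data(n_samples=300, noise=0.05, seed=0):
    """Synthetic 2D points on a noisy half-circle (NOT Mycolic Acid data)."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0.0, np.pi, size=n_samples)
    points = np.column_stack([np.cos(angles), np.sin(angles)])
    return points + rng.normal(scale=noise, size=points.shape)


def train_som(data, grid_shape=(6, 6), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a basic SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    n_features = data.shape[1]

    # Initialise every node's weight vector from a randomly chosen sample.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    # Fixed (row, col) position of each node on the map grid.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-2     # neighbourhood radius shrinks

        x = data[rng.integers(0, len(data))]     # pick one sample at random
        # Competitive step: the best-matching unit is the node closest to x.
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Cooperative step: Gaussian neighbourhood measured on the grid.
        grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull the winner and its neighbours towards the sample.
        weights += lr * h[:, None] * (x - weights)

    return weights.reshape(rows, cols, n_features)


def map_to_grid(data, weights):
    """Return the (row, col) grid cell of the best-matching unit per sample."""
    rows, cols, n_features = weights.shape
    flat = weights.reshape(rows * cols, n_features)
    dist2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    bmu = dist2.argmin(axis=1)
    return np.column_stack(np.unravel_index(bmu, (rows, cols)))


if __name__ == "__main__":
    data = make_toy_data()
    som = train_som(data)
    cells = map_to_grid(data, som)
    print("First five samples land in grid cells:")
    print(cells[:5])
```

In a real analysis the rows of `data` would be descriptors of molecular conformations rather than toy points, and a maintained SOM library would likely be a better choice than hand-rolled code; the sketch is only meant to show the winner-takes-most update that makes the method nonlinear.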