New Machine Learning Algorithms Accelerate Drug Discovery on Desktop Computers

2 Apr 2017 6:00am, by

In the past, the process of discovering new, life-saving drugs was one of trial and error, sometimes even brought on by a stroke of random luck — just ask biologist Alexander Fleming, whose untidy lab offered the right environment to cultivate the “mold juice” that would eventually become the world’s first antibiotic, penicillin.

While new technologies in recent decades have made the process easier, machine learning has the potential to help humanity make even larger leaps in medicine. From recommendation engines to language-translation apps, network security, database science, and even wildlife conservation, machine learning is already all around us, and the medical field is but one of many benefiting from the emerging technological paradigm.

Now, researchers at the University of Toronto have used machine learning to develop new algorithms that can generate accurate three-dimensional structural models of small protein molecules. This matters because the technology will help scientists better understand the complex structure of protein molecules and how drug molecules affect them, while furthering our understanding of how life operates at these tiny scales.

Accelerating the Drug Design Process

Since drugs function by having their molecules bind to protein molecules in the body, drug design focuses on creating molecules that will only bind to a certain protein, so knowledge of the structures of these minuscule proteins is vital. The problem here is that it’s not always crystal clear what these structures look like: they are unimaginably tiny, measuring anywhere from 1 to 100 nanometers — or less than the shortest wavelength of visible light.

“Designing successful drugs is like solving a puzzle,” said University of Toronto Ph.D. student Ali Punjani, one of the researchers who developed the algorithms, which were detailed in the journal Nature Methods. “Without knowing the three-dimensional shape of a protein, it would be like trying to solve that puzzle with a blindfold on.”

Conventional methods of observing these nanoscale structures include cryo-electron microscopy (or cryo-EM), which involves using high-powered microscopes to capture thousands of images of frozen protein specimens from different angles, and then stitching these images into a three-dimensional model. But it’s a time-consuming process: it may take up to half a million CPU hours, or many days or weeks, to come up with a high-resolution 3D model of a single protein.
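To put the "half a million CPU hours" figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the work parallelizes perfectly across a cluster, which real pipelines rarely achieve, and the cluster sizes are hypothetical values chosen only for illustration; none of them come from the researchers.

```python
def wall_clock_days(cpu_hours: float, cores: int) -> float:
    """Convert a CPU-hour budget into wall-clock days.

    Assumes ideal parallel scaling (an optimistic simplification).
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_hours / cores / 24


CPU_HOURS = 500_000  # upper figure quoted in the article

# Hypothetical cluster sizes, for illustration only.
for cores in (1_000, 2_000, 5_000):
    days = wall_clock_days(CPU_HOURS, cores)
    print(f"{cores:>5} cores: {days:.1f} days")
```

Under these assumptions, even a 1,000-core cluster would need roughly three weeks of continuous computation for a single structure, which is consistent with the "many days or weeks" described above.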

What’s even more troublesome is that these existing techniques can sometimes produce incorrect structural models, which must then be corrected by someone who knows what they are looking for.

“Processing cryo-EM image data to reveal heterogeneity in the protein structure and to refine 3-D maps to high resolution frequently becomes a severe bottleneck,” the team explains in their paper. “[It requires] expert intervention, prior structural knowledge, and weeks of calculations on expensive computer clusters.”

In contrast, the team’s new method is capable of generating a model in mere minutes on a desktop computer with a single GPU, which would drastically reduce both operating costs and the time it takes to map proteins and discover new drugs. Best of all, prior structural knowledge of the protein would no longer be as necessary. The algorithm is also designed to allow for the automated analysis and discovery of unexpected structures.

“We hope this will allow discoveries to happen at a ground-breaking pace in structural biology,” said Punjani. “The ultimate goal is that it will directly lead to new drug candidates for diseases, and a much deeper understanding of how life works at the atomic level.”

The team has already launched a cryo-EM platform called cryoSPARC that uses the new algorithms, via their startup, Structura Biotechnology. Several labs are already using the software, but the aim is to have labs across the world utilize the user-friendly product to tease out the complexities of proteins.


Using digital tools to automate parts of drug discovery would no doubt ease what is now a costly and time-consuming process. A 2014 report found that it takes 12 to 14 years and $2.6 billion on average to get a new drug to market, with a majority of drugs failing to make it through trials. Computer-aided automation would not only increase efficiency and significantly reduce these costs; such platforms could also be trained to connect the dots across existing clinical data, opening up possibilities that humans might have missed on their own.

Images: University of Toronto.
