- Researchers use a deep learning model to determine the 3D structure of a protein based on its amino acid sequence.
- Once completely trained, it can easily outperform all existing techniques at estimating protein structures for which there is no prior knowledge.
Protein is one of the major building blocks of the human body. It builds and maintains tissue. Chemically, it is composed of amino acids – organic compounds made of hydrogen, carbon, oxygen, nitrogen or sulfur.
Proteins carry out almost all fundamental biological processes essential for life by folding themselves into accurate three-dimensional structures that control their interaction with other molecules.
Since the shape of a protein determines its function and role in various diseases, it is important to study and predict their structures to develop life-saving and life-changing medicines.
However, it is not as easy as it sounds. Over the last 5 decades, protein folding has remained one of the most challenging problems for biochemists. Numerous computational methods have been developed, especially in recent years, to predict how proteins fold but an explicit sequence-to-structure map hasn’t been achieved yet.
Now, researchers at Harvard Medical School have used deep learning model (a form of artificial intelligence) to determine the 3D structure of a protein based on its amino acid sequence. It outperforms existing state-of-the-art techniques by 6 to 7 orders of magnitude in terms of speed.
Applying End-to-End Differentiable Deep Learning
The advanced algorithms use a brute force technique to simulate the complex physics of amino acid interactions and determine protein structure. To decrease computational overhead, these algorithms map new sequences onto pre-designed templates that represent previously determined protein structures.
Some AI projects like Google’s AlphaFold parse a massive amount of genomic data containing the blueprint of protein sequences. However, these methods do not estimate structures based solely on the amino acid sequence. They cannot determine the evolutionary unique proteins (structures of proteins that were never studied in the past).
Therefore, the research team used an end-to-end differentiable deep learning technique, which is already proven effective in some of the most popular apps, including Google Translate and Apple’s Siri.
This deep learning system called recurrent geometric network emphasizes on key properties of protein folding. It is trained on thousands of predetermined protein sequences and structures.
For every single amino acid, the algorithm calculates the chemical bonds’ angle that link the acid with its neighbors, as well as the angle of rotation around these chemical bonds.
A visual simulation of how the network computes the chemical bonds’ angle and angle of rotation around these bonds to construct a structure of the protein. | Credit: Mohammed AlQuraishi
The neural network performs these calculations (each iteration is refined by the relative location of every other amino acid) until the structure is finished. The system then checks the preciseness of its outcome by matching it against the real protein structure (obtain from direct observations).
This process is repeated for several different known proteins and the accuracy of the system increases with each iteration. It could take months to train the network, but once the training is complete, the model can easily outperform all existing techniques at estimating protein structures for which there is no prior knowledge.
Still, the accuracy of the model is not sufficient enough to resolve the full atomic structure of a protein. Thus, it is not ready for use in drug design or discovery.
For now, it can complement other techniques to predict a much wider class of protein structures than previously possible. There are numerous opportunities to improve the model by integrating laws of physics and chemistry. If you want to try it yourself, the code and results are available on GitHub.