These exquisite, intricate machines are proteins. They underpin not just the biological processes in your body but every biological process in every living thing. They’re the building blocks of life.
Currently, there are over 200 million known proteins, with many more found every year. Each one has a unique 3D shape that determines how it works and what it does.
But figuring out the exact structure of a protein remains an expensive and often time-consuming process, and until now scientists have only been able to study the exact 3D structure of a tiny fraction of the proteins known to science.
Finding ways to close this rapidly expanding gap and predict the structure of millions of unknown proteins could not only help us tackle disease and find new medicines more quickly, but perhaps also unlock the mysteries of how life itself works.
Proteins are made of chains of amino acids, and these sequences are assembled according to the genetic instructions in an organism's DNA.
Attraction and repulsion between the 20 different types of amino acids cause the string to fold in a feat of ‘spontaneous origami’, forming the intricate curls, loops, and pleats of a protein’s 3D structure.
For decades, scientists have been trying to find a method to reliably determine a protein’s structure just from its sequence of amino acids.
This grand scientific challenge is known as the protein-folding problem.
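To make the starting point of the problem concrete, the input is just a string of letters drawn from the 20 standard amino acids. The sketch below is an illustration only: the helper names and the example peptide are hypothetical, and it assumes the conventional one-letter amino acid codes.

```python
# The 20 standard amino acids, by their conventional one-letter codes.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(sequence: str) -> str:
    """Return the upper-cased sequence, or raise ValueError on unknown letters."""
    cleaned = sequence.strip().upper()
    unknown = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if not cleaned or unknown:
        raise ValueError(f"not a standard amino acid sequence: {unknown}")
    return cleaned


def composition(sequence: str) -> dict[str, int]:
    """Count how often each amino acid occurs in the sequence."""
    counts: dict[str, int] = {}
    for residue in validate_sequence(sequence):
        counts[residue] = counts.get(residue, 0) + 1
    return counts


# Hypothetical 10-residue peptide, used purely as an example.
example_counts = composition("MKTAYIAKQR")
```

Everything a structure predictor is given is contained in a string like this one; the 3D shape has to be inferred from it.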
AlphaFold was taught by showing it the sequences and structures of around 100,000 known proteins.
Experimental techniques for determining structures have been painstakingly laborious and time-consuming (sometimes taking years and millions of dollars).
Our latest system can now predict the shape of a protein, at scale and in minutes, down to atomic accuracy.
This is a significant breakthrough and highlights the impact AI can have on science.
CASP (Critical Assessment of protein Structure Prediction) is a community forum that allows researchers to share progress on the protein-folding problem. The community also organises a biennial challenge for research groups to test the accuracy of their predictions against real experimental data.
Teams are given a selection of amino acid sequences for proteins which have had their exact 3D shape mapped but have not yet been released into the public domain. Groups must submit their best predictions to see how close they are to the subsequently revealed structures.
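As a rough illustration of how a prediction can be scored against a revealed structure, the sketch below computes a simplified version of GDT_TS, a score widely used in CASP assessments. The text above does not name a metric, so treat this choice as an assumption. The function name and coordinates are hypothetical, and the sketch assumes one point per residue with both structures already superimposed; the real score also searches for the best superposition.

```python
import math

Coordinate = tuple[float, float, float]

# Distance cutoffs (in angstroms) averaged by the GDT_TS score.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def simplified_gdt_ts(
    predicted: list[Coordinate], experimental: list[Coordinate]
) -> float:
    """Score a prediction from 0 to 100 against the experimental structure.

    Assumes one coordinate per residue and that both structures are already
    superimposed in the same frame (no alignment search is performed here).
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and the same length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # For each cutoff, the fraction of residues predicted within that distance.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in GDT_TS_CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Made-up coordinates for a four-residue chain, purely for illustration.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = simplified_gdt_ts(predicted, experimental)  # 56.25
```

A perfect prediction scores 100. In this example the last residue is 10 angstroms off, so it misses every cutoff and pulls the score down.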
Among the teams that participated in CASP13 (2018), AlphaFold placed first in the protein structure prediction challenge. At CASP14 (2020), we presented our latest version of AlphaFold, which has now reached a level of accuracy considered to solve the protein structure prediction problem.
Our work builds upon decades of research by CASP’s organisers and the protein-folding community, and we’re indebted to the countless people who have contributed protein structures over the years, making such rigorous evaluations possible.
The AlphaFold Protein Structure Database, created in partnership with Europe’s flagship laboratory for life sciences (EMBL’s European Bioinformatics Institute), builds upon decades of painstaking work by scientists using traditional methods to determine the structure of proteins.
Our first release, on 22 July, 2021, covers over 350,000 structures, including the human proteome – all of the ~20,000 known proteins expressed in the human body – along with the proteomes of 20 additional organisms important for biological research, including yeast, the fruit fly, and the mouse.
These organisms are central to modern biological research, including Nobel Prize winning discoveries and life-saving drug development.
This release dramatically expanded our knowledge of protein structures and more than doubled the number of high-accuracy human protein structures available to scientists around the world.
Our latest release, announced on 28 July, 2022, expands this database from nearly 1 million structures to over 200 million structures – including nearly all catalogued proteins known to science.
Our partners are already using AlphaFold to accelerate progress on important real-world problems.
For instance, the Drugs for Neglected Diseases initiative (DNDi) is advancing drug discovery for neglected diseases, such as Chagas disease and leishmaniasis, which impact millions within poor and vulnerable communities.
Meanwhile, a scientist at the life and material science company Schrödinger is searching for ways to improve medicine by creating selective drugs, which can focus on one target rather than many.
At the Centre for Enzyme Innovation (CEI), researchers are discovering and engineering enzymes for breaking down single-use plastics, while teams from universities across Norway and the USA mapped the structure of honey bee Vitellogenin (Vg), a central protein for understanding the immune systems of egg-laying animals.
Looking at how changes in our DNA result in changes in our traits, a professor at ETH Zurich is studying the evolution of proteins. And at the University of Colorado, Boulder, another team is studying antibiotic resistance, a problem that causes 2.8 million infections in the US alone each year.