Nucleic Acids Res. 2024 Jan 5;52(D1):D368-D375.
doi: 10.1093/nar/gkad1011.

AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences

Mihaly Varadi et al. Nucleic Acids Res.

Abstract

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases that encompass model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.
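As an illustration of the programmatic access mentioned above, the following minimal Python sketch builds the entry page address for a UniProt accession and shows how per-entry metadata might be requested. The entry URL form follows the address quoted in this record (https://alphafold.ebi.ac.uk/entry/Q7RTU9). The `/api/prediction/` path, the accession pattern, and all function names are assumptions made for illustration and should be checked against the current AlphaFold DB API documentation before use.

```python
import json
import re
import urllib.request

BASE_URL = "https://alphafold.ebi.ac.uk"  # database address given in the abstract

# ASSUMPTION: a deliberately loose check for UniProt-style accessions
# (6 to 10 uppercase letters or digits); it is not the official UniProt pattern.
ACCESSION_RE = re.compile(r"^[A-Z0-9]{6,10}$")


def normalise_accession(accession: str) -> str:
    """Strip whitespace, upper-case, and reject obviously malformed accessions."""
    cleaned = accession.strip().upper()
    if not ACCESSION_RE.match(cleaned):
        raise ValueError(f"not a plausible UniProt accession: {accession!r}")
    return cleaned


def entry_url(accession: str) -> str:
    """Web page for one prediction, in the form quoted in the record."""
    return f"{BASE_URL}/entry/{normalise_accession(accession)}"


def prediction_api_url(accession: str) -> str:
    """ASSUMPTION: the metadata endpoint path is hypothetical here; verify it."""
    return f"{BASE_URL}/api/prediction/{normalise_accession(accession)}"


def fetch_prediction_metadata(accession: str, timeout: float = 30.0):
    """Request metadata over HTTPS and decode the response body as JSON."""
    with urllib.request.urlopen(prediction_api_url(accession), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `entry_url("Q7RTU9")` yields the entry address cited in the Figure 5 caption. `fetch_prediction_metadata` performs a network request, so it is best wrapped in error handling and rate limiting when used over many accessions; for bulk retrieval, the FTP and Google Cloud routes described above are likely the better fit.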

Plain language summary

The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries, marking a 500-fold expansion in size since its initial release in 2021. The structures are predicted using Google DeepMind's AlphaFold2 artificial intelligence (AI) system. Our new report highlights the latest updates we have made to this database. We have added more data on specific organisms and proteins related to global health and expanded to cover almost the complete UniProt database, a primary data resource of protein sequences. We have also made it easier for our users to access the data by directly downloading files or using advanced cloud-based tools. Finally, we have improved how users view and search through these protein structures, making the user experience smoother and more informative. In short, AlphaFold DB has been growing rapidly and has become more user-friendly and robust to support the broader scientific community.

Figures

Graphical Abstract
Figure 1. The expansion of AlphaFold DB. The AlphaFold Protein Structure Database increased in size through consecutive releases. As of September 2023, it archives over 214 million predicted protein structures.
Figure 2. Improved search results UI. The improved search UI includes more filtering options based on the underlying sequence of the protein structures and easier access to the most popular organisms.
Figure 3. Sequence similarity search results. We added support for performing sequence similarity searches in AlphaFold DB. The search results page displays a list of predicted structures with sequences similar to the user's query sequence.
Figure 4. Structure similarity cluster members. Using data from AFDB Clusters, we display lists of AlphaFold predictions structurally similar to a protein of interest.
Figure 5. Improved support for highlighting non-consecutive regions. The new version of the interactive PAE viewer makes it easier to distinguish between highlighted non-consecutive regions when assessing the confidence of their relative positions, as shown for AlphaFold DB accession https://alphafold.ebi.ac.uk/entry/Q7RTU9.
Figure 6. Improved customisation in Mol*. Enhanced customisation options in Mol* allow users to perform popular actions such as measuring distances or changing the rendering style.
