Sci Rep. 2023 Nov 20;13(1):20283.
doi: 10.1038/s41598-023-47204-7.

How AlphaFold2 shaped the structural coverage of the human transmembrane proteome

Márton A Jambrich et al. Sci Rep.

Abstract

AlphaFold2 (AF2) provides a 3D structure for every known or predicted protein, opening up new prospects for virtually every field in structural biology. However, working with transmembrane proteins poses a notorious challenge for scientists, resulting in a limited number of experimentally determined structures. Consequently, algorithms trained on this limited set also face difficulties. To address this issue, we recently launched the TmAlphaFold database, where predicted AlphaFold2 structures are embedded into the membrane plane and a quality assessment (plausibility of the membrane-embedded structure) is provided for each prediction using geometrical evaluation. In this paper, we analyze how AF2 has improved the structural coverage of membrane proteins compared to earlier years, when only experimental structures were available and high-throughput structure prediction was greatly limited. We also evaluate how AF2 can be used to search for (distant) homologs in highly diverse protein families. By combining quality assessment and homology search, we can pinpoint protein families where AF2 accuracy is still limited and experimental structure determination would be desirable.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Evaluation of transmembrane AlphaFold2 structures: (A) Correlation between different filters used by TMAlphaFold on the 3D_set/topography_set (F1: DetectingMembranePlane, F2: Signal, F3: FullStructure, F4: ShortHelix, F5: Masked, F6: MissingTmpart, F7: Domain, F8: OverpredictCctop, F9: UnderpredictCctop, F10: MembranePlaneCctop). Darker shade of blue means higher correlation. For an explanation of these filters, see the main text; (B) Performance of different filters on the 3D_set/topography_set; (C) Relative number of proteins and the measured accuracy (by checking topography) at different quality levels on the 3D_set/topography_set; (D) Distribution of TMAlphaFold quality levels at different RMSD distances (in Å) between the experimental and model template in the 3D_set; (E) Distribution of TMAlphaFold quality levels at different TM-Score groups in the 3D_set; (F) Distribution of pLDDT values in the transmembrane regions at different TMAlphaFold quality levels on the 3D_set.
Figure 2
Coverage of the human transmembrane proteome: (A) Structural coverage of TM proteins (from 2000 to 2021) according to their sequence identity to PDB entries, and of TMAlphaFold (2022) structures based on their quality level; (B) Distribution of TMAlphaFold quality levels on proteins with and without homologous structures, and by the experimental technique used to determine the template structure; (C) Distribution of the number of TMAlphaFold structures at different quality levels (3D_set/topography_set results are also displayed for comparison); (D) Distribution of pLDDT values in the HTP predicted transmembrane regions at different TMAlphaFold quality levels; (E) Distribution of pLDDT values in the TMAlphaFold detected transmembrane regions at different TMAlphaFold quality levels.
Figure 3
Sequence- and structure-based search and clustering results: (A) Left: Number of missing proteins (false negatives) with “Fair” or worse quality using different approaches. Right: Number of found proteins (true positives) with “Fair” or worse quality using different approaches. (B) Number of proteins in different clusters categorized by protein families (GR1: Single helix bin, GR2: Integrin, GR3: KCN_3, GR4: Eic/Glu me. Domain, GR5: Neurotim ion-channel, GR6: CACN_2, GR7: ABC_2, GR8: Aquaporin, GR9: KCN_1, GR10: Rhodopsin, GR11: Olfactory, GR12: ATP_1, GR13: SLC_1, GR14: ABC_1, GR15: SLC_2, GR16: ABC_3), using HHBlits and DBScan. (C) Number of proteins in different clusters categorized by protein families, using Foldseek clustering. (D) Number of proteins in different clusters categorized by protein families, using Foldseek and DBScan. (E) Number of proteins in different clusters categorized by protein families, using the Neural Network and DBScan.
Figure 4
GeneOntology terms of bad quality structures: Highly significant terms are sorted based on their level in the GeneOntology tree (blue, green, purple, red: levels 2–5, respectively) and on fold enrichment. (A) Molecular function; (B) Biological process.
Figure 5
Problematic structures: Examples of badly modeled structures from populated clusters, with (i) a representative structure, (ii) a topography comparison and (iii) the TMAlphaFold quality level distribution. Top: Spermatogenesis-associated proteins (disordered regions are also highlighted using MemDis prediction; green is disordered). Middle: Cation channels. Bottom: Anoctamins.
Figure 6
Fixing problematic structures: (A) Multiple sequence alignment of human Anoctamin-4, Anoctamin-6 and PDB:6P46_A. (B) Top: Electron-microscopy and AF2 predicted structure of Anoctamin-6, Bottom: AF2 predicted structure of Anoctamin-4; AF2 predicted structure of Anoctamin-4 using Anoctamin-6 AF2 structure as a template. Yellow regions show membrane regions as defined by TMDET. Red regions are non-TM regions placed in the plane of the lipid bilayer.
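
Several of the panels above (Figure 1F, Figure 2D and 2E) report pLDDT values restricted to transmembrane regions. As a rough illustration only, and not the authors' pipeline, the sketch below averages per-residue pLDDT over a set of residue ranges. It relies on the fact that AlphaFold2 model files in PDB format store pLDDT in the B-factor column; the function name `mean_plddt` and the `tm_ranges` input (which would come from a topology predictor or membrane-embedding tool) are hypothetical.

```python
def mean_plddt(pdb_lines, tm_ranges):
    """Average per-residue pLDDT over residues that fall inside tm_ranges.

    pdb_lines: iterable of PDB-format lines from an AlphaFold2 model,
               where the B-factor column (columns 61-66) holds pLDDT.
    tm_ranges: list of (start, end) residue numbers, inclusive.
               Hypothetical input; supply it from your own TM annotation.
    Returns None when no residue falls inside the given ranges.
    """
    values = []
    for line in pdb_lines:
        # Use one atom (C-alpha) per residue so each residue counts once.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        resnum = int(line[22:26])    # residue sequence number, columns 23-26
        plddt = float(line[60:66])   # B-factor column, columns 61-66
        if any(start <= resnum <= end for start, end in tm_ranges):
            values.append(plddt)
    return sum(values) / len(values) if values else None
```

Calling `mean_plddt(open(path), [(35, 57), (70, 92)])` with a model file and two assumed helix ranges would return the mean pLDDT over those residues; single-chain models are assumed, since chain identifiers are ignored.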
