Protein Sci. 2023 Feb;32(2):e4548.
doi: 10.1002/pro.4548.

DPAM: A domain parser for AlphaFold models

Jing Zhang et al. Protein Sci. 2023 Feb.

Abstract

The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near-atomic accuracy, herald a paradigm shift in structural biology. The 200 million high-accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provide an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). Better computational tools are urgently needed to parse and classify domains from AlphaFold models automatically. Here we present a Domain Parser for AlphaFold Models (DPAM) that can automatically recognize globular domains from these models based on inter-residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. Based on a benchmark of 18,759 AlphaFold models, we demonstrate that DPAM can recognize 98.8% of domains and assign correct boundaries for 87.5%, significantly outperforming structure-based domain parsers and homology-based domain assignment using ECOD domains found by HHsuite or Dali. Application of DPAM to the massive AlphaFold models will enable efficient classification of domains, providing evolutionary contexts and facilitating functional studies.

Keywords: domain classification; domain parser; protein domains; structural predictions.

Figures

FIGURE 1
Evidence to parse an AF model into globular domains. (a) An example AF model (UniProt accession: Q9ZFH0). (b) PAE plot for the same model. As the PAE value increases, the color changes from dark green to light green. Residues in the blue, yellow, orange, and red circles in (a) correspond to the blue, yellow, orange, and red squares in (b) with lower PAE values inside. (c) Minimal inter‐residue distance plot for the same model. The color changes from dark blue to light blue as distance increases. (d) Similar sequences detected by HHsuite. (e) Similar structures detected by Dali. We aligned an ECOD hit to a query in an iterative fashion to allow the detection of duplicated domains in the query
FIGURE 2
The probabilities for a residue pair to be in the same domain are derived from four parameters: (a) PAE, (b) inter‐residue distance, (c) HHsuite support, and (d) Dali support. The values for these parameters were binned, and probability was calculated as the fraction of residue pairs to be in the same domain in each bin based on our benchmark
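
The binning step described in this caption can be illustrated with a minimal sketch. It assumes a labeled benchmark in which each residue pair carries one parameter value (PAE here) and a flag saying whether the pair lies in the same domain. The function name, the bin edges, and the toy data are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def binned_same_domain_fraction(values, same_domain, bin_edges):
    """Return, per bin, the fraction of residue pairs labeled same-domain.

    values      : one parameter value per residue pair (e.g. PAE)
    same_domain : parallel list of booleans from a labeled benchmark
    bin_edges   : ascending inner bin edges (hypothetical, not the paper's)
    """
    n_bins = len(bin_edges) + 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for value, same in zip(values, same_domain):
        b = bisect_right(bin_edges, value)  # index of the bin holding value
        totals[b] += 1
        hits[b] += int(same)
    # Empty bins have no estimate, so report None rather than a number.
    return [h / t if t else None for h, t in zip(hits, totals)]


# Toy data: seven residue pairs with made-up PAE values and labels.
pae = [1.0, 2.5, 4.0, 7.0, 12.0, 18.0, 25.0]
same = [True, True, True, False, True, False, False]
probs = binned_same_domain_fraction(pae, same, bin_edges=[5.0, 15.0])
```

In this toy case the estimated probability falls as PAE rises. The same function could be applied to each of the other three parameters in turn.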
FIGURE 3
Illustration of the DPAM method. Eight 5‐residue segments of a protein are clustered into two domains based on a consensus of PAEs, inter‐residue distances, similar domains found by HHsuite, and similar domains found by Dali
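
As a rough illustration of clustering segments by consensus, the sketch below averages four per-pair probabilities and merges any two segments whose average reaches a threshold. The averaging rule, the threshold of 0.5, the union-find merging, and all names are assumptions made for illustration. The caption does not specify how DPAM combines the four sources or how it clusters segments.

```python
def cluster_segments(n_segments, pair_probs, threshold=0.5):
    """Group segments whose consensus probability reaches the threshold.

    pair_probs maps (i, j) to four probabilities (PAE, distance, HHsuite,
    Dali). Taking their mean is an assumption, not the paper's rule.
    """
    parent = list(range(n_segments))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), probs in pair_probs.items():
        consensus = sum(probs) / len(probs)
        if consensus >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for seg in range(n_segments):
        clusters.setdefault(find(seg), []).append(seg)
    return sorted(clusters.values())


# Toy input: eight segments, the first four and last four forming two groups.
pair_probs = {}
for i in range(8):
    for j in range(i + 1, 8):
        same_group = (i < 4) == (j < 4)
        pair_probs[(i, j)] = (0.9, 0.8, 0.9, 0.7) if same_group else (0.1, 0.2, 0.0, 0.1)

domains = cluster_segments(8, pair_probs)
```

With this toy input the eight segments separate into two groups of four, mirroring the layout in the figure.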
FIGURE 4
Performance evaluation of DPAM against existing structure‐based domain parsers (PDP and PUU) and assignment based on similar ECOD domains found by sequence (HHS: HHsuite) and structure similarity searches (Dali). (a) The fraction of ECOD domains covered in domains annotated by different methods. (b) The fraction of residues covered in domains annotated by different methods. (c) The fraction of domains whose boundaries were correctly predicted by different methods
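
A residue-coverage measure like the one in panel (b) could be computed as sketched below. This assumes coverage means the fraction of reference-domain residues that fall inside any predicted domain. The paper's exact definition is not given in this caption, and the function name and example ranges are hypothetical.

```python
def residue_coverage(reference_domains, predicted_domains):
    """Fraction of reference-domain residues found in any predicted domain.

    Each domain is an iterable of residue numbers. This definition of
    coverage is an assumption made for illustration.
    """
    reference = set()
    for domain in reference_domains:
        reference.update(domain)
    predicted = set()
    for domain in predicted_domains:
        predicted.update(domain)
    if not reference:
        return 0.0
    return len(reference & predicted) / len(reference)


# Toy example: one 100-residue reference domain, prediction misses residues 1-10.
coverage = residue_coverage([range(1, 101)], [range(11, 101)])
```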
FIGURE 5
Examples of parsed domains in AF models by DPAM. Different domains in a protein are colored from blue (or purple, N‐terminal) through green, yellow, to red (or magenta, C‐terminal). Non‐domain regions are colored in gray. (a–e) Cases where DPAM domain definitions agree with ECOD definitions. (f) A case where DPAM domain definitions are not all accurate, and domains that are incorrectly split are in blue circles. (g) A case where DPAM missed some poorly modeled zinc fingers (in a green circle) and combined multiple consecutive zinc fingers (in orange circles). (h) A case where the boundaries of DPAM domains differ from ECOD domain boundaries, but the DPAM domains appear more meaningful
