DPAM: A domain parser for AlphaFold models

Jing Zhang¹,²,³, R Dustin Schaeffer², Jesse Durham¹,²,³, Qian Cong¹,²,³, Nick V Grishin²,⁴

Affiliations

¹ Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
² Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
³ Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
⁴ Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA.

PMID: 36539305
PMCID: PMC9850437
DOI: 10.1002/pro.4548

Jing Zhang et al. Protein Sci. 2023 Feb;32(2):e4548. doi: 10.1002/pro.4548.

Abstract

The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near-atomic accuracy, herald a paradigm shift in structural biology. The 200 million high-accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provide an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). Better computational tools are urgently needed to parse and classify domains from AlphaFold models automatically. Here we present a Domain Parser for AlphaFold Models (DPAM) that can automatically recognize globular domains from these models based on inter-residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. Based on a benchmark of 18,759 AlphaFold models, we demonstrate that DPAM can recognize 98.8% of domains and assign correct boundaries for 87.5%, significantly outperforming structure-based domain parsers and homology-based domain assignment using ECOD domains found by HHsuite or Dali. Application of DPAM to the massive AlphaFold models will enable efficient classification of domains, providing evolutionary contexts and facilitating functional studies.

Keywords: domain classification; domain parser; protein domains; structural predictions.

Figures

**FIGURE 1**
Evidence to parse an AF model into globular domains. (a) An example AF model (UniProt accession: Q9ZFH0). (b) PAE plot for the same model. As the PAE value increases, the color changes from dark green to light green. Residues in the blue, yellow, orange, and red circles in (a) correspond to the blue, yellow, orange, and red squares in (b) with lower PAE values inside. (c) Minimal inter‐residue distance plot for the same model. The color changes from dark blue to light blue as distance increases. (d) Similar sequences detected by HHsuite. (e) Similar structures detected by Dali. We aligned an ECOD hit to a query in an iterative fashion to allow the detection of duplicated domains in the query

**FIGURE 2**
The probabilities for a residue pair to be in the same domain are derived from four parameters: (a) PAE, (b) inter‐residue distance, (c) HHsuite support, and (d) Dali support. The values for these parameters were binned, and probability was calculated as the fraction of residue pairs to be in the same domain in each bin based on our benchmark
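
The binning described in this caption can be illustrated with a minimal sketch. It assumes only what the caption states: parameter values are binned, and the probability for a bin is the fraction of residue pairs in that bin that share a domain. The function name, the bin edges, and the toy data below are hypothetical and are not taken from DPAM.

```python
import bisect
from collections import defaultdict


def binned_same_domain_probability(values, same_domain, bin_edges):
    """Estimate P(same domain) per bin of one parameter (e.g. PAE).

    values      -- one parameter value per residue pair
    same_domain -- True/False per residue pair, from benchmark annotations
    bin_edges   -- sorted upper edges separating the bins (hypothetical choice)
    Returns {bin index: fraction of pairs in that bin that share a domain}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [same-domain pairs, all pairs]
    for value, same in zip(values, same_domain):
        idx = bisect.bisect_right(bin_edges, value)  # bin that contains this value
        counts[idx][1] += 1
        if same:
            counts[idx][0] += 1
    return {idx: same / total for idx, (same, total) in counts.items()}


# Toy data: five residue pairs with PAE-like values and benchmark labels.
probs = binned_same_domain_probability(
    values=[1.0, 2.0, 6.0, 7.0, 12.0],
    same_domain=[True, True, True, False, False],
    bin_edges=[5.0, 10.0],
)
print(probs)
```

The same function could be applied separately to each of the four parameters, giving one lookup table per evidence type.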

**FIGURE 3**
Illustration of the DPAM method. Eight 5‐residue segments of a protein are clustered into two domains based on a consensus of PAEs, inter‐residue distances, similar domains found by HHsuite, and similar domains found by Dali
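
The idea of clustering 5-residue segments into domains can be sketched as follows. This is an illustration under stated assumptions, not the DPAM algorithm: it takes an already combined pairwise same-domain probability matrix as input (how the four evidence sources are combined is not specified here), and it merges segments with a simple single-linkage rule and a hypothetical threshold of 0.5.

```python
def split_into_segments(n_residues, seg_len=5):
    """Split residue indices 0..n_residues-1 into consecutive segments."""
    return [list(range(start, min(start + seg_len, n_residues)))
            for start in range(0, n_residues, seg_len)]


def cluster_segments(prob, n_residues, seg_len=5, threshold=0.5):
    """Group segments whose mean pairwise same-domain probability exceeds threshold.

    prob      -- square matrix (list of lists); prob[i][j] is an assumed,
                 precomputed consensus probability for residues i and j
    threshold -- hypothetical cutoff, not a DPAM parameter
    Returns a list of domains, each a sorted list of residue indices.
    """
    segments = split_into_segments(n_residues, seg_len)
    parent = list(range(len(segments)))  # union-find forest over segments

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in range(len(segments)):
        for b in range(a + 1, len(segments)):
            pairs = [prob[i][j] for i in segments[a] for j in segments[b]]
            if sum(pairs) / len(pairs) > threshold:
                parent[find(a)] = find(b)  # merge the two segment groups

    clusters = {}
    for idx, segment in enumerate(segments):
        clusters.setdefault(find(idx), []).extend(segment)
    return sorted(clusters.values())


# Toy example: 40 residues (eight 5-residue segments) forming two blocks.
n = 40
toy_prob = [[0.9 if (i < 20) == (j < 20) else 0.1 for j in range(n)] for i in range(n)]
domains = cluster_segments(toy_prob, n)
print([(d[0], d[-1]) for d in domains])
```

Single linkage is used here only because it is short to write; a real parser would likely need a stricter merging criterion to avoid chaining unrelated segments together.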

**FIGURE 4**
Performance evaluation of DPAM against existing structure‐based domain parsers (PDP and PUU) and assignment based on similar ECOD domains found by sequence (HHS: HHsuite) and structure similarity searches (Dali). (a) The fraction of ECOD domains covered in domains annotated by different methods. (b) The fraction of residues covered in domains annotated by different methods. (c) The fraction of domains whose boundaries were correctly predicted by different methods

**FIGURE 5**
Examples of parsed domains in AF models by DPAM. Different domains in a protein are colored from blue (or purple, N‐terminal) through green, yellow, to red (or magenta, C‐terminal). Non‐domain regions are colored in gray. (a–e) Cases where DPAM domain definitions agree with ECOD definitions. (f) A case where DPAM domain definitions are not all accurate; incorrectly split domains are marked with blue circles. (g) A case where DPAM missed some poorly modeled zinc fingers (in a green circle) and combined multiple consecutive zinc fingers (in orange circles). (h) A case where the boundaries of DPAM domains differ from ECOD domain boundaries, but the DPAM domains appear more meaningful.
