Nucleic Acids Res. 2025 Jan 6;53(D1):D411-D418.
doi: 10.1093/nar/gkae1029.

ECOD: integrating classifications of protein domains from experimental and predicted structures

R Dustin Schaeffer et al. Nucleic Acids Res.

Abstract

The Evolutionary Classification of Protein Domains (ECOD) classifies protein domains using a combination of sequence and structural data (http://prodata.swmed.edu/ecod). Here we present the culmination of our previous efforts at classifying domains from predicted structures, principally from the AlphaFold Database (AFDB), by integrating these domains with our existing classification of PDB structures. This combined classification includes both domains from our previous, purely experimental, classification and domains from our provisional classification of 48 proteomes in AFDB, predicted from model organisms and organisms of concern to global health. ECOD classifies over 1.8 M domains from over 1 000 000 proteins collectively deposited in the PDB and AFDB. Additionally, we have changed the F-group classification reference used for ECOD, deprecating our original ECODf library and instead relying on direct collaboration with the Pfam sequence family database to inform our classification. Pfam provides similar coverage of ECOD with family classification while being more accurate and less redundant. By eliminating duplication of effort, we can improve both classifications. Finally, we discuss the initial deployment on ECOD of DrugDomain, a database of domain-ligand interactions, and discuss future plans.

Figures

Graphical Abstract
Figure 1.
Contribution of AFDB domains to ECOD and its representative clusters. (A) Distribution of curated domains (manual representatives) in AFDB 48 proteomes and ECOD. (B) Distribution of automated non-representative domains in ECOD by source (AFDB or PDB). (C) Domain source of ECOD cluster representatives at the F40, F70 and F99 levels. (D) Cluster composition of FClusters in ECOD. Sequence clusters tend to be composed solely of predicted structures or solely of experimental models, with a comparatively lower fraction of mixed clusters.
Figure 2.
Effect of adding Pfam classification and AFDB domains to ECOD classification. (A) Overall percentage of ECOD domains classified into sequence family groups (F-groups) in recent ECOD versions. In version 290, ECOD switched from our previous HMM library, ECODf, to classifying directly using Pfam. (B) Top 20 most populated homologous groups in ECOD v292 and the relative number of domains mapped to Pfam (magenta) compared to those lacking an F-group classification (cyan).
Figure 3.
Structures of Zuotin1 Homology domain (ZHD) from experimental and predicted structures. (A) The AFDB predicted structure of human Zuo1 (UniProtKB: Q99543) consists of a chaperone J-domain (red), a ZHD (blue), a CHMP-3 linker domain (cyan), a ‘C-terminal Pdr1-activating domain of Zuo1’ (green) and two helix-turn-helix domains (orange and purple). Subsequent to the ECOD/Pfam definition of ZUO1-like_ZHD (PF21884), numerous other structurally similar examples of the ZHD were found in AFDB predicted proteins (B–E).
Figure 4.
A new family of SH3 domain repeats in E3 ubiquitin ligases is defined through combined efforts between ECOD and Pfam. (A) Human Mib1 protein contains four SH3-like repeat domains, two of which are defined as Mib-Herc2 (PF06701) and two of which were classified as a new SH3 domain family (SH3_15, PF18346). (B) A. thaliana KEG E3 ubiquitin ligase, containing a region with seven SH3_15 repeats (red), an ankyrin repeat domain (cyan), a protein kinase domain (green) and a RING Znf domain (magenta). (C) FinTRIM 97 (ftr97), a previously uncharacterized zebrafish protein, contains six SH3_15 domains (colored regions).
Figure 5.
ECOD statistics for AlphaFold models with exact AlphaFill small molecules. (A) Distribution of DrugBank molecules interacting with ECOD domains of target AF models. The inner pie shows ECOD architecture groups (A-groups); the outer doughnut shows ECOD homology groups (H-groups). (B) ECOD A-groups (left column) and superclasses of organic molecules according to ClassyFire classification (36) (right column). Each superclass and the lines pointing toward it are shown in a separate color. The thickness of each line shows the number of ECOD domains interacting with a particular superclass of organic molecules.
Figure 6.
AlphaFold model of tyrosine-protein kinase FRK (UniProt: P42685) with Dasatinib (DrugBank: DB01254). (A) Structure of the whole AF model of tyrosine-protein kinase FRK. The assigned ECOD domains are shown in different colors: SH3 H-group (blue), SH2 (yellow) and protein kinase/SAICAR synthase/ATP-grasp (red). Dasatinib is colored by element (C atoms are shown in green). (B) Structure of the kinase domain of the tyrosine-protein kinase FRK AF model, colored by rainbow. Dasatinib is colored by element (C atoms are shown in magenta). (C) Surface of the kinase domain of the tyrosine-protein kinase FRK AF model, colored by rainbow.

References

    1. Mistry J., Chuguransky S., Williams L., Qureshi M., Salazar G.A., Sonnhammer E.L.L., Tosatto S.C.E., Paladin L., Raj S., Richardson L.J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021; 49:D412–D419.
    2. Mi H., Lazareva-Ulitsky B., Loo R., Kejariwal A., Vandergriff J., Rabkin S., Guo N., Muruganujan A., Doremieux O., Campbell M.J. et al. The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 2005; 33:D284–D288.
    3. Letunic I., Doerks T., Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015; 43:D257–D260.
    4. Wang J., Chitsaz F., Derbyshire M.K., Gonzales N.R., Gwadz M., Lu S., Marchler G.H., Song J.S., Thanki N., Yamashita R.A. et al. The conserved domain database in 2023. Nucleic Acids Res. 2023; 51:D384–D388.
    5. Murzin A.G., Brenner S.E., Hubbard T., Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995; 247:536–540.