Review
Front Mol Biosci. 2021 May 10:8:668184.
doi: 10.3389/fmolb.2021.668184. eCollection 2021.

Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds

Nicola Bordin et al. Front Mol Biosci.

Abstract

This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article, we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families help reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure-based evolutionary studies, and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.

Keywords: bioinformatics and computational biology; protein evolution; protein structural and functional analysis; protein structure classification; structural bioinformatics.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Growth of domains, folds and chains deposited in the Protein Data Bank from 1972 onwards. Data sources: PDB, CATH.
FIGURE 2
Structural similarity measured by SSAP score (left) or normalised RMSD (right) vs % of sequence identity.
FIGURE 3
Overview of the CATH classification scheme for protein domains.
FIGURE 4
Highly divergent structural homologues within the HUPs SuperFamily (CATH ID 3.40.50.620). Six diverse structural clusters (also called structurally similar groups, SSGs) are identified using SSAP to compare structures all against all (see tree top left and figures on the right). However, representatives from each SSG can be superposed to reveal the highly conserved structural core common to all (see central black region in the bottom left figure).
FIGURE 5
Conservation of the structural core (highlighted in green) within the HUPs superfamily.
FIGURE 6
Top 9 “super-folds” in CATH v4.3. The inner wheel shows the proportion of structures that fall into each class, architecture, fold group and superfamily respectively.
FIGURE 7
Top 100 most populated CATH SuperFamilies (CATH v4.3) with additional details regarding sequence counts and unique EC and GO terms for the top 10 most populated SuperFamilies.
FIGURE 8
Number of MDAs vs number of sequence subfamilies (FunFams) for each SuperFamily in CATH v4.3.
FIGURE 9
Functional diversity (captured by number of functional families, FunFams) vs. sequence diversity (number of Gene3D s90 clusters, i.e. clusters in which relatives share 90% or more sequence identity) for CATH superfamilies. Each dot represents an individual superfamily.
FIGURE 10
Enzyme Commission term distributions for CATH v4.3 SuperFamilies, showing that 65 superfamilies have more than 20 different chemistries (i.e. EC3s).
FIGURE 11
Survey of FunFam expansions across the tree of life. Darker colors show a higher number of FunFams in that superfamily.
