BMC Struct Biol. 2012 Oct 18;12:27.
doi: 10.1186/1472-6807-12-27.

Evolutionarily consistent families in SCOP: sequence, structure and function

Ralph B Pethica et al. BMC Struct Biol. 2012.

Abstract

Background: SCOP is a hierarchical domain classification system for proteins of known structure. The superfamily level has a clear definition: protein domains belong to the same superfamily if there is structural, functional and sequence evidence for a common evolutionary ancestor. Superfamilies are sub-classified into families; however, the basis for the family-level groupings is less clear. Do SCOP families group together domains with similar sequence, similar structure, or common function? We address these questions and, most importantly, ask whether each family represents a distinct phylogenetic group within a superfamily.

Results: Several phylogenetic trees were generated for each superfamily: one derived from a multiple sequence alignment, one based on structural distances, and the final two from presence/absence of GO terms or EC numbers assigned to domains. The topologies of the resulting trees and confidence values were compared to the SCOP family classification.
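
One simple way to picture the comparison between a tree topology and the family classification is a clade test: a family is consistent with a rooted tree if its members are exactly the leaves under some node. The sketch below is only an illustration under that assumption; it is not the algorithm used in the paper (which also takes confidence values into account), and the tree encoding, the function names and the domain and family labels are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples; leaves are domain identifiers (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Record the leaf set under every node of the tree and return the root's leaf set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def family_agreement(tree: Tree, family_of: Dict[str, str]) -> Dict[str, bool]:
    """For each family, report whether its members form exactly one clade in the tree."""
    all_clades: Set[FrozenSet[str]] = set()
    collect_clades(tree, all_clades)
    members: Dict[str, Set[str]] = {}
    for domain, family in family_of.items():
        members.setdefault(family, set()).add(domain)
    return {family: frozenset(doms) in all_clades for family, doms in members.items()}


# Toy superfamily with five domains in three families (made-up labels).
example_tree: Tree = ((("d1", "d2"), "d3"), ("d4", "d5"))
example_families = {"d1": "A", "d2": "A", "d3": "B", "d4": "B", "d5": "C"}
result = family_agreement(example_tree, example_families)
print(result)
```

In this toy case family A forms a clade, family B is split across two branches, and the single-member family C is trivially a clade. A real comparison would also need to handle unrooted trees and weight each edge by its bootstrap support.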

Conclusions: We show that SCOP family groupings are evolutionarily consistent to a very high degree with respect to classical sequence phylogenetics. The trees built from (automatically generated) structural distances correlate well, but are not always consistent with SCOP (hand annotated) groupings. Trees derived from functional data are less consistent with the family level than those from structure or sequence, though the majority still agree. Much of GO and EC annotation applies directly to one family or a subset of that family; relatively few terms apply at the superfamily level. Maximum sequence diversity within a family is on average 22% but close to zero for superfamilies.
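
The observation that most annotation applies at the family level can be illustrated with a small classifier that asks, for one GO term or EC number, whether every annotated domain falls in a single family, a single superfamily, or neither. This is a minimal sketch under assumed inputs: the function name, the data layout and the identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def scop_level(domains: Iterable[str],
               classification: Dict[str, Tuple[str, str]]) -> str:
    """Return the lowest SCOP level that contains every annotated domain.

    `classification` maps a domain id to a (superfamily, family) pair.
    """
    domains = list(domains)
    if not domains:
        raise ValueError("a term must annotate at least one domain")
    # Families are compared as (superfamily, family) pairs so that family
    # labels do not need to be unique across superfamilies.
    families = {classification[d] for d in domains}
    superfamilies = {classification[d][0] for d in domains}
    if len(families) == 1:
        return "family"
    if len(superfamilies) == 1:
        return "superfamily"
    return "above superfamily"


# Made-up classification of four domains.
toy_classification = {
    "d1": ("sf1", "fam_a"),
    "d2": ("sf1", "fam_a"),
    "d3": ("sf1", "fam_b"),
    "d4": ("sf2", "fam_c"),
}
# Made-up annotation: term -> domains carrying that term.
toy_terms = {
    "term_x": ["d1", "d2"],
    "term_y": ["d1", "d3"],
    "term_z": ["d2", "d4"],
}
levels = {term: scop_level(doms, toy_classification) for term, doms in toy_terms.items()}
print(levels)
```

Counting how many terms land in each category gives the kind of breakdown by SCOP level that the conclusions describe.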

Figures

Figure 1
The number of superfamily agreements/disagreements with SCOP for varying confidence values. A ROC curve showing the number of superfamilies containing agreements against the number containing disagreements of trees with SCOP's groupings, for confidence values decreasing from left to right. For sequence trees, confidence is based on the bootstrap value assigned to an edge. Structures are ranked using the total structural distance, and function is ranked by the total number of GO terms or EC numbers which support an edge.
Figure 2
Examples of disagreements with SCOP. Examples of SCOP superfamilies which contain a disagreement found with trees based on sequence information, supported by high confidence values. Four of the common reasons for disagreement are explained. Images produced with TreeVector [14].
Figure 3
Sequence divergence in families and superfamilies. Graph shows the maximum sequence diversity between two members of the same superfamily (or family) in SCOP. Domains which continue to diverge beyond detectable sequence identity have their distribution collapsed to the far left side of the graph; the large number with zero percent sequence identity represents those cases in which BLAST was unable to find an alignment.
Figure 4
Structural divergence in families and superfamilies. Graph shows the maximum structural diversity between two members of the same superfamily (or family) in SCOP. Structural distances used are the scores produced by Structal for the alignment of two domains.
Figure 5
Level in SCOP of all single domain proteins associated with a specific GO term. Figure shows the level in SCOP at which all single domains associated with a particular GO term are found, i.e. whether the group represents a family or a superfamily. These are also broken down by the three main ontologies of GO terms.
Figure 6
An overview of the algorithm used to determine agreements/disagreements of trees with SCOP's groupings. Figure shows part of a tree built from domain sequences in a SCOP superfamily, and illustrates the algorithm involved in establishing if the tree agrees or disagrees with SCOP's family level grouping.
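
The curve described for Figure 1 can be pictured as a sweep over confidence thresholds: at each threshold, count the superfamilies that contain at least one agreeing edge and those that contain at least one disagreeing edge among the edges that pass the threshold. The following is a rough sketch under that reading of the caption; the edge records, the threshold values and the function name are hypothetical, and the paper's own procedure may differ.

```python
from typing import Iterable, List, Tuple

# One record per tree edge: (superfamily id, confidence, agrees with SCOP?).
Edge = Tuple[str, float, bool]


def agreement_curve(edges: Iterable[Edge],
                    thresholds: Iterable[float]) -> List[Tuple[float, int, int]]:
    """Return (threshold, n_agreeing, n_disagreeing) for thresholds in decreasing order."""
    edges = list(edges)
    points = []
    for t in sorted(thresholds, reverse=True):
        # Superfamilies with at least one edge of each kind at or above the threshold.
        agreeing = {sf for sf, conf, ok in edges if conf >= t and ok}
        disagreeing = {sf for sf, conf, ok in edges if conf >= t and not ok}
        points.append((t, len(agreeing), len(disagreeing)))
    return points


# Made-up edges with bootstrap-like confidence values.
toy_edges = [
    ("sf1", 95.0, True),
    ("sf1", 60.0, False),
    ("sf2", 80.0, True),
    ("sf3", 70.0, False),
]
curve = agreement_curve(toy_edges, [50.0, 75.0, 90.0])
print(curve)
```

Plotting the second count against the third as the threshold drops gives a curve of the same general shape as the one the caption describes.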

References

    1. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–540.
    2. Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res. 2004;32:226–229. doi: 10.1093/nar/gkh039.
    3. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJP, Chothia C, Murzin AG. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008;36:419–425.
    4. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
    5. Gough J, Chothia C. SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res. 2002;30:268–272. doi: 10.1093/nar/30.1.268.
