Nucleic Acids Res. 2015 Jan;43(Database issue):D227-33.
doi: 10.1093/nar/gku1041. Epub 2014 Nov 20.

The SUPERFAMILY 1.75 database in 2014: a doubling of data

Matt E Oates et al. Nucleic Acids Res. 2015 Jan.

Abstract

We present updates to the SUPERFAMILY 1.75 (http://supfam.org) online resource and protein sequence collection. The hidden Markov model library that provides sequence homology to SCOP structural domains remains unchanged at version 1.75. In the last 4 years SUPERFAMILY has more than doubled its holding of curated complete proteomes over all cellular life, from 1400 proteomes reported previously in 2010 up to 3258 at present. Outside of the main sequence collection, SUPERFAMILY continues to provide domain annotation for sequences provided by other resources such as UniProt, Ensembl, PDB, much of JGI Phytozome and selected subcollections of NCBI RefSeq. Despite this growth in data volume, SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but is constantly updated as new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence and, in the case of whole genomes, with enrichment analysis against a taxonomically defined background.
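
The enrichment analysis mentioned above can be illustrated with a minimal sketch. The abstract does not say which statistical test SUPERFAMILY uses, so the upper-tail hypergeometric test below is an assumption, chosen only because it is a common choice for term enrichment. The function name and all parameter names are hypothetical, and the sketch treats the genome's proteins as a sample drawn from the background set.

```python
from math import comb


def enrichment_p_value(hits_in_genome, genome_size,
                       hits_in_background, background_size):
    """Upper-tail hypergeometric p-value, P(X >= hits_in_genome).

    Assumption: this is an illustrative test, not necessarily the one
    SUPERFAMILY applies. The genome is modelled as `genome_size` proteins
    drawn without replacement from a background of `background_size`
    proteins, of which `hits_in_background` carry the annotation.
    """
    # Number of ways to draw the genome from the background.
    total = comb(background_size, genome_size)
    # The annotated count cannot exceed either the sample or the annotated pool.
    upper = min(genome_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, genome_size - k)
        for k in range(hits_in_genome, upper + 1)
    )
    return tail / total
```

A small p-value would suggest that the annotation is over-represented in the genome relative to the chosen taxonomic background; in practice a correction for testing many terms would also be needed.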

Figures

Figure 1.
Summary of all genome updates and additions at the level of taxonomic Class since the release of SUPERFAMILY 1.75. Eukarya in red, Archaea in green and Bacteria in blue. The size of each pie chart is log scaled based on the number of proteomes within each Class. Light colouration is the proportion of genomes that have been added to the database within a Class, and the dark colouration represents updated genomes. The grey colouring seen in Eukarya represents the relatively few genomes that have not been altered since the release of 1.75.
Figure 2.
This Venn diagram demonstrates the extent to which the sequence space of the SUPERFAMILY proteome collection is not covered by the PDB and UniProt. Each value in the diagram describes the number of distinct (collapsed to 100% sequence identity) amino acid sequences in each sequence collection.
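
The counting behind a diagram of this kind can be sketched as follows. This is an illustrative example only, not the authors' pipeline: the function name and region labels are hypothetical, and "collapsed to 100% sequence identity" is approximated here by treating each collection as a set of exact amino acid strings.

```python
def venn_region_counts(superfamily, pdb, uniprot):
    """Count distinct sequences in each region of a three-set Venn diagram.

    Assumption: each argument is an iterable of amino acid strings, and
    identical strings are treated as one sequence (100% identity collapse).
    """
    s, p, u = set(superfamily), set(pdb), set(uniprot)
    return {
        "superfamily_only": len(s - p - u),
        "pdb_only": len(p - s - u),
        "uniprot_only": len(u - s - p),
        "superfamily_and_pdb": len((s & p) - u),
        "superfamily_and_uniprot": len((s & u) - p),
        "pdb_and_uniprot": len((p & u) - s),
        "all_three": len(s & p & u),
    }


# Toy input; these short strings are made up and are not real proteins.
counts = venn_region_counts(
    superfamily=["MKV", "MKV", "GAL", "TTP"],
    pdb=["GAL", "QRS"],
    uniprot=["GAL", "TTP", "WYF"],
)
```

The `superfamily_only` count corresponds to the part of the proteome collection's sequence space that neither of the other two collections covers.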
Figure 3.
How to create your own phylogenetic trees that are built daily against the most recent updates to the SUPERFAMILY sequence collection. In this example the family Hominidae has been selected from the table and links to phylogenetic resources provided by the sTOL method given at the top of the page. A user may also select individual species of interest and create trees annotated by domain inclusion directly from domain summary pages.
Figure 4.
In this figure we demonstrate viewing the ancestral domain content for the last common ancestor of all Metazoa, linked from the summary of domain assignments for Homo sapiens. From the main SUPERFAMILY assignments page for a proteome (accessible from the Taxonomy page under Browse on the side menu) a user can view reconstructed ancestral states for any common ancestor as long as the clade has sufficient whole proteome data.
