Bioinformatics. 2016 Aug 1;32(15):2264-71.
doi: 10.1093/bioinformatics/btw114. Epub 2016 Mar 7.

UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB

Tunca Doğan et al. Bioinformatics.

Abstract

Motivation: Similarity-based methods have been widely used in order to infer the properties of genes and gene products containing little or no experimental annotation. New approaches that overcome the limitations of methods that rely solely upon sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins.

Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach.
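As a rough illustration of the idea summarized above (compare domain architectures, then propagate GO terms from similar annotated proteins, then score predictions with an F-score), here is a minimal Python sketch. It is not the DAAC algorithm: the abstract does not specify the alignment or classification procedure, so the sketch substitutes the generic sequence-matching ratio from Python's `difflib` as the similarity measure. The function names, the `min_similarity` threshold and the toy Pfam/GO data are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def da_similarity(da_a, da_b):
    """Similarity in [0, 1] between two ordered lists of domain identifiers.

    Assumption: difflib's matching ratio stands in for the paper's
    domain architecture alignment score, which the abstract does not define.
    """
    return SequenceMatcher(None, da_a, da_b, autojunk=False).ratio()


def propagate_go_terms(query_da, annotated, min_similarity=0.8):
    """Return GO terms of annotated proteins whose architecture resembles the query.

    `annotated` is an iterable of (domain_architecture, go_terms) pairs.
    `min_similarity` is a hypothetical cutoff, not a value from the paper.
    """
    votes = Counter()
    for da, go_terms in annotated:
        if da_similarity(query_da, da) >= min_similarity:
            votes.update(go_terms)
    return sorted(votes)


def f_score(predicted, actual):
    """Harmonic mean of precision and recall over two sets of GO terms."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy reference set (illustrative only): Pfam accessions as domains.
reference = [
    (("PF00017", "PF00069"), {"GO:0004672", "GO:0005524"}),
    (("PF00096",), {"GO:0003677"}),
]

predicted_terms = propagate_go_terms(("PF00017", "PF00069"), reference)
score = f_score(predicted_terms, {"GO:0004672"})
```

In this toy run the query shares its full architecture with the first reference entry only, so both of that entry's GO terms are propagated. A real pipeline would also need to account for the GO hierarchy and for per-term classification thresholds, which this sketch ignores.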

Availability and implementation: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/

Contact: tdogan@ebi.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
(A) Schematic representation of the method; (B) Representation of pairwise DA alignment between two proteins; (C) GO MF DAG; nodes: all terms (blue), predicted terms (red)
Fig. 2.
Cross-validation results: (A) ROC and precision versus recall curves for a GO term class; (B) Performance of the method as F-score and (C) as Precision (Color version of this figure is available at Bioinformatics online.)
Fig. 3.
Number of domains per protein versus performance in cross-validation graph (Color version of this figure is available at Bioinformatics online.)
