Bioinformatics. 2006 Dec 1;22(23):2934-9.
doi: 10.1093/bioinformatics/btl372. Epub 2006 Oct 26.

Babel's tower revisited: a universal resource for cross-referencing across annotation databases

Sorin Drăghici et al. Bioinformatics. 2006.

Abstract

Motivation: Annotation databases are widely used as public repositories of biological knowledge. However, most of these resources were developed by independent groups that used different designs and different identifiers for the same biological entities. As we show in this article, incoherent name spaces across databases are a serious impediment to using the existing annotations to their full potential. Navigating between these name spaces by mapping IDs from one database to another is an important problem that is not adequately addressed at present.

Results: We have developed a web-based resource, Onto-Translate (OT), which effectively addresses this problem. OT can map different types of biological entities onto one another across the following annotation databases: Swiss-Prot, TrEMBL, NREF, PIR, Gene Ontology, KEGG, Entrez Gene, GenBank, GenPept, IMAGE, RefSeq, UniGene, OMIM, PDB, Eukaryotic Promoter Database, HUGO Gene Nomenclature Committee and NetAffx. Currently, OT can perform 462 types of mappings between 29 different types of IDs from 17 databases concerning 53 organisms. Among these, over 300 types of translations and 15 types of IDs are not currently supported by any other tool or resource. On average, OT correctly maps between 96 and 99% of the biological entities provided as input. In terms of speed, sets of approximately 20,000 IDs can be translated in under 30 s in most cases.
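To make the idea of ID translation concrete, the following is a minimal sketch, not OT's actual implementation. It assumes a small in-memory table of cross-references; the record layout, the type names (`gene_symbol`, `entrez_gene`, `swiss_prot`) and the functions `build_index` and `translate` are hypothetical names chosen for illustration. It shows how a batch of input IDs might be translated from one ID type to another, together with the fraction of inputs that could be mapped.

```python
from collections import defaultdict

# Toy cross-reference records: (source_type, source_id, target_type, target_id).
# This is an illustrative stand-in, not the schema or content of the OT database.
XREFS = [
    ("gene_symbol", "TP53", "entrez_gene", "7157"),
    ("gene_symbol", "BRCA1", "entrez_gene", "672"),
    ("entrez_gene", "7157", "swiss_prot", "P04637"),
    ("entrez_gene", "672", "swiss_prot", "P38398"),
]


def build_index(xrefs):
    """Index records by (source type, target type, source ID) for fast lookup."""
    index = defaultdict(set)
    for src_type, src_id, dst_type, dst_id in xrefs:
        index[(src_type, dst_type, src_id)].add(dst_id)
    return index


def translate(ids, src_type, dst_type, index):
    """Translate a batch of IDs; return mapped IDs, unmapped IDs and coverage."""
    mapped = {}
    unmapped = []
    for identifier in ids:
        # .get() avoids creating empty entries in the defaultdict.
        targets = index.get((src_type, dst_type, identifier))
        if targets:
            mapped[identifier] = sorted(targets)
        else:
            unmapped.append(identifier)
    coverage = len(mapped) / len(ids) if ids else 0.0
    return mapped, unmapped, coverage


if __name__ == "__main__":
    index = build_index(XREFS)
    mapped, unmapped, coverage = translate(
        ["TP53", "BRCA1", "NOT_A_GENE"], "gene_symbol", "entrez_gene", index
    )
    print(mapped, unmapped, f"{coverage:.0%}")
```

Each source ID maps to a set of targets because one identifier can correspond to several identifiers in another name space. The coverage value is the kind of quantity that a "percentage of entities mapped" figure summarizes.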

Availability: OT is a part of Onto-Tools, which is freely available at http://vortex.cs.wayne.edu/Projects.html

Figures

Fig. 1
Onto-Translate relational database schema. This schema contains an entity for each of the source databases used by OT. The shapes represent the type of the given biological entity. A relationship between two databases is represented by a line connecting the two entities. The type of relationship between two entities is indicated by labels on the corresponding line. For instance, the relationship between Entrez Gene and Gene Ontology is many-to-many. In other words, a gene may be annotated using zero or more GO terms and a GO term may be used to annotate zero or more genes.
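The many-to-many relationship described in this caption can be sketched with a junction table. The example below is an assumption-laden illustration using Python's built-in `sqlite3` module; the table names (`entrez_gene`, `go_term`, `gene_go`), the column names and the placeholder IDs are hypothetical and are not taken from the actual OT schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a gene <-> GO term junction table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entrez_gene (gene_id TEXT PRIMARY KEY);
        CREATE TABLE go_term (go_id TEXT PRIMARY KEY);
        -- Junction table: one row per (gene, GO term) annotation.
        CREATE TABLE gene_go (
            gene_id TEXT NOT NULL REFERENCES entrez_gene(gene_id),
            go_id   TEXT NOT NULL REFERENCES go_term(go_id),
            PRIMARY KEY (gene_id, go_id)
        );
        """
    )
    # Placeholder identifiers, not real annotations.
    conn.executemany(
        "INSERT INTO entrez_gene VALUES (?)", [("gene1",), ("gene2",), ("gene3",)]
    )
    conn.executemany(
        "INSERT INTO go_term VALUES (?)",
        [("GO:term_a",), ("GO:term_b",), ("GO:term_c",)],
    )
    conn.executemany(
        "INSERT INTO gene_go VALUES (?, ?)",
        [("gene1", "GO:term_a"), ("gene1", "GO:term_b"), ("gene2", "GO:term_a")],
    )
    return conn


def terms_for_gene(conn, gene_id):
    """Return the GO terms annotating a gene (possibly none)."""
    rows = conn.execute(
        "SELECT go_id FROM gene_go WHERE gene_id = ? ORDER BY go_id", (gene_id,)
    )
    return [row[0] for row in rows]


def genes_for_term(conn, go_id):
    """Return the genes annotated with a GO term (possibly none)."""
    rows = conn.execute(
        "SELECT gene_id FROM gene_go WHERE go_id = ? ORDER BY gene_id", (go_id,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(terms_for_gene(conn, "gene1"))
    print(genes_for_term(conn, "GO:term_a"))
    print(terms_for_gene(conn, "gene3"))
```

In this sketch `gene3` has no annotations and `GO:term_c` annotates no gene, which corresponds to the "zero or more" wording on both sides of the relationship.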
Fig. 2
A comparison of the scopes of Onto-Translate, RESOURCERER, MatchMiner, SOURCE and GeneMerge in terms of the possible mappings between various types of IDs.
Fig. 3
A comparison of the accuracy of Onto-Translate, MatchMiner and SOURCE. The input file included 19,248 gene symbols (19,562 Entrez Gene IDs) for human, and 12,991 gene symbols (13,023 Entrez Gene IDs) for mouse, from the respective Affymetrix arrays. The graph shows the percentages of the input genes successfully translated in each case.
Fig. 4
Scaling properties of Onto-Translate (OT), MatchMiner (MM) and SOURCE. The graph shows the time (in seconds) needed to translate sets containing between 10 and 19,119 distinct genes from the Affymetrix 133 Plus 2.0 array. For sets of fewer than 1,000 genes, the three resources have comparable query times of under 10 seconds. For larger sets, there is a substantial performance difference.
