Gigascience. 2014 Mar 18;3(1):4.
doi: 10.1186/2047-217X-3-4.

Understanding how and why the Gene Ontology and its annotations evolve: the GO within UniProt

Rachael P Huntley et al. Gigascience.

Abstract

The Gene Ontology Consortium (GOC) is a major bioinformatics project that provides structured controlled vocabularies to classify gene product function and location. GOC members create annotations to gene products using the Gene Ontology (GO) vocabularies, thus providing an extensive, publicly available resource. The GO and its annotations to gene products are now an integral part of functional analysis, and statistical tests using GO data are becoming routine for researchers to include when publishing functional information. While many helpful articles about the GOC are available, there are certain updates to the ontology and annotation sets that sometimes go unobserved. Here we describe some of the ways in which GO can change. All users of GO should consider these changes carefully, as they may have a significant impact on the resulting gene product annotations, and therefore on the functional description of the gene product or the interpretation of analyses performed on GO datasets. GO annotations for gene products change for many reasons, and while these changes generally improve the accuracy of the representation of the underlying biology, they do not necessarily imply that previous annotations were incorrect. We additionally describe the quality assurance mechanisms we employ to improve the accuracy of annotations, which necessarily change the composition of the annotation sets we provide. We use the Universal Protein Resource (UniProt) to illustrate how the GO Consortium, as a whole, manages these changes.

Figures

Figure 1
Changes to the “apoptotic process” term. The most recent changes to the GO term “apoptotic process” as displayed in QuickGO [20]. In total there have been 54 changes over the lifetime of the term.
Figure 2
Taxon restrictions for the term “flower development”. This term has four taxon restrictions, three of which are inherited from a parent term. These restrictions can prevent GO terms from being used inappropriately for certain taxonomic groups.
Figure 3
Inheritance of taxon restrictions. Less specific, parent terms have fewer taxon restrictions than more specific child terms that are further down the hierarchy. This should be considered when choosing GO terms to use in automatic prediction methods. In the example shown, predicting the term “fatty acid beta-oxidation multienzyme complex” for a set of multispecies proteins may result in more accurate annotation than predicting the term “mitochondrial fatty acid beta-oxidation multienzyme complex”.
Figure 4
Post-processing of automatic annotations. UniProt have rules in place such that if the taxon restrictions are violated in automatic annotations, the annotation can be either deleted (row 1) or edited to use a more appropriate GO term (row 2). In row 1, an Entamoeba protein is annotated to “peroxisome”; these organelles are only present in cellular organisms, therefore the annotation is deleted. In row 2, a viral protein is annotated to “cytoplasm”; for viruses the correct GO term is “host cell cytoplasm”, so the GO term is substituted and a GO reference describing this editing process is supplied with the annotation.
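
The inheritance of taxon restrictions (Figure 3) and the delete-or-substitute post-processing (Figure 4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not UniProt's actual pipeline: the `PARENTS`, `ONLY_IN`, `NEVER_IN` and `SUBSTITUTE` tables, the `Annotation` class and the function names are all hypothetical, and the restriction values are made up to mirror the examples in the captions rather than taken from the ontology.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical child -> parent links, named after the Figure 3 example.
# (A real ontology is a graph in which a term may have several parents.)
PARENTS = {
    "mitochondrial fatty acid beta-oxidation multienzyme complex":
        "fatty acid beta-oxidation multienzyme complex",
}

# Hypothetical restrictions declared directly on a term (illustrative values).
ONLY_IN = {
    "mitochondrial fatty acid beta-oxidation multienzyme complex": {"Eukaryota"},
}
NEVER_IN = {"peroxisome": {"Entamoeba"}, "cytoplasm": {"Viruses"}}

# Hypothetical replacement terms, used instead of deleting the annotation.
SUBSTITUTE = {("cytoplasm", "Viruses"): "host cell cytoplasm"}


@dataclass(frozen=True)
class Annotation:
    protein: str
    term: str
    lineage: frozenset  # every taxon group the source organism belongs to


def inherited(table: dict, term: Optional[str]) -> set:
    """Collect restrictions declared on a term and on all of its ancestors."""
    found = set()
    while term is not None:
        found |= table.get(term, set())
        term = PARENTS.get(term)
    return found


def violates(ann: Annotation) -> bool:
    """True if the organism breaks any only-in or never-in restriction."""
    only_in = inherited(ONLY_IN, ann.term)
    never_in = inherited(NEVER_IN, ann.term)
    # Every only-in taxon must appear in the lineage; no never-in taxon may.
    return not only_in <= ann.lineage or bool(never_in & ann.lineage)


def post_process(ann: Annotation) -> Optional[Annotation]:
    """Keep a valid annotation, substitute the term if a rule exists, else delete."""
    if not violates(ann):
        return ann
    for taxon in ann.lineage:
        new_term = SUBSTITUTE.get((ann.term, taxon))
        if new_term is not None:
            return Annotation(ann.protein, new_term, ann.lineage)
    return None  # no substitution rule: the annotation is dropped
```

In this sketch the more specific "mitochondrial" term carries one restriction that its parent lacks, so predicting the parent term for a mixed-species protein set leaves fewer annotations open to removal, which is the point Figure 3 makes.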

References

    1. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
    2. Skunca N, Altenhoff A, Dessimoz C. Quality of computationally inferred gene ontology annotations. PLoS Comput Biol. 2012;8:e1002533. doi: 10.1371/journal.pcbi.1002533.
    3. Blake JA, Dolan M, Drabkin H, Hill DP, Li N, Sitnikov D, Bridges S, Burgess S, Buza T, McCarthy F, Peddinti D, Pillai L, Carbon S, Dietze H, Ireland A, Lewis SE, Mungall CJ, Gaudet P, Chisholm RL, Fey P, Kibbe WA, Basu S, Siegele DA, McIntosh BK, Renfro DP, Zweifel AE, Hu JC, Brown NH, Tweedie S, Alam-Faruque Y, et al. Gene Ontology annotations and resources. Nucleic Acids Res. 2013;41(Database issue):D530–D535.
    4. Balakrishnan R, Harris MA, Huntley R, Van Auken K, Cherry JM. A guide to best practices for Gene Ontology (GO) manual annotation. Database. 2013;2013:bat054.
    5. Dimmer EC, Huntley RP, Alam-Faruque Y, Sawford T, O’Donovan C, Martin MJ, Bely B, Browne P, Mun Chan W, Eberhardt R, Gardner M, Laiho K, Legge D, Magrane M, Pichler K, Poggioli D, Sehra H, Auchincloss A, Axelsen K, Blatter M-C, Boutet E, Braconi-Quintaje S, Breuza L, Bridge A, Coudert E, Estreicher A, Famiglietti L, Ferro-Rojas S, Feuermann M, Gos A, et al. The UniProt-GO annotation database in 2011. Nucleic Acids Res. 2011;40(Database issue):D565–D570.