Brief Bioinform. 2011 Nov;12(6):723-35.
doi: 10.1093/bib/bbr002. Epub 2011 Feb 17.

The what, where, how and why of gene ontology--a primer for bioinformaticians

Louis du Plessis et al. Brief Bioinform. 2011 Nov.

Abstract

With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications.

Figures

Figure 1:
Increase in the number of experimentally verified GO term assignments available for each organism between September 2002 and September 2010. The GO consortium initially focused on eukaryotes, a fact reflected in the distribution and growth of annotations available in the GO database. Contrast, for instance, the steady growth of experimentally verified annotations for A. thaliana, S. cerevisiae or M. musculus with the sharp increase in the number of experimentally verified annotations available for E. coli: from 33 in 2002 to 1852 in 2010.
Figure 2:
The structure of the GO, illustrated by some of the paths from term GO:0060491 to its root term. Note that a term can have multiple parents.
Figure 3:
GO evidence codes and their abbreviations. Evidence code NR (not recorded) is used for annotations assigned prior to the use of evidence codes, and is not assigned to new annotations.
Figure 4:
A decision tree for deciding which evidence code to use. Figure adapted from http://www.geneontology.org/GO.evidence.tree.shtml.
Figure 5:
The distribution of evidence codes among annotations in the GO on 1 April 2010.
Figure 6:
Estimation of correctness and coverage of computationally inferred GO terms (IEA) from September 2008. The estimation is based on data for four well-annotated eukaryotes: A. thaliana, C. elegans, Drosophila melanogaster and Saccharomyces cerevisiae. Confirmed predictions are those 2008 IEA annotations that were ‘promoted’ to one of the experimental evidence codes (EXP, IMP, IGI, IPI, IDA, IEP) in the September 2010 annotation file. Rejected predictions are IEA annotations in 2008 that were subsequently removed. The X-axis is a measure of completeness (‘recall’). It represents the fraction of genes having experimentally validated annotations, added in the 2008–10 period, that were correctly predicted in the 2008 IEA annotations file. The Y-axis is a measure of correctness (‘precision’). It represents the fraction of genes having IEA annotations in 2008 that were later confirmed by experimentally validated annotations (in the 2008–10 period). The size of each bubble reflects the frequency of the respective GO term in annotations assigned using experimental evidence codes and is a surrogate for the generality of the term: the larger the bubble, the more frequently the term is used in GO experimental annotations. To minimize estimation errors, terms included in the figure have at least five confirmed 2008 IEA annotations and five rejected IEA annotations, resulting in 72 BP terms, 85 MF terms and 37 CC terms. The files containing annotations were downloaded from the GOA database [14].
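
The precision and recall described in the Figure 6 caption can be sketched for a single GO term as follows. This is a minimal illustration, not the authors' code: the function name, the argument names and the toy gene identifiers are hypothetical, and the choice of denominator for precision (confirmed plus rejected predictions, ignoring IEA annotations that were neither promoted nor removed) is an assumption about how the caption should be read.

```python
def precision_recall(iea_2008, experimental_new, iea_removed):
    """Estimate precision and recall of 2008 IEA predictions for one GO term.

    All arguments are sets of gene identifiers (hypothetical inputs):
      iea_2008         -- genes carrying an IEA annotation for the term in 2008
      experimental_new -- genes that gained an experimental annotation in 2008-10
      iea_removed      -- genes whose 2008 IEA annotation was later removed
    """
    confirmed = iea_2008 & experimental_new   # predictions later 'promoted'
    rejected = iea_2008 & iea_removed         # predictions later withdrawn

    # Assumption: precision is taken over resolved predictions only.
    resolved = len(confirmed) + len(rejected)
    precision = len(confirmed) / resolved if resolved else None

    # Recall: share of newly validated genes that IEA had already predicted.
    recall = len(confirmed) / len(experimental_new) if experimental_new else None
    return precision, recall


# Toy data: four predicted genes, four newly validated genes, one removal.
p, r = precision_recall(
    iea_2008={"g1", "g2", "g3", "g4"},
    experimental_new={"g1", "g2", "g5", "g6"},
    iea_removed={"g3"},
)
```

In this toy case two predictions are confirmed and one is rejected, so precision is 2/3, and two of the four newly validated genes were predicted, so recall is 0.5. The function returns None for a measure whose denominator is empty, which is one way to reflect the caption's point that terms with too few confirmed or rejected annotations are excluded.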

References

    1. Bodenreider O, Stevens R. Bio-ontologies: current trends and future directions. Brief Bioinform. 2006;7(3):256–74.
    2. IUBMB. Enzyme Nomenclature. San Diego: Academic Press; 1992.
    3. Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.
    4. Smith B, Ashburner M, Rosse C, et al. The OBO foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007;25(11):1251.
    5. Hu JC, Karp PD, Keseler IM, et al. What we can learn about Escherichia coli through application of Gene Ontology. Trends Microbiol. 2009;17(7):269–78.