The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species

Reference Genome Group of the Gene Ontology Consortium. PLoS Comput Biol. 2009 Jul.

Abstract

The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Distribution of the PANTHER families with respect to the number of reference genome species having representatives in each family.
Figure 2. Tree representation of the TOP2 homolog set for the twelve species from the Reference Genome project.
Genes with experimental data are labeled in red. Because members of all represented branches have “GO:0003918 DNA topoisomerase (ATP-hydrolyzing) activity” and a role in “GO:0007059 chromosome segregation”, the common ancestor (CA) can be inferred to have had these functions as well. We thus predict that all descendants can be annotated to those terms with reasonable confidence. The sequences represented are (from top to bottom): A. thaliana TAIR:locus = 2075765, E. coli UniProt: P0AFI2 (parC), E. coli UniProt: P0AES4 (gyrA), E. coli UniProt: P20083 (parE), E. coli UniProt: P0AES6 (gyrB), A. thaliana TAIR:locus = 2146658, A. thaliana TAIR:locus = 2076268, A. thaliana TAIR:locus = 2146698, A. thaliana TAIR:locus = 2076201, D. discoideum dictyBase: DDB_G0279737 (top2mt), D. discoideum dictyBase: DDB_G0270418 (top2), S. cerevisiae SGD:S000005032 (TOP2), S. pombe GeneDB SPBC1A4.03c (top2), D. melanogaster FlyBase FBgn0003732 (top2), C. elegans WormBase WBGene00019876 (R05D3.1), C. elegans WormBase WBGene00022854 (cin-4), C. elegans WormBase WBGene00021604 (Y46H3C.4), D. rerio ZFIN ZDB-GENE-030131-2453 (top2A), D. rerio ZFIN ZDB-GENE-041008-136 (top2B), G. gallus UniProt:O42130 (top2A), H. sapiens UniProt:P11288 (top2A), M. musculus MGI:98790 (top2A), R. norvegicus RGD: 62048 (top2A), G. gallus UniProt: O42131 (top2B), H. sapiens UniProt:P02880 (top2B), M. musculus MGI:98791 (top2B), R. norvegicus RGD: 1586156 (top2B).
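The inference described in this caption can be illustrated with a small sketch. It is only a toy model under stated assumptions: the `Node` class, the functions `infer_ancestral_terms` and `propagate`, the gene names, and the placeholder term `TERM_ONLY_IN_BRANCH_1` are all hypothetical, and "all represented branches" is simplified to mean the direct child branches of the common ancestor. It is not the project's actual procedure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy gene tree; only leaves carry experimental GO terms."""
    name: str
    children: list[Node] = field(default_factory=list)
    experimental: set[str] = field(default_factory=set)


def leaves(node: Node) -> list[Node]:
    """Return all leaf nodes (genes) below `node`, in left-to-right order."""
    if not node.children:
        return [node]
    found: list[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def branch_terms(node: Node) -> set[str]:
    """Union of the experimentally supported terms of every gene in a branch."""
    terms: set[str] = set()
    for leaf in leaves(node):
        terms |= leaf.experimental
    return terms


def infer_ancestral_terms(root: Node) -> set[str]:
    """Terms with experimental support in every child branch of `root`."""
    if not root.children:
        return set(root.experimental)
    return set.intersection(*(branch_terms(child) for child in root.children))


def propagate(root: Node) -> dict[str, set[str]]:
    """Map each gene to the ancestral terms it lacks experimental data for."""
    ancestral = infer_ancestral_terms(root)
    return {leaf.name: ancestral - leaf.experimental for leaf in leaves(root)}


# GO identifiers taken from the caption; the tree and gene names are made up.
TOPO = "GO:0003918"
SEGREGATION = "GO:0007059"

tree = Node("CA", children=[
    Node("branch_1", children=[
        Node("gene_a", experimental={TOPO, SEGREGATION, "TERM_ONLY_IN_BRANCH_1"}),
        Node("gene_b"),
    ]),
    Node("branch_2", children=[
        Node("gene_c", experimental={TOPO}),
        Node("gene_d", experimental={SEGREGATION}),
    ]),
])

ancestral = infer_ancestral_terms(tree)
inferred = propagate(tree)
```

In this toy tree, both GO terms from the caption are inferred for the common ancestor because each branch contains at least one gene with experimental support for them, while the placeholder term seen in only one branch is not. The uncharacterized `gene_b` therefore receives both terms as predicted annotations.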
Figure 3. The Gene Ontology's browser, AmiGO, displays comparison graphs for genes present in homolog sets.
These graphs show all annotations, both experimental (evidence codes: IDA, IMP, IGI, IPI, IEP) and those inferred from sequence similarity to an experimentally characterized gene (ISS) or by curators (IC). Direct annotations to a GO term are indicated by colored wedges, with each species represented by a different color. The species to display can be selected from the Control Panel on the right-hand side (here, the species selected are H. sapiens, D. rerio, and E. coli). Each wedge may also contain a small color-coded circle that indicates whether the annotation to a term is based on experimental data (green) or supported by sequence similarity (blue); annotations with other evidence have no circle in the wedge. Mousing over a term displays the term ID, term name, and a complete list of annotations to that term by species. Here we show the term “chromosome segregation”, for which five of the twelve species have experimental data to support that annotation. Annotations based on experimental data are indicated by an “E”, and those based on sequence similarity by an “I”.
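The grouping of evidence codes described in this caption can be expressed as a short sketch. The code groups come from the caption; the function names `evidence_marker` and `species_with_marker`, the tuple layout, and the toy annotations are assumptions made for illustration and are not taken from AmiGO or its data.

```python
# Evidence-code groups as listed in the Figure 3 caption.
EXPERIMENTAL_CODES = frozenset({"IDA", "IMP", "IGI", "IPI", "IEP"})
SIMILARITY_CODES = frozenset({"ISS"})


def evidence_marker(code):
    """Return "E" for experimental codes, "I" for sequence similarity, else None."""
    if code in EXPERIMENTAL_CODES:
        return "E"
    if code in SIMILARITY_CODES:
        return "I"
    return None  # other evidence (for example IC) carries no marker


def species_with_marker(annotations, term_id, marker):
    """Sorted species having at least one annotation to `term_id` with `marker`."""
    return sorted({
        species
        for species, term, code in annotations
        if term == term_id and evidence_marker(code) == marker
    })


# Toy (species, GO term, evidence code) tuples; hypothetical, not real annotations.
annotations = [
    ("species_a", "GO:0007059", "IMP"),
    ("species_a", "GO:0007059", "ISS"),
    ("species_b", "GO:0007059", "ISS"),
    ("species_c", "GO:0007059", "IC"),
    ("species_b", "GO:0003918", "IDA"),
]

experimental = species_with_marker(annotations, "GO:0007059", "E")
similarity = species_with_marker(annotations, "GO:0007059", "I")
```

Counting the entries in `experimental` for a given term gives the kind of summary quoted in the caption (the number of species with experimental support for that term), here computed on made-up data.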

