Plant Physiol. 2004 Jun;135(2):745-55.
doi: 10.1104/pp.104.040071. Epub 2004 Jun 1.

Functional annotation of the Arabidopsis genome using controlled vocabularies

Tanya Z Berardini et al. Plant Physiol. 2004 Jun.

Abstract

Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate the identification of similar genes within an organism or among different organisms. One goal of The Arabidopsis Information Resource (TAIR) is to associate all Arabidopsis genes with terms developed by the Gene Ontology (GO) Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we had used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes with controlled vocabulary terms based on experimental data from the literature and use PubSearch, software developed by TAIR, to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes, because sequence similarity can be used to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make genes of unknown function easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.
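
To make the annotation record concrete, here is a minimal Python sketch, assuming a hypothetical record layout: the abstract states only that each annotation carries an evidence code, an evidence description, and a reference, so the field names, the `summarize` and `experimental_only` helpers, the chosen set of "experimental" GO evidence codes, and the example loci are all illustrative assumptions, not TAIR's schema or data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical record layout: the abstract says each annotation carries an
# evidence code, an evidence description, and a reference, but these field
# names are illustrative and are not TAIR's actual schema.
@dataclass(frozen=True)
class Annotation:
    locus: str                 # locus identifier (hypothetical values below)
    term: str                  # controlled vocabulary term name
    evidence_code: str         # GO evidence code, e.g. "IDA" or "IEA"
    evidence_description: str  # free-text note on how the evidence was obtained
    reference: str             # citation supporting the annotation

# Assumed subset of GO evidence codes that denote experimental support.
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})

def summarize(annotations: Iterable[Annotation]) -> Tuple[int, int]:
    """Return (number of annotations, number of unique loci they cover)."""
    records = list(annotations)
    return len(records), len({a.locus for a in records})

def experimental_only(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Keep only annotations whose evidence code marks experimental support."""
    return [a for a in annotations if a.evidence_code in EXPERIMENTAL_CODES]

# Hypothetical example data (not real TAIR annotations).
records = [
    Annotation("locus_1", "nucleus", "IDA", "GFP fusion localization", "ref_1"),
    Annotation("locus_1", "DNA binding", "IEA", "predicted from protein domain", "ref_2"),
    Annotation("locus_2", "nucleus", "IEA", "predicted localization", "ref_3"),
]
print(summarize(records))               # (3, 2)
print(len(experimental_only(records)))  # 1
```

Counting annotations and unique loci separately mirrors the figures quoted above: 85,666 annotations over 26,624 loci works out to a little over three annotations per locus on average. Filtering on the evidence code is one simple way the evidence tags could be used to assess data quality.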

Figures

Figure 1.
Visualizing controlled vocabularies and directed acyclic graphs (DAGs). TAIR's Keyword Browser (http://www.arabidopsis.org/servlets/Search?action=new_search&type=keyword) allows users to navigate through the parent-child relationships of the ontologies, look up definitions, and view associated data. Hyperlinks are underlined; clicking on them opens data pages that list the associated information in greater detail. Section A offers options to view the various data types associated with the term. Section B provides the term name, its identifier, and an explicit definition of the term. Section C is a legend for interpreting the icons within the tree structure. Section D allows one to browse any listed ontology other than the one being viewed. Section E illustrates the concept of multiple parentage in a DAG using the biological process term germination. In this example, germination is an instance of three different parent terms: cell differentiation, post-embryonic development, and physiological process.
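
Multiple parentage is what separates a DAG from a simple tree. The following minimal Python sketch collects every ancestor of a term in a toy ontology. The three parents of germination come from the caption above. Linking those parents directly to a single `biological_process` root is a simplification for illustration and does not reflect the real GO structure. `PARENTS` and `ancestors` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Set

# Toy ontology as child -> parents. The three parents of "germination" come
# from the Figure 1 caption; linking them straight to one root is a
# simplification for illustration, not the real GO structure.
PARENTS: Dict[str, List[str]] = {
    "germination": [
        "cell differentiation",
        "post-embryonic development",
        "physiological process",
    ],
    "cell differentiation": ["biological_process"],
    "post-embryonic development": ["biological_process"],
    "physiological process": ["biological_process"],
    "biological_process": [],
}

def ancestors(term: str, parents: Dict[str, List[str]] = PARENTS) -> Set[str]:
    """Collect every term reachable by following parent links (breadth-first).

    In a DAG the same ancestor can be reached along several paths (here the
    root is reached three times); the `seen` set records it only once.
    """
    seen: Set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen

print(sorted(ancestors("germination")))
# ['biological_process', 'cell differentiation', 'physiological process', 'post-embryonic development']
```

Under GO's "true path" convention, a gene annotated to germination is implicitly associated with every term that `ancestors` returns. A browser that walks parent links this way can show all three branches at once.
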
Figure 2.
Functional classification of the whole Arabidopsis genome representing the distribution of genes based on their annotations to terms in the GO cellular component (a), GO molecular function (b), and GO biological process vocabularies (c).
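
A genome-wide distribution like this can be tallied by counting unique loci per category. The minimal Python sketch below assumes that each annotation has already been mapped to one broad category per vocabulary. The category names, the loci, and the `genes_per_category` helper are hypothetical. The caption does not say whether Figure 2 counted a gene annotated to several categories once or in each category, so the sketch takes the second option as an explicit assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def genes_per_category(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count unique loci per category from (locus, category) pairs.

    A locus annotated to several categories is counted once in each of them,
    so the per-category counts can add up to more than the number of loci.
    """
    members: Dict[str, Set[str]] = defaultdict(set)
    for locus, category in pairs:
        members[category].add(locus)  # set membership drops duplicate pairs
    return {category: len(loci) for category, loci in members.items()}

# Hypothetical (locus, cellular component category) pairs.
pairs = [
    ("locus_1", "nucleus"),
    ("locus_1", "plastid"),
    ("locus_2", "nucleus"),
    ("locus_2", "nucleus"),  # duplicate annotation, counted once
]
counts = genes_per_category(pairs)
print(counts)  # {'nucleus': 2, 'plastid': 1}
```
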
Figure 3.
Display of controlled vocabulary annotations on the TAIR Gene detail page (a), which summarizes information relevant to the gene; the Term Annotation detail page (b), which displays all annotations made to the term in question; and the Gene Annotation detail page (c), which displays all controlled vocabulary annotations made to that gene. These pages are interlinked, so one can move from one page to the next by clicking the appropriate hyperlink.
Figure 4.
Searching with controlled vocabulary terms within one species and across multiple species. a, Screenshot from a TAIR Web page showing a partial list of all Arabidopsis genes associated with the GO term NADH dehydrogenase activity. This page can be retrieved by entering the GO term on the TAIR gene search page (http://www.arabidopsis.org/servlets/Search?action=new_search&type=gene). b, Screenshot from a GO Web page showing a partial list of genes from multiple organisms associated with the term NADH dehydrogenase activity. This page can be reached by entering the GO term on the GO database/ontology browser (http://www.godatabase.org/cgi-bin/go.cgi) or by clicking on the GO database hyperlink from the TAIR keyword detail page.
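
Both searches in Figure 4 come down to one lookup with an optional species restriction. The minimal Python sketch below illustrates this with hypothetical gene identifiers and a hypothetical `genes_for_term` helper. It does not reproduce the TAIR or GO search code. It matches the term name exactly. A real ontology search might also return genes annotated to descendant terms, and this sketch does not attempt that.

```python
from typing import Iterable, List, NamedTuple, Optional

class GeneAnnotation(NamedTuple):
    gene: str     # hypothetical gene identifier
    species: str  # organism the gene belongs to
    term: str     # GO term name

def genes_for_term(records: Iterable[GeneAnnotation], term: str,
                   species: Optional[str] = None) -> List[str]:
    """Return sorted genes annotated to `term`, optionally limited to one species."""
    return sorted({
        r.gene for r in records
        if r.term == term and (species is None or r.species == species)
    })

# Hypothetical annotations from two organisms.
records = [
    GeneAnnotation("gene_A", "Arabidopsis thaliana", "NADH dehydrogenase activity"),
    GeneAnnotation("gene_B", "Arabidopsis thaliana", "NADH dehydrogenase activity"),
    GeneAnnotation("gene_C", "Saccharomyces cerevisiae", "NADH dehydrogenase activity"),
    GeneAnnotation("gene_D", "Arabidopsis thaliana", "DNA binding"),
]
term = "NADH dehydrogenase activity"
print(genes_for_term(records, term, "Arabidopsis thaliana"))  # ['gene_A', 'gene_B']
print(genes_for_term(records, term))                          # ['gene_A', 'gene_B', 'gene_C']
```

The cross-species result in panel b depends on every organism's database using the same shared vocabulary. For that reason, one query on a single term name can return genes from several species.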