Database (Oxford). 2012 Aug 1:2012:bas030.
doi: 10.1093/database/bas030. Print 2012.

Assessment of community-submitted ontology annotations from a novel database-journal partnership

Tanya Z Berardini et al. Database (Oxford). 2012.

Abstract

As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org.

Figures

Figure 1. The TOAST interface. (A) Initial page that requests stable article identifiers and locus identifiers. Users can then add annotations in six different areas, five of which are controlled vocabularies. (B) The subcellular localization data entry form. Submissions are aided by an auto-complete functionality which suggests terms that match the user's entry. Once selected, the appropriate stable id for the ontology term is also captured but not displayed to the submitter. Users can also enter terms not in the suggestion list. (C) Form with data ready for submission. At this stage the user may add additional loci or annotations or complete the submission process by saving to the curation database.
Figure 2. Literature-based annotation at TAIR (2000–2010). The total number of research articles containing Arabidopsis gene-related information in the TAIR database is shown in blue. Green and orange show the number of articles used for controlled vocabulary annotations by TAIR and by the community, respectively.
Figure 3. Distribution of community annotation counts. The bins group articles by number of associated community annotations.
Figure 4. Analysis of community annotations. (A) Completeness of community annotations. The 50 articles analyzed are shown on the X-axis, and the total number of curator and community annotations per paper is shown on the Y-axis. The number of community annotations is shown in blue, and the number of added curator annotations in orange. (B) Experimental support for community annotations. Supported community annotations are shown in blue, unsupported community annotations in orange, and out-of-scope annotations in green. (C) Level of specificity of community annotations. Papers are shown on the X-axis, and the total number of community annotations per paper is shown on the Y-axis. Community annotations with the same specificity as curator annotations are shown in blue, more specific community annotations in orange, and more specific curator annotations in green.
Figure 5. TAIR annotation detail page showing attribution to community member.
