Database (Oxford). 2012 Nov 17;2012:bas038.
doi: 10.1093/database/bas038. Print 2012.

Developing a biocuration workflow for AgBase, a non-model organism database

Lakshmi Pillai et al. Database (Oxford). 2012.

Abstract

AgBase provides annotation for agricultural gene products using the Gene Ontology (GO) and the Plant Ontology (PO), as appropriate. Unlike the literature for model organisms, the literature for agricultural species does not focus solely on gene function; to improve efficiency, we use text mining to identify literature for curation. The first component of our annotation interface is the gene prioritization interface, which ranks gene products for annotation. Biocurators select the top-ranked genes and mark their annotation as 'in progress' or 'completed'; links let biocurators move directly to our biocuration interface (BI). The BI includes all current GO annotation for each gene product and is the main interface for adding and modifying AgBase curation data. The BI also displays Extracting Genic Information from Text (eGIFT) results for each gene product. eGIFT is a web-based text-mining tool that associates genes with ranked informative terms (iTerms) and with the articles and sentences that contain them. Moreover, iTerms are linked to GO terms wherever they match either a GO term name or a GO term synonym. This enables AgBase biocurators to rapidly identify literature for further curation based on possible GO terms. Because most agricultural species do not have standardized literature, eGIFT searches all gene names and synonyms to associate articles with genes. As many gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to the gene in question, and a filtering step removes abstracts that mention a gene only in passing. The BI is linked to our Journal Database (JDB), where the corresponding journal citations are stored. Just as importantly, biocurators also add to the JDB citations that contain no GO annotation. The AgBase BI also supports bulk annotation upload to facilitate our electronic annotation of agricultural gene products (GO evidence code IEA, Inferred from Electronic Annotation). All annotations must pass standard GO Consortium quality checks before release in AgBase.
Database URL: http://www.agbase.msstate.edu/.
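
The literature-association step described in the abstract (search every gene name and synonym, then drop abstracts that mention the gene only in passing) can be illustrated with a minimal sketch. It assumes abstracts are available as plain text and that the gene's names and synonyms are already known. The `Abstract` type, the `associate_abstracts` function and the `min_mentions` threshold are hypothetical: a simple mention count stands in for eGIFT's actual passing-mention filter, and eGIFT's disambiguation step is not modeled at all.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Abstract:
    """Hypothetical container for one PubMed abstract."""
    pmid: str
    text: str


def count_mentions(text: str, names: list[str]) -> int:
    """Count whole-word, case-insensitive hits for every gene name and synonym.

    Assumes each name starts and ends with a letter or digit (so the \\b
    anchors apply). Names that contain one another would be double counted.
    """
    total = 0
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def associate_abstracts(names: list[str], abstracts: list[Abstract],
                        min_mentions: int = 2) -> list[str]:
    """Return PMIDs of abstracts mentioning the gene at least min_mentions times.

    The threshold is an assumed stand-in for a 'mentioned in passing' filter;
    ambiguous names are not disambiguated here.
    """
    return [a.pmid for a in abstracts
            if count_mentions(a.text, names) >= min_mentions]


# Usage with a hypothetical synonym list and toy abstracts.
names = ["IL8", "CXCL8", "IL-8"]
abstracts = [
    Abstract("1", "IL8 expression rose; CXCL8 (IL-8) drives neutrophil recruitment."),
    Abstract("2", "Several cytokines, including IL8, were measured."),
]
print(associate_abstracts(names, abstracts))  # ['1']
```

Searching every synonym raises recall when nomenclature is not standardized, which is also why some filter on passing mentions is needed to keep precision acceptable.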


Figures

Figure 1
The AgBase biocuration pipeline. The pipeline draws on GO Consortium gene association files and on PubMed data, and its output is the AgBase public gene association files. Briefly, genes to be annotated are prioritized as a ranked list in the gene prioritization (GP) interface, and each gene is linked to its records in the main biocuration interface (BI). The eGIFT tool helps biocurators identify and curate appropriate literature, while the JDB records the literature that has been reviewed. Biocuration must pass standard GO Consortium error and quality checks before public release.
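
The final stage of the pipeline, checking annotations before public release, can be sketched as a line-level validator for gene association files in GAF 2.0 format (17 tab-separated columns). This is a small, illustrative subset of structural checks, not the GO Consortium's own rule set; `check_gaf_line` is a hypothetical name, and the evidence-code set is a hard-coded snapshot that may be incomplete.

```python
import re

GAF_COLUMNS = 17
ASPECTS = {"P", "F", "C"}  # biological process, molecular function, cellular component
# Snapshot of GO evidence codes; may be incomplete or out of date.
EVIDENCE_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
    "HTP", "HDA", "HMP", "HGI", "HEP",
    "ISS", "ISO", "ISA", "ISM", "IGC", "IBA", "IBD", "IKR", "IRD", "RCA",
    "TAS", "NAS", "IC", "ND", "IEA",
}
GO_ID = re.compile(r"GO:\d{7}")
TAXON = re.compile(r"taxon:\d+(\|taxon:\d+)?")
DATE = re.compile(r"\d{8}")  # YYYYMMDD
# 0-based indices of the columns GAF 2.0 marks as required.
REQUIRED = {0: "DB", 1: "DB Object ID", 2: "DB Object Symbol", 4: "GO ID",
            5: "DB:Reference", 6: "Evidence Code", 8: "Aspect",
            11: "DB Object Type", 12: "Taxon", 13: "Date", 14: "Assigned By"}


def check_gaf_line(line: str) -> list:
    """Return the problems found in one GAF line; an empty list means it passed."""
    if line.startswith("!"):
        return []  # header or comment line
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != GAF_COLUMNS:
        return [f"expected {GAF_COLUMNS} columns, found {len(cols)}"]
    problems = [f"missing {label}" for idx, label in REQUIRED.items() if not cols[idx]]
    if cols[4] and not GO_ID.fullmatch(cols[4]):
        problems.append(f"malformed GO ID {cols[4]!r}")
    if cols[6] and cols[6] not in EVIDENCE_CODES:
        problems.append(f"unknown evidence code {cols[6]!r}")
    if cols[8] and cols[8] not in ASPECTS:
        problems.append(f"invalid aspect {cols[8]!r}")
    if cols[12] and not TAXON.fullmatch(cols[12]):
        problems.append(f"malformed taxon {cols[12]!r}")
    if cols[13] and not DATE.fullmatch(cols[13]):
        problems.append(f"date not in YYYYMMDD form {cols[13]!r}")
    return problems
```

A real release check would also validate identifiers against current ontology and database releases (for example, rejecting obsolete GO IDs), which a purely syntactic sketch like this cannot do.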
Figure 2
The GP interface. The GP interface directs biocurators' annotation effort to genes that the community sees as requiring annotation. Genes are prioritized separately for each species (A), and each species has a searchable, ranked list in which genes are ranked by requests for annotation and by presence on commonly used arrays. Each gene is linked to its gene products in the biocuration interface (B), so the biocurator can move seamlessly to annotation.
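
The per-species ranking and the 'in progress'/'completed' marking described for the GP interface can be illustrated with a minimal sketch. The exact weighting AgBase uses is not given here, so this version simply assumes that annotation requests dominate and that presence on a commonly used array breaks ties. `GeneEntry`, `rank_genes`, `next_gene` and the default status `'open'` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneEntry:
    """Hypothetical row in a per-species gene prioritization list."""
    symbol: str
    species: str
    annotation_requests: int   # community requests for annotation
    on_common_array: bool      # gene is present on a commonly used array
    status: str = "open"       # 'open' (assumed default), 'in progress' or 'completed'


def rank_genes(entries: list[GeneEntry], species: str) -> list[GeneEntry]:
    """Rank one species' genes: more requests first, array presence breaks ties.

    This ordering is an assumption, not the documented GP weighting.
    """
    candidates = [e for e in entries if e.species == species]
    return sorted(candidates,
                  key=lambda e: (e.annotation_requests, e.on_common_array),
                  reverse=True)


def next_gene(entries: list[GeneEntry], species: str) -> Optional[GeneEntry]:
    """Return the top-ranked gene whose annotation is not yet in progress or completed."""
    for entry in rank_genes(entries, species):
        if entry.status not in ("in progress", "completed"):
            return entry
    return None
```

Keeping one list per species, as in panel (A), means a heavily requested gene in one species never pushes another species' genes down the queue.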
Figure 3
Linking eGIFT to the AgBase BI. A summary of eGIFT GO terms, with links to the corresponding literature, is displayed in the top right-hand corner of each gene product page in the AgBase BI. This table allows biocurators to rapidly identify potential new GO terms and link out to relevant literature.
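
The abstract states that iTerms are linked to GO terms when they match a GO term name or synonym. A minimal sketch of that lookup builds an index from every normalized name and synonym to its GO IDs and then looks each iTerm up. The normalization (lower-casing and collapsing whitespace) and all function names are assumptions, and the synonym in the usage data is toy data rather than a real GO synonym.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GoTerm:
    go_id: str
    name: str
    synonyms: list[str] = field(default_factory=list)


def normalize(label: str) -> str:
    """Assumed normalization: lower-case and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def build_label_index(terms: list[GoTerm]) -> dict[str, set[str]]:
    """Map every normalized GO term name and synonym to the GO IDs that carry it."""
    index: dict[str, set[str]] = {}
    for term in terms:
        for label in [term.name, *term.synonyms]:
            index.setdefault(normalize(label), set()).add(term.go_id)
    return index


def link_iterms(iterms: list[str], index: dict[str, set[str]]) -> dict[str, list[str]]:
    """Link each iTerm to the GO IDs whose name or synonym it matches exactly."""
    links = {}
    for iterm in iterms:
        key = normalize(iterm)
        if key in index:
            links[iterm] = sorted(index[key])
    return links


terms = [
    GoTerm("GO:0006954", "inflammatory response"),
    GoTerm("GO:0030593", "neutrophil chemotaxis", ["neutrophil cell chemotaxis"]),  # toy synonym
]
print(link_iterms(["Inflammatory  Response", "neutrophil"], build_label_index(terms)))
# {'Inflammatory  Response': ['GO:0006954']}
```

Because the index maps a label to a set of IDs, a synonym shared by several GO terms links the iTerm to all of them, leaving the final choice to the biocurator.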
Figure 4
Current AgBase JDB statistics (as of July 2012). AgBase biocurators record the articles they review for biocuration in the JDB and classify each as annotated (contains information they annotate to the GO), no data (contains no GO data) or unavailable (likely to contain GO data, but the full article could not be obtained for curation).
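
The three JDB classifications map naturally onto an enumeration, and statistics like those in Figure 4 are then a per-status count. This minimal sketch assumes review records arrive as `(pmid, status)` pairs, that each article is counted once, and that a later review of the same article overrides an earlier one; `JdbStatus`, `tally` and the PMIDs used in testing are hypothetical.

```python
from collections import Counter
from enum import Enum


class JdbStatus(Enum):
    ANNOTATED = "annotated"      # article contains information annotated to the GO
    NO_DATA = "no data"          # article contains no GO data
    UNAVAILABLE = "unavailable"  # likely has GO data, but the full text could not be obtained


def tally(records):
    """Count JDB articles per status from (pmid, JdbStatus) review records.

    Each article is counted once; a later record for the same PMID replaces
    an earlier one (for example, an 'unavailable' article later annotated).
    """
    latest = {}
    for pmid, status in records:
        latest[pmid] = status
    counts = Counter(latest.values())
    return {status.value: counts.get(status, 0) for status in JdbStatus}
```

Recording 'no data' and 'unavailable' articles, not just annotated ones, lets biocurators avoid re-reading literature that has already been checked.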
Figure 5
The AgBase annotation workflow is supported by three underlying databases. This schema shows the GP, BI and JDB (cylinders), the information they contribute to interface forms (squares) and the data from each database used to create these interfaces (arrows).
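
The three-database arrangement can be illustrated with an in-memory SQLite sketch in which a BI gene product page is assembled by joining prioritization, annotation and citation records. The table and column names are hypothetical and much simpler than any production schema; in particular, GP genes and BI gene products are collapsed into a single `gene_product_id` key for brevity.

```python
import sqlite3

# Hypothetical minimal schema for the GP, BI and JDB stores.
SCHEMA = """
CREATE TABLE gp (
    gene_product_id TEXT PRIMARY KEY,
    species TEXT NOT NULL,
    priority_rank INTEGER NOT NULL,
    status TEXT NOT NULL              -- e.g. 'in progress' or 'completed'
);
CREATE TABLE bi_annotation (
    gene_product_id TEXT NOT NULL REFERENCES gp(gene_product_id),
    go_id TEXT NOT NULL,
    evidence_code TEXT NOT NULL,
    pmid TEXT                         -- source article, if any
);
CREATE TABLE jdb (
    pmid TEXT PRIMARY KEY,
    citation TEXT NOT NULL,
    jdb_status TEXT NOT NULL          -- 'annotated', 'no data' or 'unavailable'
);
"""

PAGE_QUERY = """
SELECT gp.priority_rank, gp.status, a.go_id, a.evidence_code, j.citation
FROM gp
LEFT JOIN bi_annotation AS a ON a.gene_product_id = gp.gene_product_id
LEFT JOIN jdb AS j ON j.pmid = a.pmid
WHERE gp.gene_product_id = ?
ORDER BY a.go_id
"""


def product_page(conn, gene_product_id):
    """Combine GP, BI and JDB rows into the data one gene product page could show."""
    rows = conn.execute(PAGE_QUERY, (gene_product_id,)).fetchall()
    if not rows:
        return None  # gene product not in the prioritization list
    rank, status = rows[0][0], rows[0][1]
    # LEFT JOIN yields a single all-NULL annotation row when nothing is annotated yet.
    annotations = [(go_id, evidence, citation)
                   for _rank, _status, go_id, evidence, citation in rows
                   if go_id is not None]
    return {"priority_rank": rank, "status": status, "annotations": annotations}
```

Using LEFT JOINs keeps a gene product visible on its page even before any annotation or citation exists, which matches a workflow where prioritized genes are opened for curation first.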
