Database (Oxford). 2012 Nov 17;2012:bas038.

doi: 10.1093/database/bas038. Print 2012.

Developing a biocuration workflow for AgBase, a non-model organism database

Lakshmi Pillai¹, Philippe Chouvarine, Catalina O Tudor, Carl J Schmidt, K Vijay-Shanker, Fiona M McCarthy

Affiliation

¹ Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, MS 39762, USA.

PMID: 23160411
PMCID: PMC3500517
DOI: 10.1093/database/bas038

Abstract

AgBase provides annotation for agricultural gene products using the Gene Ontology (GO) and Plant Ontology, as appropriate. Unlike model organism species, agricultural species have a body of literature that does not focus solely on gene function; to improve efficiency, we use text mining to identify literature for curation. The first component of our annotation interface is the gene prioritization (GP) interface, which ranks gene products for annotation. Biocurators select the top-ranked genes and mark annotation for these genes as 'in progress' or 'completed'; links enable biocurators to move directly to our biocuration interface (BI). Our BI includes all current GO annotation for gene products and is the main interface for adding or modifying AgBase curation data. The BI also displays Extracting Genic Information from Text (eGIFT) results for each gene product. eGIFT is a web-based text-mining tool that associates genes with ranked, informative terms (iTerms) and with the articles and sentences containing them. Moreover, iTerms are linked to GO terms wherever they match either a GO term name or a synonym. This enables AgBase biocurators to rapidly identify literature for further curation based on possible GO terms. Because most agricultural species do not have standardized literature, eGIFT searches all gene names and synonyms to associate articles with genes. As many gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to the gene in question, and applies filtering to remove abstracts that mention a gene only in passing. The BI is linked to our Journal Database (JDB), where the corresponding journal citations are stored. Just as importantly, biocurators also record in the JDB the citations that yield no GO annotation. The AgBase BI also supports bulk annotation upload to facilitate our electronic annotation of agricultural gene products (GO evidence code IEA, Inferred from Electronic Annotation). All annotations must pass standard GO Consortium quality checks before release in AgBase.
Database URL: http://www.agbase.msstate.edu/.
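The abstract says eGIFT associates articles with genes by searching all gene names and synonyms, then filters out abstracts that mention a gene only in passing. The actual eGIFT criteria are not described here, so the following Python sketch is only an illustration under stated assumptions: matching is whole-word and case-insensitive, and an article is kept if the gene appears in its title or at least `MIN_MENTIONS` times (a hypothetical threshold of 2) in its abstract. The `Article` record and both function names are hypothetical, and eGIFT's disambiguation step is not modeled.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold, not eGIFT's: fewer abstract mentions than this
# (and no title mention) is treated as a passing mention.
MIN_MENTIONS = 2


@dataclass
class Article:
    pmid: str
    title: str
    abstract: str


def count_mentions(text: str, names: list[str]) -> int:
    """Count whole-word, case-insensitive matches of any gene name or synonym."""
    total = 0
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def articles_for_gene(names: list[str], articles: list[Article]) -> list[str]:
    """Return PMIDs of articles that discuss the gene rather than mention it in passing."""
    selected = []
    for art in articles:
        in_title = count_mentions(art.title, names) > 0
        abstract_hits = count_mentions(art.abstract, names)
        if in_title or abstract_hits >= MIN_MENTIONS:
            selected.append(art.pmid)
    return selected


if __name__ == "__main__":
    articles = [
        Article("1", "MSTN expression in chicken muscle", "Myostatin regulates growth."),
        Article("2", "Broiler growth traits", "Among many genes, myostatin was noted."),
        Article("3", "Feed efficiency", "Myostatin and MSTN variants were associated with weight gain."),
    ]
    print(articles_for_gene(["MSTN", "myostatin"], articles))  # ['1', '3']
```

A plain synonym search like this would still pick up unrelated uses of an ambiguous name, which is the problem the abstract's disambiguation step is meant to address.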

Figures

**Figure 1**
The AgBase biocuration pipeline. The pipeline draws on GO Consortium gene association files and PubMed data, and its output is the AgBase public gene association files. Briefly, genes to be annotated are prioritized as a ranked list in the GP interface, and each gene is linked to its records in the main BI. The eGIFT tool enhances biocurators' ability to identify and curate appropriate literature, while the JDB records the literature reviewed. Biocuration must pass standard GO Consortium error and quality checks before public release.
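Figure 1 notes that all biocuration must pass standard GO Consortium error and quality checks before public release. Those checks are not detailed here; the Python sketch below only illustrates the idea of a release gate on a simplified annotation record, assuming a few basic checks (GO ID format, aspect, an evidence code from an illustrative subset, a non-empty reference). The `Annotation` fields and both function names are hypothetical and do not reproduce the GO Consortium's actual rules.

```python
import re
from dataclasses import dataclass

GO_ID = re.compile(r"GO:\d{7}")
# Illustrative subset of GO evidence codes, not the complete current list.
EVIDENCE_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "ISS", "TAS", "NAS", "IC", "ND", "IEA"}
ASPECTS = {"P", "F", "C"}  # biological process, molecular function, cellular component


@dataclass
class Annotation:
    gene_product_id: str
    go_id: str
    aspect: str
    evidence: str
    reference: str  # e.g. "PMID:23160411"


def check_annotation(ann: Annotation) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    errors = []
    if not ann.gene_product_id:
        errors.append("missing gene product ID")
    if not GO_ID.fullmatch(ann.go_id):
        errors.append(f"malformed GO ID: {ann.go_id!r}")
    if ann.aspect not in ASPECTS:
        errors.append(f"unknown aspect: {ann.aspect!r}")
    if ann.evidence not in EVIDENCE_CODES:
        errors.append(f"unexpected evidence code: {ann.evidence!r}")
    if not ann.reference:
        errors.append("missing reference")
    return errors


def partition_for_release(annotations: list[Annotation]):
    """Split annotations into releasable ones and (annotation, errors) failures."""
    passed, failed = [], []
    for ann in annotations:
        errors = check_annotation(ann)
        if errors:
            failed.append((ann, errors))
        else:
            passed.append(ann)
    return passed, failed
```

Collecting every error per record, rather than stopping at the first, lets a biocurator fix all problems in one pass before resubmitting.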

**Figure 2**
The GP Interface. The GP Interface is used to direct biocurators' annotation effort toward genes that the community sees as requiring annotation. Genes are prioritized separately for each species (A), and each species has a searchable, ranked list in which genes are ranked based on requests for annotation and presence on commonly used arrays. Each gene is linked to its gene products in the biocuration interface (B) so that the biocurator can move seamlessly to annotation.
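The caption says genes are ranked per species on two signals, annotation requests and presence on commonly used arrays, but does not say how the signals are combined. The Python sketch below assumes a simple lexicographic order (more requests first, then more arrays, then gene symbol as a tie-breaker) and skips genes already marked 'in progress' or 'completed'. `GeneEntry`, `ranked_lists` and `next_gene` are hypothetical names, not part of AgBase.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneEntry:
    species: str
    symbol: str
    annotation_requests: int      # community requests for annotation
    arrays_present_on: int        # number of commonly used arrays carrying the gene
    status: Optional[str] = None  # None, "in progress" or "completed"


def ranked_lists(genes: list[GeneEntry]) -> dict[str, list[GeneEntry]]:
    """Build a separate ranked list per species.

    Assumed ordering: more annotation requests first, then presence on more
    arrays, with the gene symbol as a deterministic tie-breaker.
    """
    by_species = defaultdict(list)
    for gene in genes:
        by_species[gene.species].append(gene)
    for entries in by_species.values():
        entries.sort(key=lambda g: (-g.annotation_requests, -g.arrays_present_on, g.symbol))
    return dict(by_species)


def next_gene(ranked: dict[str, list[GeneEntry]], species: str) -> Optional[GeneEntry]:
    """Return the highest-ranked gene for a species that is not yet marked."""
    for gene in ranked.get(species, []):
        if gene.status is None:
            return gene
    return None
```

A weighted score would be an equally plausible reading of the caption; the lexicographic key is used here only because it needs no invented weights.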

**Figure 3**
Linking eGIFT to the AgBase BI. A summary of eGIFT GO terms, with links to the corresponding literature, is displayed in the top right-hand corner of each gene product page in the AgBase BI. This table allows biocurators to rapidly identify potential new GO terms and link out to relevant literature.
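The abstract states that iTerms are linked to GO terms where they match a GO term name or synonym, and this figure shows the resulting summary with links to literature. As a minimal sketch, assuming exact matching after lower-casing and whitespace normalization, the Python below builds a lexicon from GO names and synonyms and groups the supporting PMIDs under each matched GO ID. The input shapes and function names are hypothetical; eGIFT's real matching may be more elaborate.

```python
from collections import defaultdict
from typing import Iterable


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so trivial spelling variants match."""
    return " ".join(term.lower().split())


def build_go_lexicon(go_terms: Iterable[tuple[str, str, list[str]]]) -> dict[str, str]:
    """Map every normalized GO term name and synonym to its GO ID."""
    lexicon: dict[str, str] = {}
    for go_id, name, synonyms in go_terms:
        for label in [name, *synonyms]:
            # Keep the first GO ID seen if two terms share a label.
            lexicon.setdefault(normalize(label), go_id)
    return lexicon


def go_summary(
    iterms: Iterable[tuple[str, list[str]]], lexicon: dict[str, str]
) -> dict[str, list[str]]:
    """Group the PMIDs behind each iTerm under the GO term it matches exactly.

    iTerms that match no GO name or synonym are skipped.
    """
    summary = defaultdict(set)
    for iterm, pmids in iterms:
        go_id = lexicon.get(normalize(iterm))
        if go_id is not None:
            summary[go_id].update(pmids)
    return {go_id: sorted(pmids) for go_id, pmids in summary.items()}
```

For example, with a toy lexicon built from `("GO:0006954", "inflammatory response", ["inflammation"])` (the synonym is illustrative only), the iTerms "Inflammation" and "inflammatory response" both land under GO:0006954, and their PMIDs are merged into one row of the summary.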

**Figure 4**
Current AgBase JDB statistics (as of July 2012). AgBase biocurators record the articles they review for biocuration in the JDB and classify them as annotated (containing information that they annotate to GO), no data (containing no GO data) or unavailable (likely to contain GO data, but the full article could not be obtained for curation).
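The three JDB categories in this caption map naturally onto an enumeration. The Python sketch below, assuming a hypothetical `JDBRecord` with only a PMID, species and status, tallies reviewed articles per category in the spirit of the Figure 4 breakdown; it does not reproduce AgBase's actual JDB schema or figures.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class JDBStatus(Enum):
    ANNOTATED = "annotated"      # contains information annotated to GO
    NO_DATA = "no data"          # reviewed, but contains no GO data
    UNAVAILABLE = "unavailable"  # likely has GO data; full text could not be obtained


@dataclass(frozen=True)
class JDBRecord:
    pmid: str
    species: str
    status: JDBStatus


def jdb_statistics(records: list[JDBRecord]) -> dict[str, int]:
    """Count reviewed articles in each JDB category (zero for unused categories)."""
    counts = Counter(record.status for record in records)
    return {status.value: counts[status] for status in JDBStatus}
```

Recording "no data" and "unavailable" articles explicitly, rather than only annotated ones, is what lets curators avoid re-reviewing the same paper later.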

**Figure 5**
The AgBase annotation workflow is supported by three underlying databases. This schema shows the GP, BI and JDB (cylinders), the information they contribute to interface forms (squares) and the data from each database used to create these interfaces (arrows).
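To make the three-database layout concrete, the Python sketch below uses the standard `sqlite3` module to create one heavily simplified table per store (GP, BI, JDB) and a query that assembles rows for a BI-style gene product page. All table and column names are hypothetical; the caption does not give AgBase's actual schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema for the three stores in Figure 5.
SCHEMA = """
CREATE TABLE gp_gene (
    gene_id  TEXT PRIMARY KEY,
    species  TEXT NOT NULL,
    symbol   TEXT NOT NULL,
    requests INTEGER NOT NULL DEFAULT 0,
    status   TEXT            -- NULL, 'in progress' or 'completed'
);
CREATE TABLE jdb_citation (
    pmid   TEXT PRIMARY KEY,
    status TEXT NOT NULL     -- 'annotated', 'no data' or 'unavailable'
);
CREATE TABLE bi_annotation (
    gene_product_id TEXT NOT NULL,
    gene_id         TEXT NOT NULL REFERENCES gp_gene (gene_id),
    go_id           TEXT NOT NULL,
    evidence        TEXT NOT NULL,
    pmid            TEXT REFERENCES jdb_citation (pmid)
);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def gene_product_page(conn: sqlite3.Connection, gene_id: str) -> list[tuple]:
    """Rows for a BI-style page: each annotation with its gene symbol and JDB status."""
    return conn.execute(
        """
        SELECT g.symbol, a.gene_product_id, a.go_id, a.evidence, a.pmid, j.status
        FROM gp_gene AS g
        JOIN bi_annotation AS a ON a.gene_id = g.gene_id
        LEFT JOIN jdb_citation AS j ON j.pmid = a.pmid
        WHERE g.gene_id = ?
        ORDER BY a.go_id
        """,
        (gene_id,),
    ).fetchall()
```

The `LEFT JOIN` keeps annotations that have no literature citation, such as bulk electronic (IEA) annotations, on the page with a `NULL` JDB status.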

Cited by

Functional and expression analyses of transcripts based on full-length cDNAs of Sorghum bicolor.
Shimada S, Makita Y, Kuriyama-Kondou T, Kawashima M, Mochizuki Y, Hirakawa H, Sato S, Toyoda T, Matsui M. DNA Res. 2015 Dec;22(6):485-93. doi: 10.1093/dnares/dsv030. Epub 2015 Nov 5. PMID: 26546227.
Genome Sequencing of the Pyruvate-producing Strain Candida glabrata CCTCC M202019 and Genomic Comparison with Strain CBS138.
Xu N, Ye C, Chen X, Liu J, Liu L, Chen J. Sci Rep. 2016 Oct 7;6:34893. doi: 10.1038/srep34893. PMID: 27713500.
Machine learning approaches and databases for prediction of drug-target interaction: a survey paper.
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Brief Bioinform. 2021 Jan 18;22(1):247-269. doi: 10.1093/bib/bbz157. PMID: 31950972. Review.
Xenbase: key features and resources of the Xenopus model organism knowledgebase.
Fisher M, James-Zorn C, Ponferrada V, Bell AJ, Sundararaj N, Segerdell E, Chaturvedi P, Bayyari N, Chu S, Pells T, Lotay V, Agalakov S, Wang DZ, Arshinoff BI, Foley S, Karimi K, Vize PD, Zorn AM. Genetics. 2023 May 4;224(1):iyad018. doi: 10.1093/genetics/iyad018. PMID: 36755307.
Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts.
Neves M, Damaschun A, Mah N, Lekschas F, Seltmann S, Stachelscheid H, Fontaine JF, Kurtz A, Leser U. Database (Oxford). 2013 Apr 18;2013:bat020. doi: 10.1093/database/bat020. Print 2013. PMID: 23599415.
