BMC Bioinformatics. 2018 Mar 9;19(1):94.
doi: 10.1186/s12859-018-2103-8.

Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature

H-M Müller et al. BMC Bioinformatics.

Abstract

Background: The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved.

Results: We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium.
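
The combination of keyword and category search described above can be illustrated with a toy sketch. This is not TPC's implementation (which the abstract says relies on Lucene and UIMA); it is a minimal in-memory inverted index, and the category lexicon, sentences, and function names below are all hypothetical and chosen only for illustration. A sentence matches when it contains every requested keyword and at least one term from every requested category.

```python
from collections import defaultdict

# Hypothetical category lexicon: category name -> member terms (lowercase).
# Real Textpresso categories are far larger; these are illustrative only.
CATEGORIES = {
    "gene": {"daf-16", "lin-12", "unc-54"},
    "organ": {"pharynx", "intestine", "gonad"},
}

# Hypothetical example sentences standing in for full-text passages.
SENTENCES = [
    "daf-16 is expressed in the intestine.",
    "The pharynx pumps rhythmically.",
    "lin-12 mutants show gonad defects.",
]


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:()") for tok in sentence.lower().split())
    return [tok for tok in tokens if tok]


def build_index(sentences):
    """Map each keyword and each matched category label to sentence ids."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for tok in tokenize(sentence):
            index[("kw", tok)].add(sid)
            for category, terms in CATEGORIES.items():
                if tok in terms:
                    index[("cat", category)].add(sid)
    return index


def search(index, keywords=(), categories=()):
    """Return sorted ids of sentences matching all keywords and categories."""
    keys = [("kw", k.lower()) for k in keywords]
    keys += [("cat", c) for c in categories]
    if not keys:
        return []
    # AND semantics: intersect the posting sets of every requested key.
    hits = set.intersection(*(index.get(key, set()) for key in keys))
    return sorted(hits)


INDEX = build_index(SENTENCES)
```

For example, `search(INDEX, keywords=["expressed"], categories=["gene", "organ"])` returns the id of the first sentence only, since it is the only one that contains the keyword along with a gene term and an organ term.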

Conclusion: Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world. Textpresso Central URL: http://www.textpresso.org/tpc.

Keywords: Information extraction; Information retrieval; Literature curation; Literature search engine; Model organism databases; Ontology; Text mining.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. Basic processing pipelines for the Textpresso Central system. The processing includes the full text as well as bibliographic information.
Fig. 2. Components of the web interface (hexagons) and their interactions with data and processing units of the system (rectangles). The bright yellow components have been implemented; the light yellow ones are planned.
Fig. 3. Searches can be restricted to particular literatures.
Fig. 4. The paper manager. Papers can be uploaded in NXML or PDF format and then organized into literatures as shown here.
Fig. 5. (a) Columns of Postgres tables can provide auto-complete and validation information and are specified in this interface. (b) Fields can be prepopulated in various ways, among them with terms and underlying categories found in text spans that are marked by the curator.
Fig. 6. Textpresso Central keyword search.
Fig. 7. Textpresso Central Category Search. (a) Selecting multiple categories. (b) Search results for the multi-category search of C. elegans Genes, C. elegans alleles, and C. elegans organs.
Fig. 8. Results of a Textpresso Central Keyword and Category Search.
Fig. 9. The Textpresso Central Customization Module for Creating Curation Forms.
Fig. 10. Performing Annotation in Textpresso Central. (a) Highlighting Evidence Sentences for Annotation in the Paper Viewer. (b) Creating GO Molecular Function Annotations.
Fig. 11. Textpresso Central Annotation Exported to Noctua.
