Nucleic Acids Res. 2019 Jul 2;47(W1):W587-W593.
doi: 10.1093/nar/gkz389.

PubTator central: automated concept annotation for biomedical full text articles


Chih-Hsuan Wei et al. Nucleic Acids Res.

Abstract

PubTator Central (https://www.ncbi.nlm.nih.gov/research/pubtator/) is a web service for viewing and retrieving bioconcept annotations in full text biomedical articles. PubTator Central (PTC) provides automated annotations from state-of-the-art text mining systems for genes/proteins, genetic variants, diseases, chemicals, species and cell lines, all available for immediate download. PTC annotates PubMed (29 million abstracts) and the PMC Text Mining subset (3 million full text articles). The new PTC web interface allows users to build full text document collections and visualize concept annotations in each document. Annotations are downloadable in multiple formats (XML, JSON and tab delimited) via the online interface, a RESTful web service and bulk FTP. Improved concept identification systems and a new disambiguation module based on deep learning increase annotation accuracy, and the new server-side architecture is significantly faster. PTC is synchronized with PubMed and PubMed Central, with new articles added daily. The original PubTator service has served annotated abstracts for ∼300 million requests, enabling third-party research in use cases such as biocuration support, gene prioritization, genetic disease analysis, and literature-based knowledge discovery. We demonstrate that the full text results in PTC significantly increase biomedical concept coverage and anticipate that this expansion will both enhance existing downstream applications and enable new use cases.
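The abstract notes that annotations can be downloaded in tab-delimited form. The sketch below parses that kind of file, assuming the classic PubTator layout: a `PMID|t|title` line, a `PMID|a|abstract` line, then one tab-separated line per annotation giving the PMID, start and end character offsets, mention text, concept type and concept identifier. That layout, and the names `parse_pubtator`, `Document` and `Annotation`, are assumptions for illustration rather than a description of PTC's own code; check the actual export before relying on it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    start: int       # offset where the mention begins
    end: int         # offset just past the mention (exclusive)
    text: str        # mention as it appears in the document
    concept: str     # concept type, e.g. "Gene" or "Disease"
    identifier: str  # normalized concept ID; empty if absent


@dataclass
class Document:
    pmid: str
    title: str = ""
    abstract: str = ""
    annotations: List[Annotation] = field(default_factory=list)


def parse_pubtator(raw: str) -> Dict[str, Document]:
    """Parse text in the (assumed) PubTator tab-delimited layout."""
    docs: Dict[str, Document] = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # blank lines separate documents
        # Title/abstract lines look like "PMID|t|text" or "PMID|a|text".
        parts = line.split("|", 2)
        if len(parts) == 3 and parts[1] in ("t", "a") and "\t" not in parts[0]:
            pmid, kind, text = parts
            doc = docs.setdefault(pmid, Document(pmid))
            if kind == "t":
                doc.title = text
            else:
                doc.abstract = text
            continue
        # Annotation lines are tab-separated; anything else (e.g. relation
        # lines with fewer columns or non-numeric offsets) is skipped.
        cols = line.split("\t")
        if len(cols) < 5 or not (cols[1].isdigit() and cols[2].isdigit()):
            continue
        pmid, start, end, mention, concept = cols[:5]
        identifier = cols[5] if len(cols) > 5 else ""
        doc = docs.setdefault(pmid, Document(pmid))
        doc.annotations.append(
            Annotation(int(start), int(end), mention, concept, identifier)
        )
    return docs
```

Keeping offsets as integers lets a caller slice the original text to recover each mention, which is a quick sanity check on a downloaded file.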

Figures

Figure 1.
PTC processing pipeline. PubMed abstracts and PMC-TM full text articles are annotated by multiple concept taggers (A), conflicts/overlapping annotations handled by the disambiguation module (B) and results stored in the database (C).
Figure 2.
Display of the abstract or full text of a publication, with related tools.
Figure 3.
Comparison of annotation coverage between processing PubMed abstracts and processing both abstracts and full text articles from PMC-TM.
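Panel (B) of Figure 1 describes a module that resolves conflicting or overlapping annotations produced by different concept taggers. PTC does this with a deep-learning disambiguation model; the sketch below swaps in a much simpler greedy heuristic purely to show what resolving overlaps means. Spans are ranked by a per-tagger confidence score (a hypothetical field), ties favor the longer span, and a span is kept only if it does not overlap any span already kept. The names `Span` and `resolve_overlaps` are illustrative assumptions.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int      # inclusive character offset
    end: int        # exclusive character offset
    concept: str    # e.g. "Gene", "Chemical"
    score: float    # tagger confidence (hypothetical field)


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Keep a non-overlapping subset of spans, chosen greedily.

    Higher scores win; ties go to the longer span, then the earlier one.
    This is a simple heuristic stand-in, not PTC's deep-learning
    disambiguation module.
    """
    ranked = sorted(spans, key=lambda s: (-s.score, -(s.end - s.start), s.start))
    kept: List[Span] = []
    for span in ranked:
        # Half-open intervals are disjoint if one ends at or before the
        # point where the other starts.
        if all(span.start >= k.end or span.end <= k.start for k in kept):
            kept.append(span)
    # Return survivors in document order.
    return sorted(kept, key=lambda s: s.start)
```

Because the intervals are half-open, spans that merely touch (one ends where the next begins) are not treated as overlapping.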
