pmparser and PMDB: resources for large-scale, open studies of the biomedical literature
- PMID: 33763309
- PMCID: PMC7955988
- DOI: 10.7717/peerj.11071
Abstract
PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is available in both PostgreSQL (DOI 10.5281/zenodo.4008109) and Google BigQuery (https://console.cloud.google.com/bigquery?project=pmdb-bq&d=pmdb).
Keywords: Database; Parsing; Publishing; PubMed.
© 2021 Schoenbachler and Hughey.
Conflict of interest statement
Jacob J. Hughey is an Academic Editor for PeerJ.
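To illustrate the abstract's point that nested XML is awkward for complex queries while relational tables are not, here is a minimal sketch in Python. It is not pmparser (which is an R package) and does not reproduce the PMDB schema: the sample records, the table and column names, and the helper functions `load_into_sqlite` and `articles_per_author` are all assumptions made for illustration. Only the element names (`PubmedArticle`, `MedlineCitation`, `PMID`, `ArticleTitle`, `AuthorList`, `Author`) follow the PubMed XML format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample records that mimic the nesting of PubMed XML.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title A</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Example title B</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def load_into_sqlite(xml_text, conn):
    """Flatten nested article records into two tables keyed by PMID.

    The schema is hypothetical and far simpler than PMDB's.
    """
    conn.execute("CREATE TABLE article (pmid INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE author "
        "(pmid INTEGER, author_pos INTEGER, last_name TEXT, fore_name TEXT)"
    )
    root = ET.fromstring(xml_text)
    for citation in root.iter("MedlineCitation"):
        pmid = int(citation.findtext("PMID"))
        title = citation.findtext("Article/ArticleTitle")
        conn.execute("INSERT INTO article VALUES (?, ?)", (pmid, title))
        # One row per (article, author) pair, preserving author order.
        authors = citation.iterfind("Article/AuthorList/Author")
        for pos, author in enumerate(authors, start=1):
            conn.execute(
                "INSERT INTO author VALUES (?, ?, ?, ?)",
                (pmid, pos, author.findtext("LastName"), author.findtext("ForeName")),
            )
    conn.commit()


def articles_per_author(conn):
    """Count distinct articles per author name with a single SQL query."""
    return conn.execute(
        "SELECT last_name, fore_name, COUNT(DISTINCT pmid) AS n "
        "FROM author GROUP BY last_name, fore_name "
        "ORDER BY n DESC, last_name"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_into_sqlite(SAMPLE_XML, connection)
    for row in articles_per_author(connection):
        print(row)
```

Once the records are in tables, a question such as "how many articles per author" becomes a single `GROUP BY` query instead of a traversal of every XML record. Matching authors by name alone is a simplification and would conflate different people who share a name.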