BMC Med Inform Decis Mak. 2018 Mar 22;18(Suppl 1):14.
doi: 10.1186/s12911-018-0594-x.

A bibliometric analysis of natural language processing in medical research

Xieling Chen et al. BMC Med Inform Decis Mak.

Abstract

Background: Natural language processing (NLP) plays an increasingly significant role in advancing medicine. A rich body of research on NLP methods and applications for medical information processing is available. A deep analysis is therefore valuable for understanding the recent development of the NLP-empowered medical research field. However, few studies examining the research status of this field could be found. Therefore, this study aims to quantitatively assess the academic output of NLP in the medical research field.

Methods: We conducted a bibliometric analysis of NLP-empowered medical research publications retrieved from PubMed for the period 2007-2016. The analysis focused on three aspects. First, the distribution characteristics of the literature were obtained with a statistical analysis method. Second, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution were examined using an affinity propagation clustering method.
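The network analysis step can be illustrated with a minimal sketch. It assumes that each publication record reduces to a list of author names and that two authors are linked whenever they share a paper. The abstract does not specify how the network was built, so the function name `coauthor_edges` and the toy records below are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count how many papers each unordered pair of authors shares.

    `papers` is an iterable of author-name lists (assumed input format).
    """
    edges = Counter()
    for authors in papers:
        # Deduplicate and sort so (A, B) and (B, A) map to the same key.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


# Hypothetical toy records, not data from the study.
toy_papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author D"],  # single-author paper: contributes no edge
]
toy_edges = coauthor_edges(toy_papers)
```

The resulting edge weights could then be passed to a graph layout tool to draw a force-directed network. The same counting works for affiliations or countries if the lists hold those units instead of author names.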

Results: A total of 1405 NLP-empowered medical research publications were published during the 10 years, with an average annual growth rate of 18.39%. The 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications. A moderately significant correlation between a country's publications and its GDP per capita was revealed. Denny, Joshua C was the most productive author, and Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. Ten main thematic areas were identified, including Computational biology, Terminology mining, Information extraction, Text classification, Social medium as data source, and Information retrieval.

Conclusions: A bibliometric analysis of NLP-empowered medical research publications is presented to uncover the recent research status of the field. The results can assist relevant researchers, especially newcomers, in understanding the research development systematically, seeking scientific cooperation partners, optimizing research topic choices, and monitoring new scientific or technological activities.
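The growth and collaboration figures reported in the Results can be made concrete with a short sketch. The abstract does not state the exact formulas, so two common definitions of average annual growth are shown side by side, and the co-affiliation or co-country rate is assumed to be the share of papers that list more than one distinct affiliation or country. All function names and sample values are hypothetical.

```python
def mean_annual_growth(counts):
    """Arithmetic mean of year-over-year growth rates, in percent."""
    rates = [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]
    return 100 * sum(rates) / len(rates)


def compound_annual_growth(counts):
    """Compound annual growth rate from first to last year, in percent."""
    years = len(counts) - 1
    return 100 * ((counts[-1] / counts[0]) ** (1 / years) - 1)


def collaboration_rate(papers):
    """Percent of papers listing more than one distinct unit.

    Each paper is a list of units (affiliations or countries); this
    definition is an assumption, not the study's stated formula.
    """
    multi = sum(1 for units in papers if len(set(units)) > 1)
    return 100 * multi / len(papers)


# Hypothetical yearly publication counts, not data from the study.
toy_counts = [100, 110, 121]
toy_mean = mean_annual_growth(toy_counts)
toy_compound = compound_annual_growth(toy_counts)

# Hypothetical per-paper country lists for a single year.
toy_rate = collaboration_rate([["X", "Y"], ["X"], ["X", "X"], ["Y", "Z"]])
```

The two growth definitions agree when growth is constant, as in the toy series, but diverge when yearly counts fluctuate, so a reported rate is only comparable across studies once the definition is known.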

Keywords: Bibliometrics; Medical; Natural language processing; Scientific collaboration; Statistical characteristics; Thematic discovery and evolution.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. The number and growth rate of publications by year
Fig. 2. Geomap visualization of publications by country (the more publications a country had, the closer its color was to red)
Fig. 3. Force-directed network of 87 authors with #pub. ≥ 8
Fig. 4. Force-directed network of 50 affiliations with #pub. ≥ 10
Fig. 5. Heatmap of AP clustering result for the 2007–2016 period
