Systematic Evaluation of Research Progress on Natural Language Processing in Medicine Over the Past 20 Years: Bibliometric Study on PubMed

doi:10.2196/16816

Review

J Med Internet Res. 2020 Jan 23;22(1):e16816.


Jing Wang¹, Huan Deng¹, Bangtao Liu¹, Anbin Hu¹, Jun Liang², Lingye Fan³, Xu Zheng⁴, Tong Wang⁵, Jianbo Lei¹,⁴,⁶

Affiliations

¹ School of Medical Informatics and Engineering, Southwest Medical University, Luzhou, China.
² IT Center, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China.
³ Affiliated Hospital, Southwest Medical University, Luzhou, China.
⁴ Center for Medical Informatics, Peking University, Beijing, China.
⁵ School of Public Health, Jilin University, Jilin, China.
⁶ Institute of Medical Technology, Health Science Center, Peking University, Beijing, China.

PMID: 32012074
PMCID: PMC7005695
DOI: 10.2196/16816

Abstract

Background: Natural language processing (NLP) is an important traditional field in computer science, but its application in medical research has faced many challenges. With the extensive digitalization of medical information globally and increasing importance of understanding and mining big data in the medical field, NLP is becoming more crucial.

Objective: The goal of the research was to perform a systematic review on the use of NLP in medical research with the aim of understanding the global progress on NLP research outcomes, content, methods, and study groups involved.

Methods: A systematic review was conducted using the PubMed database as a search platform. All published studies on the application of NLP in medicine (except biomedicine) during the 20 years between 1999 and 2018 were retrieved. The data obtained from these published studies were cleaned and structured. Excel (Microsoft Corp) and VOSviewer (Nees Jan van Eck and Ludo Waltman) were used to perform bibliometric analysis of publication trends, author orders, countries, institutions, collaboration relationships, research hot spots, diseases studied, and research methods.
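A PubMed search restricted to the 1999-2018 window could be scripted roughly as follows. This is a minimal sketch, not the authors' actual search strategy: the abstract does not give the query string, so the `term` value is purely hypothetical, and `build_esearch_url` is an illustrative helper name. The endpoint and parameter names are those of the NCBI E-utilities ESearch service; the sketch only builds the request URL and performs no network call.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public PubMed search API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, start_year=1999, end_year=2018, retmax=100):
    """Build an ESearch URL limited to a publication-date range.

    `term` is a hypothetical query; the study's real search string
    is not stated in the abstract.
    """
    params = {
        "db": "pubmed",            # search the PubMed database
        "term": term,              # query text (assumption, see above)
        "datetype": "pdat",        # filter on publication date
        "mindate": str(start_year),
        "maxdate": str(end_year),
        "retmax": retmax,          # maximum number of IDs returned per request
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical query term, for illustration only.
url = build_esearch_url('"natural language processing"[Title/Abstract]')
print(url)
```

The records returned by such a query would still need the manual screening and cleaning steps described in the Methods before any bibliometric analysis.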

Results: A total of 3498 articles were obtained during initial screening, and 2336 articles were found to meet the study criteria after manual screening. The number of publications increased every year, with significant growth after 2012 (annual publications ranged from 148 to a maximum of 302). The United States has occupied the leading position since the inception of the field, with the largest number of articles published. The United States contributed 63.01% (1472/2336) of all publications, followed by France (5.44%, 127/2336) and the United Kingdom (3.51%, 82/2336). The author with the largest number of articles published was Hongfang Liu (70), while Stéphane Meystre (17) and Hua Xu (33) published the largest number of articles as first and corresponding author, respectively. Among first-author affiliations, Columbia University published the largest number of articles, accounting for 4.54% (106/2336) of the total. Approximately one-fifth (17.68%, 413/2336) of the articles involved research on specific diseases, and the subject areas primarily focused on mental illness (16.46%, 68/413), breast cancer (5.81%, 24/413), and pneumonia (4.12%, 17/413).
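The shares reported in the Results can be recomputed from the stated counts. The short sketch below uses only the numbers given in the paragraph above; the helper name `share` and the dictionary layout are illustrative choices, not part of the study.

```python
# Counts as reported in the Results paragraph.
TOTAL_INCLUDED = 2336    # articles retained after manual screening
DISEASE_SPECIFIC = 413   # articles on specific diseases


def share(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


country_counts = {"United States": 1472, "France": 127, "United Kingdom": 82}
disease_counts = {"mental illness": 68, "breast cancer": 24, "pneumonia": 17}

# Country shares use all included articles as the denominator.
country_share = {k: share(v, TOTAL_INCLUDED) for k, v in country_counts.items()}
# Disease shares use only the disease-specific subset as the denominator.
disease_share = {k: share(v, DISEASE_SPECIFIC) for k, v in disease_counts.items()}
# Share of all included articles that address a specific disease.
disease_specific_share = share(DISEASE_SPECIFIC, TOTAL_INCLUDED)

print(country_share)
print(disease_share)
print(disease_specific_share)
```

Note that the two groups of percentages use different denominators: country shares are relative to all 2336 included articles, whereas disease shares are relative to the 413 disease-specific articles.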

Conclusions: NLP is in a period of robust development in the medical field, with an average of approximately 100 publications annually. Electronic medical records were the most used research materials, but social media such as Twitter have become important research materials since 2015. Cancer (24.94%, 103/413) was the most common subject area in NLP-assisted medical research on diseases, with breast cancer (23.30%, 24/103) and lung cancer (14.56%, 15/103) accounting for the highest proportions of studies. Columbia University and the researchers trained there were the most active and prolific research force on NLP in the medical field.

Keywords: clinical; electronic medical record; information extraction; medicine; natural language processing.

©Jing Wang, Huan Deng, Bangtao Liu, Anbin Hu, Jun Liang, Lingye Fan, Xu Zheng, Tong Wang, Jianbo Lei. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 23.01.2020.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram depicting the screening procedure for articles on natural language processing (NLP) in the medical field.

**Figure 2**
Graph showing the number of articles published over time.

**Figure 3**
Trend in the number of articles published over 20 years in the top five countries with the most articles published.

**Figure 4**
(A) Network visualization of author co-occurrences analyzed using VOSviewer. A circle represents an author, the size of the circle represents the importance, and the thickness of the link connecting the circles represents the relatedness of the connections. Circles with the same color belong to the same cluster. (B) Overlay visualization generated in VOSviewer (Centre for Science and Technology Studies, Leiden University). A color closer to blue represents an earlier time and closer to red represents a time closer to 2018 (note: refer to Multimedia Appendix 1 for details on the two diagrams and related discussions).

**Figure 5**
(A) Distribution of keywords. A circle represents an identified keyword, the size of the circle represents the importance, and the thickness of the link connecting the circles represents the relatedness of the connections among the keywords. Circles with the same color belong to the same cluster. (B) Changes in keywords over time. A color closer to blue represents an earlier time and closer to red represents a time closer to 2018 (note: refer to Multimedia Appendix 1 for details on the two diagrams and related discussions).

**Figure 6**
Ranking of disease categories based on studies that used natural language processing for the investigation of disease cases.

**Figure 7**
Temporal distribution of studies that used natural language processing for the investigation of disease cases (note: this figure shows the names of the top three diseases in studies that used natural language processing to investigate disease cases each year. Fewer than three disease types indicates that only one or two diseases were studied in the year. The term cancer in the figure indicates the article only mentioned the term cancer, without specifying the type of cancer).

**Figure 8**
Distribution of diseases in studies that used natural language processing for the investigation of disease cases in the United States, China, United Kingdom, and Australia.

**Figure 9**
Top five ranks of the research tasks of natural language processing (NLP) in the medical field.

Cited by

Biosimilars in the Era of Artificial Intelligence-International Regulations and the Use in Oncological Treatments.
Bas TG, Duarte V. Pharmaceuticals (Basel). 2024 Jul 10;17(7):925. doi: 10.3390/ph17070925. PMID: 39065775. Review.
Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.
Fang A, Hu J, Zhao W, Feng M, Fu J, Feng S, Lou P, Ren H, Chen X. BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z. PMID: 35321705.
Evolutionary Overview of Consumer Health Informatics: Bibliometric Study on the Web of Science from 1999 to 2019.
Ouyang W, Xie W, Xin Z, He H, Wen T, Peng X, Dai P, Yuan Y, Liu F, Chen Y, Luo A. J Med Internet Res. 2021 Sep 9;23(9):e21974. doi: 10.2196/21974. PMID: 34499042. Review.
Approaches Based on Artificial Intelligence and the Internet of Intelligent Things to Prevent the Spread of COVID-19: Scoping Review.
Adly AS, Adly AS, Adly MS. J Med Internet Res. 2020 Aug 10;22(8):e19104. doi: 10.2196/19104. PMID: 32584780.
Artificial intelligence: revolutionizing cardiology with large language models.
Boonstra MJ, Weissenbacher D, Moore JH, Gonzalez-Hernandez G, Asselbergs FW. Eur Heart J. 2024 Feb 1;45(5):332-345. doi: 10.1093/eurheartj/ehad838. PMID: 38170821.
