Using natural language processing to enable in-depth analysis of clinical messages posted to an Internet mailing list: a feasibility study
- PMID: 22112583
- PMCID: PMC3236668
- DOI: 10.2196/jmir.1799
Abstract
Background: An Internet mailing list may be characterized as a virtual community of practice that serves as an information hub with easy access to expert advice and opportunities for social networking. We are interested in mining messages posted to a list for dental practitioners to identify clinical topics. Once we understand the topical domain, we can study dentists' real information needs and the nature of their shared expertise, and can avoid delivering useless content at the point of care in future informatics applications. However, a necessary first step involves developing procedures to identify messages that are worth studying given our resources for planned, labor-intensive research.
Objectives: The primary objective of this study was to develop a workflow for identifying a manageable number of clinically relevant messages within a much larger corpus of messages posted to an Internet mailing list. A second objective was to demonstrate the potential usefulness of our procedures to investigators by retrieving a set of messages tailored to the research question of a qualitative research team.
Methods: We mined 14,576 messages posted to an Internet mailing list from April 2008 to May 2009. The list has about 450 subscribers, mostly dentists from North America interested in clinical practice. After extensive preprocessing, we used the Natural Language Toolkit to identify clinical phrases and keywords in the messages. Two academic dentists classified collocated phrases in an iterative, consensus-based process to describe the topics discussed by dental practitioners who subscribe to the list. We then consulted with qualitative researchers regarding their research question to develop a plan for targeted retrieval. We used selected phrases and keywords as search strings to identify clinically relevant messages and delivered the messages in a reusable database.
Results: About half of the subscribers (245/450, 54.4%) posted messages. Natural language processing (NLP) yielded 279,193 clinically relevant tokens or processed words (19% of all tokens). Of these, 2.02% (5634 unique tokens) represent the vocabulary for dental practitioners. Based on pointwise mutual information score and clinical relevance, 325 collocated phrases (eg, fistula filled obturation and herpes zoster) with 108 keywords (eg, mercury) were classified into 13 broad categories with subcategories. In the demonstration, we identified 305 relevant messages (2.1% of all messages) over 10 selected categories with instances of collocated phrases, and 299 messages (2.1%) with instances of phrases or keywords for the category systemic disease.
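The pointwise mutual information (PMI) score mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library rather than the Natural Language Toolkit, the function name `bigram_pmi`, the `min_count` threshold, and the toy token list are all hypothetical, and probabilities are estimated simply as counts divided by the number of tokens.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only. PMI is computed as
    log2( P(w1, w2) / (P(w1) * P(w2)) ), with each probability
    estimated as a raw count divided by the number of tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs get inflated PMI scores, so skip them.
        if count < min_count:
            continue
        scores[(w1, w2)] = math.log2(count * n / (unigrams[w1] * unigrams[w2]))
    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up token stream (assumed already lowercased and cleaned).
demo_tokens = "herpes zoster pain tooth pain herpes zoster tooth pain crown".split()
ranked = bigram_pmi(demo_tokens)
for pair, score in ranked:
    print(" ".join(pair), round(score, 2))
```

In a workflow like the one described, high-scoring pairs would only be candidates; the abstract states that human experts then judged their clinical relevance.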
Conclusions: A workflow with a sequence of machine-based steps and human classification of NLP-discovered phrases can support researchers who need to identify relevant messages in a much larger corpus. Discovered phrases and keywords are useful search strings to aid targeted retrieval. We demonstrate the potential value of our procedures for qualitative researchers by retrieving a manageable set of messages concerning systemic and oral disease.
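Using discovered phrases and keywords as search strings for targeted retrieval might look roughly like the following sketch. It is an assumption-laden illustration rather than the study's implementation: the function `find_messages`, the in-memory dictionary of messages, and the sample texts are hypothetical, and matching is a simple case-insensitive whole-word search.

```python
import re


def find_messages(messages, search_strings):
    """Return {message_id: [matched search strings]} for messages with a hit.

    Hypothetical helper: `messages` maps an id to the message text, and
    each search string is matched case-insensitively on word boundaries.
    """
    patterns = {
        s: re.compile(r"\b" + re.escape(s) + r"\b", re.IGNORECASE)
        for s in search_strings
    }
    hits = {}
    for msg_id, text in messages.items():
        matched = [s for s, pattern in patterns.items() if pattern.search(text)]
        if matched:
            hits[msg_id] = matched
    return hits


# Made-up messages for illustration; not taken from the mailing list.
sample_messages = {
    1: "Patient presented with herpes zoster on the palate.",
    2: "Anyone tried the new curing light?",
    3: "Concerns about Mercury in amalgam restorations.",
}
result = find_messages(sample_messages, ["herpes zoster", "mercury"])
print(result)
```

Keeping the list of matched strings alongside each message id makes it straightforward to group retrieved messages by category afterwards.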
Conflict of interest statement
None declared