A comparison of rule-based and machine learning approaches for classifying patient portal messages
- PMID: 28750904
- PMCID: PMC5546247
- DOI: 10.1016/j.ijmedinf.2017.06.004
Abstract
Objective: Secure messaging through patient portals is an increasingly popular way for consumers to interact with healthcare providers. The growing burden of secure messaging can affect clinic staffing and workflows, and manual management of portal messages is costly and time-consuming. Automated classification of portal messages could expedite message triage and delivery of care.
Materials and methods: We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag-of-words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated each classifier's accuracy in identifying individual communication types within portal messages using the area under the receiver operating characteristic curve (AUC). Because portal messages often contain more than one type of communication, we used the Jaccard Index to evaluate prediction of all communication types within a single message. We also extracted the variables of importance for the random forest classifiers.
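The two evaluation measures named above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the example label sets, and the example scores are all hypothetical. It also assumes that the Jaccard Index is averaged over messages and that two empty label sets count as a perfect match; the abstract states neither.

```python
from typing import Sequence, Set


def jaccard_index(true_labels: Set[str], predicted_labels: Set[str]) -> float:
    """Size of the intersection divided by the size of the union of two label sets."""
    union = true_labels | predicted_labels
    if not union:
        # Assumed convention: two empty label sets are treated as a perfect match.
        return 1.0
    return len(true_labels & predicted_labels) / len(union)


def mean_jaccard(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Average per-message Jaccard Index (averaging over messages is an assumption)."""
    scores = [jaccard_index(g, p) for g, p in zip(gold, predicted)]
    return sum(scores) / len(scores)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical messages: the first has two gold labels, of which one was predicted.
gold = [{"medical", "logistical"}, {"social"}]
predicted = [{"medical"}, {"social"}]
print(mean_jaccard(gold, predicted))  # (0.5 + 1.0) / 2 = 0.75

# Hypothetical scores from a single-label classifier (1 = label present).
print(auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))  # 3 of 4 pairs ranked correctly = 0.75
```

The pairwise AUC above is quadratic in the number of messages, which is fine for illustration; a sort-based implementation would be preferable at scale.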
Results: The best-performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). Among classifiers for individual communication subtypes, the best performance was achieved by random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, 0.674; Naïve Bayes, 0.799; random forests, 0.859; and logistic regression, 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning' and 'evening', and Idea_or_Concept, which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much'; informational: 'question', 'mean').
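To make the idea of a rule-based, multi-label classifier concrete, the following Python sketch assigns every communication type whose keywords appear in a message. It is an illustration only and not the study's basic classifier, whose rules the abstract does not describe. The keyword lists are hypothetical, borrowed from the example predictive words quoted above, and the fallback to "other" is an assumption.

```python
import re
from typing import Dict, Set

# Hypothetical keyword lists, taken from the example words in the Results.
# The study's actual rules are not given in the abstract.
KEYWORDS: Dict[str, Set[str]] = {
    "medical": {"morning", "evening", "appointment", "refill"},
    "logistical": {"phone", "insurance"},
    "social": {"thanks", "much"},
    "informational": {"question", "mean"},
}


def tokenize(message: str) -> Set[str]:
    """Lowercase the message and return its set of words (a bag of words without counts)."""
    return set(re.findall(r"[a-z']+", message.lower()))


def classify(message: str) -> Set[str]:
    """Return every communication type with at least one keyword in the message."""
    words = tokenize(message)
    labels = {label for label, keywords in KEYWORDS.items() if words & keywords}
    # Assumed fallback: a message matching no rule is labeled "other".
    return labels or {"other"}


print(sorted(classify("Thanks so much! Can I get a refill?")))  # ['medical', 'social']
print(sorted(classify("Hello")))  # ['other']
```

Because each rule is checked independently, one message can receive several labels, which is the situation the Jaccard Index is used to score.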
Conclusions: This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques to consumer communications in those messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance before these methods can support message triage and answering.
Keywords: Machine learning; Natural language processing; Patient portal; Text classification.
Copyright © 2017 Elsevier B.V. All rights reserved.