Comparative Study
Int J Med Inform. 2017 Sep;105:110-120.
doi: 10.1016/j.ijmedinf.2017.06.004. Epub 2017 Jun 23.

A comparison of rule-based and machine learning approaches for classifying patient portal messages

Robert M Cronin et al. Int J Med Inform. 2017 Sep.

Abstract

Objective: Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care.

Materials and methods: We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag-of-words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracy in identifying individual communication types within portal messages using the area under the receiver operating characteristic curve (AUC). Portal messages often contain more than one type of communication. To evaluate how well the classifiers predicted all communication types within a single message, we used the Jaccard Index. We also extracted the variables of importance from the random forest classifiers.
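The Jaccard Index mentioned above compares the set of communication types predicted for a message with the set assigned in the gold standard: the size of their intersection divided by the size of their union. The following is a minimal sketch, not the authors' implementation. The function names, the example labels, the choice to average the per-message scores, and the treatment of two empty label sets as perfect agreement are all assumptions made for illustration; the abstract does not state how the index was aggregated.

```python
def jaccard_index(true_labels, predicted_labels):
    """Jaccard index between two label sets: |A & B| / |A | B|."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    union = true_set | pred_set
    if not union:
        # Assumption: two empty label sets count as perfect agreement.
        return 1.0
    return len(true_set & pred_set) / len(union)


def mean_jaccard(gold, predicted):
    """Average the per-message Jaccard index over a collection of messages."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one message is required")
    scores = [jaccard_index(g, p) for g, p in zip(gold, predicted)]
    return sum(scores) / len(scores)


# Hypothetical labels for three messages (not data from the study).
gold = [{"medical", "logistical"}, {"social"}, {"informational", "medical"}]
pred = [{"medical"}, {"social"}, {"informational", "logistical"}]

# Per-message scores are 1/2, 1, and 1/3, so the mean is 11/18.
print(round(mean_jaccard(gold, pred), 3))
```

A message with a partly correct label set earns partial credit, which is why this measure suits messages that carry more than one communication type.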

Results: The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). Among classifiers for individual communication subtypes, the best performance was achieved by random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard Index: 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning', 'evening', and Idea_or_Concept, which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables comprised similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much'; informational: 'question', 'mean').
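An AUC can be read as the probability that a classifier scores a randomly chosen message of a given communication type higher than a randomly chosen message without that type. The sketch below computes it directly from that pairwise definition, counting ties as half a win. It is an illustration only: the function name and the example labels and scores are hypothetical, and the abstract does not describe how the authors computed AUC or its confidence intervals.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 values (1 = message contains the communication type)
    scores: classifier scores, higher meaning more likely positive
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for five messages (not data from the study).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

# Five of the six positive/negative pairs are ordered correctly.
print(round(roc_auc(labels, scores), 3))
```

The pairwise loop is quadratic in the number of messages, which is acceptable for a small illustration; a rank-based formulation would be preferable for larger sets.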

Conclusions: This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering.

Keywords: Machine learning; Natural language processing; Patient portal; Text classification.

Conflict of interest statement

Competing interests: None.

Figures

Figure 1. The taxonomy of consumer health information communication types [17, 33, 34].
Figure 2. Example message labeled by communication types.
Figure 3. Area under the curve (AUC) of the different major communication types. The Basic Classifier was the Rule Based classifier. The error bars represent the 95% Confidence Interval.
Figure 4. Bar charts of the Jaccard Indices of the different communication types. The Basic Classifier was the Rule Based classifier. The error bars represent the 95% Confidence Interval.

References

    1. Shapochka A. Providers Turn to Portals to Meet Patient Demand. Meaningful Use / Journal of AHIMA. 2012.
    2. Tang PC, Lansky D. The missing link: bridging the patient-provider health information gap. Health Aff (Millwood). 2005;24:1290–1295.
    3. Calabretta N. Consumer-driven, patient-centered health care in the age of electronic information. J Med Libr Assoc. 2002;90:32–37.
    4. Koonce TY, Giuse DA, Beauregard JM, Giuse NB. Toward a more informed patient: bridging health care information through an interactive communication portal. J Med Libr Assoc. 2007;95:77–81.
    5. Bussey-Smith KL, Rossen RD. A systematic review of randomized control trials evaluating the effectiveness of interactive computerized asthma patient education programs. Ann Allergy Asthma Immunol. 2007;98:507–516; quiz 516, 566.
