Observational Study

BMC Med Inform Decis Mak. 2020 Apr 29;20(1):79.

doi: 10.1186/s12911-020-1099-y.

Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients

Brihat Sharma¹, Dmitriy Dligach^{1,2}, Kristin Swope³, Elizabeth Salisbury-Afshar⁴, Niranjan S Karnik⁵, Cara Joyce^{2,3}, Majid Afshar^{6,7,8}

Affiliations

¹ Department of Computer Science, Loyola University Chicago, Chicago, IL, USA.
² Center for Health Outcomes and Informatics Research, Loyola University Chicago, 2160 S. First Avenue, Maywood, IL, 60156, USA.
³ Stritch School of Medicine, Loyola University Chicago, Maywood, IL, USA.
⁴ Center for Multi-System Solutions to the Opioid Epidemic, American Institute for Research, Chicago, IL, USA.
⁵ Department of Psychiatry, Rush University Medical Center, Chicago, IL, USA.
⁶ Center for Health Outcomes and Informatics Research, Loyola University Chicago, 2160 S. First Avenue, Maywood, IL, 60156, USA. Majid.afshar@lumc.edu.
⁷ Department of Health Informatics and Data Science, Loyola University Chicago, Maywood, IL, USA. Majid.afshar@lumc.edu.
⁸ Department of Medicine, Loyola University Medical Center, Maywood, IL, USA. Majid.afshar@lumc.edu.

PMID: 32349766
PMCID: PMC7191715
DOI: 10.1186/s12911-020-1099-y

Brihat Sharma et al. BMC Med Inform Decis Mak. 2020.

Abstract

Background: Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose the following solutions: (1) mapping the corpus of documents to a standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System), thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary containing input words/n-grams. We aim to test the performance of models with and without PHI in a use case for an opioid misuse classifier.
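The two input representations described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the lookup table `TOY_CUI_LOOKUP`, its placeholder codes, and the fixed alphabet are all hypothetical stand-ins (a real system would use a UMLS concept mapper), and the functions only show the shape of the inputs a model would receive.

```python
import re
import string
from collections import Counter

# Hypothetical stand-in for a UMLS concept mapper. The codes are placeholders,
# not verified UMLS identifiers.
TOY_CUI_LOOKUP = {
    "heroin": "C_HEROIN",
    "overdose": "C_OVERDOSE",
}

# Fixed alphabet, so no vocabulary has to be learned from (or shipped with) the notes.
ALPHABET = string.ascii_lowercase + string.digits + " "
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # id 0 = any other character


def cui_features(note: str) -> Counter:
    """Count concept codes for words found in the lookup; all other words are dropped."""
    words = re.findall(r"[a-z]+", note.lower())
    return Counter(TOY_CUI_LOOKUP[w] for w in words if w in TOY_CUI_LOOKUP)


def encode_chars(note: str) -> list:
    """Map each character to an integer id drawn from the fixed alphabet."""
    return [CHAR_TO_ID.get(ch, 0) for ch in note.lower()]


if __name__ == "__main__":
    note = "Patient Smith admitted after heroin overdose."
    print(cui_features(note))
    print(encode_chars("ab 1"))
```

In this sketch the surname in the note never reaches the concept-based features because it has no entry in the lookup, which is the intuition behind using CUI codes as PHI-free inputs.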

Methods: An observational cohort was sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. Case-control stratified sampling (n = 1000) was performed to build an annotated dataset serving as a reference standard of cases and non-cases of opioid misuse. Input features for training and testing included CUI codes, character-based features, and n-grams. The models applied were machine learning models (neural networks and logistic regression) as well as an expert-consensus rule-based model for opioid misuse. Areas under the receiver operating characteristic curve (AUROC) were compared between models to assess discrimination. The Hosmer-Lemeshow test and visual plots were used to assess model fit and calibration.
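As a rough illustration of the two evaluation measures named above, the following sketch computes AUROC as the fraction of case/non-case pairs ranked correctly and the Hosmer-Lemeshow chi-square statistic over groups of sorted predictions. The function names and the default of 10 groups are assumptions made for this example, not details taken from the study, and the p-value step (a chi-square lookup) is omitted.

```python
def auroc(y_true, y_score):
    """Probability that a random case scores higher than a random non-case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_statistic(y_true, y_prob, groups=10):
    """Sum over groups of (observed - expected)^2 / (expected * (1 - expected / size))."""
    pairs = sorted(zip(y_prob, y_true))  # order encounters by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)  # observed number of cases
        expected = sum(p for p, _ in chunk)  # sum of predicted probabilities
        variance = expected * (1 - expected / len(chunk))
        if variance > 0:  # skip degenerate groups predicted as all 0 or all 1
            stat += (observed - expected) ** 2 / variance
    return stat


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(hosmer_lemeshow_statistic([1, 1, 1, 1], [0.5] * 4, groups=1))
```

A larger Hosmer-Lemeshow statistic indicates a bigger gap between predicted and observed case counts, that is, poorer calibration.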

Results: Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top-performing models, with AUROCs > 0.90, included CUI codes as inputs to a convolutional neural network, max pooling network, and logistic regression model. The top calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. The top-weighted CUI codes in the logistic regression model had the related terms 'Heroin' and 'Victim of abuse'.

Conclusions: We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark to overcome privacy and security concerns.

Keywords: Computable phenotype; Heroin; Machine learning; Natural language processing; Opioid misuse; Opioid use disorder.

Conflict of interest statement

The authors declare they have no competing interests.

Figures

**Fig. 1**
PHI-free and PHI-laden inputs to a machine learning model with an example of a convolutional neural network using an embedding with Concept Unique Identifiers (CUIs)

**Fig. 2**
Receiver operating characteristic curve and area under the curve for the convolutional neural network model using concept unique identifiers (CUIs) for classification of opioid misuse. CNN = convolutional neural network; AUC = area under the curve

**Fig. 3**
Calibration plot for the top-performing machine learning classifiers for opioid misuse. The diagonal line represents perfect calibration between observed (y-axis) and predicted (x-axis) probabilities. CNN = convolutional neural network; CUIs = concept unique identifiers; LR = logistic regression; MPN = max pooling network


