J Am Med Inform Assoc. 2021 Sep 18;28(10):2116-2127.

doi: 10.1093/jamia/ocab116.

Automated detection of substance use information from electronic health records for a pediatric population

Yizhao Ni¹,², Alycia Bachtel¹, Katie Nause³, Sarah Beal²,³

Affiliations

¹ Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.
² Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA.
³ Division of Psychology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

PMID: 34333636
PMCID: PMC8449626
DOI: 10.1093/jamia/ocab116

Yizhao Ni et al. J Am Med Inform Assoc. 2021.

Abstract

Objective: Substance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data.

Materials and methods: Pediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC).
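The evaluation measures named above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names (`confusion_counts`, `screening_metrics`, `rank_auc`) and the toy labels are hypothetical, and AUC is computed with the standard rank-based (Mann-Whitney) formulation as an assumption about how such a score might be obtained.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def screening_metrics(y_true, y_pred):
    """PPV, sensitivity, NPV, and specificity from binary predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "npv": _ratio(tn, tn + fn),          # negative predictive value
        "specificity": _ratio(tn, tn + fp),  # true negative rate
    }


def rank_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return math.nan
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: reference labels, binary predictions, model scores.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    predictions = [1, 1, 0, 1, 0, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
    print(screening_metrics(labels, predictions))
    print(rank_auc(labels, model_scores))
```

AUC needs a continuous score per case, so in this sketch it applies only to probabilistic output and not to a deterministic rule-based classifier.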

Results: The dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively).

Conclusions: It is feasible to detect substance use screening and results among pediatric patients using logic rules, NLP, and machine learning technologies.

Keywords: automated substance use detection; deep learning; electronic health records; natural language processing; pediatric population.

Figures

**Figure 1.**
An overview of the automated substance use screening system. C: current; EHR: electronic health record; F: family use; L: lifetime; NLP: natural language processing.

**Figure 2.**
An overview of the substance information screener. C: current; CUI: concept unique identifier; F: family use; L: lifetime; LSTM: long short-term memory; RxNorm: normalized names for clinical drugs; SNOMED: Systematized Nomenclature of Medicine Clinical Terms; UMLS: Unified Medical Language System.

**Figure 3.**
Performance of the logic-based rule matcher in classifying structured indicators. Note that the structured indicators did not contain assertion of family use. The logic-based rule matcher generated determinate classification rather than probabilistic predictions; therefore, we did not report area under the receiver-operating characteristic curve in the evaluation. NPV: negative predictive value; PPV: positive predictive value.

**Figure 4.**
Performance of the knowledge-based natural language processing system in detecting substance use categories and assertions on individual clinical narratives. Error bars indicate 95% confidence intervals. AUC: area under the receiver-operating characteristic curve; NPV: negative predictive value; PPV: positive predictive value.

**Figure 5.**
Performance of the deep learning model in detecting substance use categories and assertions on individual clinical narratives. Error bars indicate 95% confidence intervals. AUC: area under the receiver-operating characteristic curve; NPV: negative predictive value; PPV: positive predictive value.
