Meta-Analysis

BMC Med Inform Decis Mak. 2024 Feb 2;24(1):33.

doi: 10.1186/s12911-024-02416-3.

The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis

Md Ashiqul Haque¹, Muditha Lakmali Bodawatte Gedara¹, Nathan Nickel¹, Maxime Turgeon², Lisa M Lix³

Affiliations

¹ Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada.
² Department of Statistics, University of Manitoba, Winnipeg, MB, Canada.
³ Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada. lisa.lix@umanitoba.ca.

PMID: 38308231
PMCID: PMC10836023
DOI: 10.1186/s12911-024-02416-3

Md Ashiqul Haque et al. BMC Med Inform Decis Mak. 2024.

Abstract

Background: Smoking is a risk factor for many chronic diseases. Multiple smoking status ascertainment algorithms have been developed for population-based electronic health databases such as administrative databases and electronic medical records (EMRs). Evidence syntheses of algorithm validation studies have often focused on chronic diseases rather than risk factors. We conducted a systematic review and meta-analysis of smoking status ascertainment algorithms to describe the characteristics and validity of these algorithms.
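As a rough illustration of what a rule-based smoking status ascertainment algorithm does, the Python sketch below flags a person as a smoker when their records contain at least one qualifying diagnosis or prescription code. Everything in it is a hypothetical assumption for illustration: the code lists, the `PatientRecord` structure, the `is_smoker` function, and the one-code threshold are not taken from any reviewed algorithm.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL code lists, for illustration only. Codes are assumed to be
# stored as dotted strings (e.g. "F17.210"); validated algorithms use their
# own study-specific code lists and formatting.
SMOKING_DX_PREFIXES = ("F17", "Z72.0", "Z87.891")  # ICD-10 / ICD-10-CM tobacco-related diagnoses
SMOKING_RX_ATC = {"N07BA01", "N07BA03"}  # ATC codes: nicotine, varenicline


@dataclass
class PatientRecord:
    patient_id: str
    diagnosis_codes: list[str] = field(default_factory=list)
    atc_codes: list[str] = field(default_factory=list)


def is_smoker(record: PatientRecord, min_hits: int = 1) -> bool:
    """Rule-based flag: True if the record has at least `min_hits`
    smoking-related diagnosis or prescription codes."""
    dx_hits = sum(1 for c in record.diagnosis_codes if c.startswith(SMOKING_DX_PREFIXES))
    rx_hits = sum(1 for c in record.atc_codes if c in SMOKING_RX_ATC)
    return dx_hits + rx_hits >= min_hits


# Usage with made-up records
patients = [
    PatientRecord("A", diagnosis_codes=["F17.210"]),
    PatientRecord("B", atc_codes=["N07BA03"]),
    PatientRecord("C", diagnosis_codes=["I10"]),
]
flags = {p.patient_id: is_smoker(p) for p in patients}
print(flags)  # {'A': True, 'B': True, 'C': False}
```

Requiring more qualifying codes (a larger `min_hits`) is one common way such rules trade sensitivity for specificity.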

Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed. We searched EMBASE, MEDLINE, Scopus, and Web of Science for articles published from 1990 to 2022, using key terms such as validity, administrative data, electronic health records, smoking, and tobacco use. The extracted information, including article characteristics, algorithm characteristics, and validity measures, was analyzed descriptively. Sources of heterogeneity in validity measures were investigated using a meta-regression model. Risk of bias (ROB) in the reviewed articles was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.
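To make the idea of pooling a validity measure across heterogeneous studies concrete, the sketch below applies a DerSimonian-Laird random-effects model to proportions (such as sensitivity) on the logit scale. This is a simpler, swapped-in technique, not the three-level meta-regression model used by the authors (Fig. 1); the function name `pool_proportions_dl`, the 0.5 continuity correction, and the example counts are assumptions.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions_dl(counts):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    counts: list of (events, total) pairs, e.g. (true positives, reference-positive
    cases) for sensitivity. Returns (pooled proportion, tau^2 on the logit scale).
    """
    y, v = [], []
    for events, total in counts:
        if events == 0 or events == total:
            # assumed 0.5 continuity correction for studies at 0% or 100%
            events, total = events + 0.5, total + 1.0
        y.append(logit(events / total))
        v.append(1.0 / events + 1.0 / (total - events))  # approx. variance of a logit proportion
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    k = len(y)
    if k < 2:
        tau2 = 0.0  # between-study variance is not estimable from one study
    else:
        q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
        c = sw - sum(wi * wi for wi in w) / sw
        tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return expit(y_re), tau2


# Usage with made-up (true positives, reference-positive) counts
pooled_sensitivity, tau2 = pool_proportions_dl([(45, 60), (30, 50), (88, 100)])
print(round(pooled_sensitivity, 3), round(tau2, 3))
```

A multilevel model, as used in the review, would additionally account for several algorithms being reported by the same article, which this two-level sketch ignores.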

Results: The initial search yielded 2086 articles; 57 were selected for review, and 116 algorithms were identified. Almost three-quarters (71.6%) of the algorithms were based on EMR data. The algorithms were primarily constructed using diagnosis codes for smoking-related conditions, although prescription medication codes for smoking treatments were also used. About half of the algorithms were developed using machine-learning models. The pooled estimates of positive predictive value (PPV), sensitivity, and specificity were 0.843, 0.672, and 0.918, respectively. Algorithm sensitivity and specificity were highly variable, ranging from 3% to 100% and from 36% to 100%, respectively. Model-based algorithms had significantly greater sensitivity than rule-based algorithms (p = 0.006). Algorithms for EMR data had higher sensitivity than algorithms for administrative data (p = 0.001). The ROB was low in most of the articles (76.3%) that underwent the assessment.
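Sensitivity, specificity, and positive predictive value are all derived from a 2 × 2 cross-tabulation of an algorithm's output against a reference standard. The Python sketch below shows the arithmetic; the function names, the boolean-list input format, and the example counts are hypothetical.

```python
from typing import NamedTuple, Sequence


class Validity(NamedTuple):
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)


def confusion_counts(algorithm: Sequence[bool], reference: Sequence[bool]):
    """Cross-tabulate algorithm flags against reference-standard smoking status."""
    pairs = list(zip(algorithm, reference))
    tp = sum(1 for a, r in pairs if a and r)
    fp = sum(1 for a, r in pairs if a and not r)
    fn = sum(1 for a, r in pairs if not a and r)
    tn = sum(1 for a, r in pairs if not a and not r)
    return tp, fp, fn, tn


def validity_measures(tp: int, fp: int, fn: int, tn: int) -> Validity:
    return Validity(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
    )


# Usage with made-up counts
v = validity_measures(tp=60, fp=10, fn=30, tn=100)
print(round(v.sensitivity, 3), round(v.specificity, 3), round(v.ppv, 3))  # 0.667 0.909 0.857
```

Unlike sensitivity and specificity, PPV also depends on how common smoking is in the validation sample, so PPVs from cohorts with different smoking prevalence are not directly comparable.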

Conclusions: Multiple algorithms using different data sources and methods have been proposed to ascertain smoking status in electronic health data. Many algorithms had low sensitivity and positive predictive value, but the data source influenced their validity. Algorithms that apply machine-learning models to multiple linked data sources have improved validity.
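The Results report that data source (EMR versus administrative data) and model-based construction were associated with differences in sensitivity. A minimal way to test one binary moderator of this kind is a fixed-effect weighted least-squares meta-regression, which reduces to comparing inverse-variance weighted means between the two groups. The sketch below is a deliberate simplification under stated assumptions: it ignores between-study heterogeneity and the clustering of several algorithms within one article, which a three-level model such as the one in Fig. 1 is presumably meant to handle. The function name and its inputs (logit-scale estimates with known variances) are hypothetical.

```python
import math
from typing import Sequence


def binary_moderator_test(estimates: Sequence[float],
                          variances: Sequence[float],
                          group: Sequence[int]):
    """Fixed-effect WLS meta-regression with a single 0/1 moderator.

    Returns (slope, standard error, two-sided p-value), where slope is the
    difference in inverse-variance weighted means (group 1 minus group 0).
    """
    sum_w = [0.0, 0.0]   # total inverse-variance weight per group
    sum_wy = [0.0, 0.0]  # weighted sum of estimates per group
    for y, v, g in zip(estimates, variances, group):
        w = 1.0 / v
        sum_w[g] += w
        sum_wy[g] += w * y
    slope = sum_wy[1] / sum_w[1] - sum_wy[0] / sum_w[0]
    se = math.sqrt(1.0 / sum_w[0] + 1.0 / sum_w[1])
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from the normal distribution
    return slope, se, p


# Usage with made-up logit-scale sensitivities (group 1 = EMR, 0 = administrative)
slope, se, p = binary_moderator_test([0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], [0, 0, 1, 1])
print(round(slope, 3), round(se, 3), round(p, 4))  # 1.0 1.0 0.3173
```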

Keywords: Algorithms; Electronic health records; Review; Routinely collected health data; Validation study.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
Three-level meta-regression model

**Fig. 2**
Flowchart of the study selection process

**Fig. 3**
Percent (%) of smoking status algorithms characterized by validity measures (n = 116)

**Fig. 4**
Distribution of selected algorithm validity measures, stratified by algorithm data source. Note: The centre horizontal line within the box represents the median (50th percentile); upper and lower bounds of the box indicate the 25th and 75th percentiles; dashed lines connect the box to the maximum and minimum values; circles represent outliers. PPV = positive predictive value

**Fig. 5**
Distribution of selected algorithm validity measures, stratified by algorithm data structure. Note: The centre horizontal line within the box represents the median (50th percentile); upper and lower bounds of the box indicate the 25th and 75th percentiles; dashed lines connect the box to the maximum and minimum values; circles represent outliers. PPV = positive predictive value

**Fig. 6**
Distribution of selected algorithm validity measures, stratified by use of a predictive model in algorithm construction. Note: The centre horizontal line within the box represents the median (50th percentile); upper and lower bounds of the box indicate the 25th and 75th percentiles; dashed lines connect the box to the maximum and minimum values; circles represent outliers. PPV = positive predictive value
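Figs. 4 to 6 summarize validity measures with box plots. The Python sketch below computes the elements described in the figure notes (median, 25th and 75th percentiles, whisker ends, and outliers) for one group of values. The `box_summary` function, the 1.5 × IQR (Tukey) outlier rule, the "inclusive" quantile method, and the reading of "maximum and minimum" as the most extreme non-outlying values are all assumptions, since the notes do not specify them; the example values are made up and are not study data.

```python
import statistics


def box_summary(values, k=1.5):
    """Quartiles, whisker ends, and outliers for one box in a box plot.

    Outliers are values beyond Tukey's fences (k * IQR outside the box);
    this rule is an assumption, not taken from the figure notes.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = sorted(x for x in values if x < low_fence or x > high_fence)
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


# Usage: stratify illustrative sensitivity values by data source
by_source = {
    "EMR": [0.55, 0.62, 0.70, 0.74, 0.81, 0.88, 0.93, 0.97],
    "Administrative": [0.20, 0.35, 0.40, 0.45, 0.52, 0.60],
}
for source, values in by_source.items():
    print(source, box_summary(values))
```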
