Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Oct 20;9(19):e016648.
doi: 10.1161/JAHA.120.016648. Epub 2020 Sep 29.

Claims-Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine-Learning Approaches

Affiliations

Claims-Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine-Learning Approaches

Mei-Sing Ong et al. J Am Heart Assoc. .

Abstract

Background Real-world healthcare data are an important resource for epidemiologic research. However, accurate identification of patient cohorts-a crucial first step underpinning the validity of research results-remains a challenge. We developed and evaluated claims-based case ascertainment algorithms for pulmonary hypertension (PH), comparing conventional decision rules with state-of-the-art machine-learning approaches. Methods and Results We analyzed an electronic health record-Medicare linked database from two large academic tertiary care hospitals (years 2007-2013). Electronic health record charts were reviewed to form a gold standard cohort of patients with (n=386) and without PH (n=164). Using health encounter data captured in Medicare claims (including patients' demographics, diagnoses, medications, and procedures), we developed and compared 2 approaches for identifying patients with PH: decision rules and machine-learning algorithms using penalized lasso regression, random forest, and gradient boosting machine. The most optimal rule-based algorithm-having ≥3 PH-related healthcare encounters and having undergone right heart catheterization-attained an area under the receiver operating characteristic curve of 0.64 (sensitivity, 0.75; specificity, 0.48). All 3 machine-learning algorithms outperformed the most optimal rule-based algorithm (P<0.001). A model derived from the random forest algorithm achieved an area under the receiver operating characteristic curve of 0.88 (sensitivity, 0.87; specificity, 0.70), and gradient boosting machine achieved comparable results (area under the receiver operating characteristic curve, 0.85; sensitivity, 0.87; specificity, 0.70). Penalized lasso regression achieved an area under the receiver operating characteristic curve of 0.73 (sensitivity, 0.70; specificity, 0.68). Conclusions Research-grade case identification algorithms for PH can be derived and rigorously validated using machine-learning algorithms. Simple decision rules commonly applied in published literature performed poorly; more complex rule-based algorithms may potentially address the limitation of this approach. PH research using claims data would be considerably strengthened through the use of validated algorithms for cohort ascertainment.

Keywords: computable phenotype; machine learning; pulmonary hypertension.

PubMed Disclaimer

Conflict of interest statement

None.

Figures

Figure 1
Figure 1. Study overview.
EHR indicates electronic health record; PAH, pulmonary arterial hypertension; and PH, pulmonary hypertension.

Similar articles

Cited by

References

    1. Mathai SC, Mathew S. Breathing (and coding?) a bit easier: changes to international classification of disease coding for pulmonary hypertension. Chest. 2018;154:207–218. - PubMed
    1. Papani R, Sharma G, Agarwal A, Callahan SJ, Chan WJ, Kuo YF, Shim YM, Mihalek AD, Duarte AG. Validation of claims‐based algorithms for pulmonary arterial hypertension. Pulm Circ. 2018;8:2045894018759246. - PMC - PubMed
    1. Geva A, Gronsbell JL, Cai T, Murphy SN, Lyons JC, Heinz MM, Natter MD, Patibandia N, Bickel J, Mullen MP, et al. A computable phenotype improves cohort ascertainment in a pediatric pulmonary hypertension registry. J Pediatr. 2017;188:224–231. - PMC - PubMed
    1. Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, Pacheco JA, Tromp G, Pathak J, Carrell DS, et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc. 2016;23:1046–1052. - PMC - PubMed
    1. Shivade C, Raghavan P, Fosler‐Lussier E, Embi PJ, Elhadad N, Johnson SB, Lai AM. A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assoc. 2014;21:221–230. - PMC - PubMed

Publication types