J Am Heart Assoc. 2020 Oct 20;9(19):e016648. doi: 10.1161/JAHA.120.016648. Epub 2020 Sep 29.

Claims-Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine-Learning Approaches

Mei-Sing Ong^{1,2}, Jeffrey G Klann³, Kueiyu Joshua Lin⁴, Bradley A Maron⁵, Shawn N Murphy⁶, Marc D Natter^{2,7}, Kenneth D Mandl^{2,7,8}

Affiliations

¹ Department of Population Medicine, Harvard Medical School & Harvard Pilgrim Health Care Institute, Boston, MA.
² Computational Health Informatics Program, Boston Children's Hospital, Boston, MA.
³ Laboratory of Computer Science, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
⁴ Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.
⁵ Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.
⁶ Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
⁷ Department of Pediatrics, Harvard Medical School, Boston, MA.
⁸ Department of Biomedical Informatics, Harvard Medical School, Boston, MA.

PMID: 32990147
PMCID: PMC7792386
DOI: 10.1161/JAHA.120.016648

Mei-Sing Ong et al. J Am Heart Assoc. 2020.

Abstract

Background: Real-world healthcare data are an important resource for epidemiologic research. However, accurate identification of patient cohorts, a crucial first step underpinning the validity of research results, remains a challenge. We developed and evaluated claims-based case ascertainment algorithms for pulmonary hypertension (PH), comparing conventional decision rules with state-of-the-art machine-learning approaches.

Methods and Results: We analyzed an electronic health record-Medicare linked database from two large academic tertiary care hospitals (years 2007-2013). Electronic health record charts were reviewed to form a gold standard cohort of patients with (n=386) and without PH (n=164). Using health encounter data captured in Medicare claims (including patients' demographics, diagnoses, medications, and procedures), we developed and compared 2 approaches for identifying patients with PH: decision rules and machine-learning algorithms using penalized lasso regression, random forest, and gradient boosting machine. The best-performing rule-based algorithm, which required ≥3 PH-related healthcare encounters and a right heart catheterization, attained an area under the receiver operating characteristic curve of 0.64 (sensitivity, 0.75; specificity, 0.48). All 3 machine-learning algorithms outperformed the best-performing rule-based algorithm (P<0.001). A model derived from the random forest algorithm achieved an area under the receiver operating characteristic curve of 0.88 (sensitivity, 0.87; specificity, 0.70), and gradient boosting machine achieved comparable results (area under the receiver operating characteristic curve, 0.85; sensitivity, 0.87; specificity, 0.70). Penalized lasso regression achieved an area under the receiver operating characteristic curve of 0.73 (sensitivity, 0.70; specificity, 0.68).

Conclusions: Research-grade case identification algorithms for PH can be derived and rigorously validated using machine-learning algorithms. Simple decision rules commonly applied in published literature performed poorly; more complex rule-based algorithms may address the limitation of this approach. PH research using claims data would be considerably strengthened through the use of validated algorithms for cohort ascertainment.

Keywords: computable phenotype; machine learning; pulmonary hypertension.
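
As a rough illustration of how a decision rule of the kind described in the abstract (≥3 PH-related encounters plus right heart catheterization) could be scored against chart-review labels, the following is a minimal sketch. The `ClaimsSummary` record, its field names, and the toy cohort are all hypothetical; they are not the study's data or code, and the resulting numbers bear no relation to the published results.

```python
from dataclasses import dataclass


@dataclass
class ClaimsSummary:
    """Hypothetical per-patient summary derived from claims (not the study's schema)."""
    ph_encounters: int        # count of PH-related healthcare encounters
    had_rhc: bool             # any claim for right heart catheterization
    chart_confirmed_ph: bool  # gold-standard label from chart review


def rule_flags_ph(patient: ClaimsSummary, min_encounters: int = 3) -> bool:
    """Decision rule: at least `min_encounters` PH encounters AND a right heart catheterization."""
    return patient.ph_encounters >= min_encounters and patient.had_rhc


def sensitivity_specificity(patients: list[ClaimsSummary]) -> tuple[float, float]:
    """Compare the rule's output with the chart-review label for each patient."""
    tp = fn = tn = fp = 0
    for p in patients:
        predicted = rule_flags_ph(p)
        if p.chart_confirmed_ph:
            if predicted:
                tp += 1
            else:
                fn += 1
        else:
            if predicted:
                fp += 1
            else:
                tn += 1
    # Assumes the cohort contains at least one case and one non-case.
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy cohort, for illustration only.
toy_cohort = [
    ClaimsSummary(5, True, True),    # true positive
    ClaimsSummary(3, True, True),    # true positive
    ClaimsSummary(4, False, True),   # false negative: no catheterization claim
    ClaimsSummary(1, True, True),    # false negative: too few encounters
    ClaimsSummary(0, False, False),  # true negative
    ClaimsSummary(6, True, False),   # false positive
    ClaimsSummary(2, False, False),  # true negative
    ClaimsSummary(3, False, False),  # true negative
]

sens, spec = sensitivity_specificity(toy_cohort)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same comparison against gold-standard labels would apply to the machine-learning models, with the rule replaced by a thresholded model score.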

Conflict of interest statement

None.

Figures

**Figure 1. Study overview.**
EHR indicates electronic health record; PAH, pulmonary arterial hypertension; and PH, pulmonary hypertension.

Cited by

Development and validation of algorithms to predict left ventricular ejection fraction class from healthcare claims data.
Logeart D, Doublet M, Gouysse M, Damy T, Isnard R, Roubille F. ESC Heart Fail. 2024 Jun;11(3):1688-1697. doi: 10.1002/ehf2.14725. Epub 2024 Mar 4. PMID: 38438250.
A Computable Phenotype Algorithm for Postvaccination Myocarditis/Pericarditis Detection Using Real-World Data: Validation Study.
Deady M, Duncan R, Sonesen M, Estiandan R, Stimpert K, Cho S, Beers J, Goodness B, Jones LD, Forshee R, Anderson SA, Ezzeldin H. J Med Internet Res. 2024 Nov 25;26:e54597. doi: 10.2196/54597. PMID: 39586081.
Assessing the precision of machine learning for diagnosing pulmonary arterial hypertension: a systematic review and meta-analysis of diagnostic accuracy studies.
Fadilah A, Putri VYS, Puling IMDR, Willyanto SE. Front Cardiovasc Med. 2024 Aug 27;11:1422327. doi: 10.3389/fcvm.2024.1422327. eCollection 2024. PMID: 39257851.
Applying Machine Learning in Distributed Data Networks for Pharmacoepidemiologic and Pharmacovigilance Studies: Opportunities, Challenges, and Considerations.
Wong J, Prieto-Alhambra D, Rijnbeek PR, Desai RJ, Reps JM, Toh S. Drug Saf. 2022 May;45(5):493-510. doi: 10.1007/s40264-022-01158-3. Epub 2022 May 17. PMID: 35579813. Review.
Development of Interoperable Computable Phenotype Algorithms for Adverse Events of Special Interest to Be Used for Biologics Safety Surveillance: Validation Study.
Holdefer AA, Pizarro J, Saunders-Hastings P, Beers J, Sang A, Hettinger AZ, Blumenthal J, Martinez E, Jones LD, Deady M, Ezzeldin H, Anderson SA. JMIR Public Health Surveill. 2024 Jul 15;10:e49811. doi: 10.2196/49811. PMID: 39008361.
