Claims-Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine-Learning Approaches
- PMID: 32990147
- PMCID: PMC7792386
- DOI: 10.1161/JAHA.120.016648
Abstract
Background: Real-world healthcare data are an important resource for epidemiologic research. However, accurate identification of patient cohorts, a crucial first step underpinning the validity of research results, remains a challenge. We developed and evaluated claims-based case ascertainment algorithms for pulmonary hypertension (PH), comparing conventional decision rules with state-of-the-art machine-learning approaches.
Methods and Results: We analyzed a linked electronic health record-Medicare database from 2 large academic tertiary care hospitals (years 2007-2013). Electronic health record charts were reviewed to form a gold standard cohort of patients with PH (n=386) and without PH (n=164). Using health encounter data captured in Medicare claims (including patients' demographics, diagnoses, medications, and procedures), we developed and compared 2 approaches for identifying patients with PH: decision rules, and machine-learning algorithms using penalized lasso regression, random forest, and gradient boosting machine. The best-performing rule-based algorithm (≥3 PH-related healthcare encounters plus right heart catheterization) attained an area under the receiver operating characteristic curve of 0.64 (sensitivity, 0.75; specificity, 0.48). All 3 machine-learning algorithms outperformed the best-performing rule-based algorithm (P<0.001). A model derived from the random forest algorithm achieved an area under the receiver operating characteristic curve of 0.88 (sensitivity, 0.87; specificity, 0.70), and gradient boosting machine achieved comparable results (area under the receiver operating characteristic curve, 0.85; sensitivity, 0.87; specificity, 0.70). Penalized lasso regression achieved an area under the receiver operating characteristic curve of 0.73 (sensitivity, 0.70; specificity, 0.68).
Conclusions: Research-grade case identification algorithms for PH can be derived and rigorously validated using machine-learning algorithms. Simple decision rules commonly applied in the published literature performed poorly; more complex rule-based algorithms may address the limitations of this approach. PH research using claims data would be considerably strengthened through the use of validated algorithms for cohort ascertainment.
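To make the rule-based approach concrete, the following is a minimal Python sketch of how a decision rule of the kind described in the abstract (≥3 PH-related encounters and a right heart catheterization) could be scored against a chart-review gold standard. It is not the authors' implementation. The `Patient` class, its field names, and the toy records are hypothetical and chosen only for illustration; the abstract does not specify how encounters or procedures were coded.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All fields are hypothetical illustrations, not the study's variables.
    ph_encounters: int  # count of PH-related healthcare encounters in claims
    had_rhc: bool       # any claim for right heart catheterization
    gold_ph: bool       # PH status from chart review (gold standard)


def rule_positive(p: Patient, min_encounters: int = 3) -> bool:
    """Flag a patient as PH if both rule conditions hold."""
    return p.ph_encounters >= min_encounters and p.had_rhc


def sensitivity_specificity(patients):
    """Compare rule output with the gold standard.

    Assumes the cohort contains at least one patient with PH and
    at least one without, so neither denominator is zero.
    """
    tp = sum(1 for p in patients if p.gold_ph and rule_positive(p))
    fn = sum(1 for p in patients if p.gold_ph and not rule_positive(p))
    tn = sum(1 for p in patients if not p.gold_ph and not rule_positive(p))
    fp = sum(1 for p in patients if not p.gold_ph and rule_positive(p))
    return tp / (tp + fn), tn / (tn + fp)


# Toy cohort (invented data, not from the study).
toy = [
    Patient(5, True, True),    # true positive
    Patient(3, True, True),    # true positive
    Patient(2, True, True),    # missed: too few encounters
    Patient(4, False, True),   # missed: no catheterization claim
    Patient(0, False, False),  # true negative
    Patient(1, True, False),   # true negative
    Patient(6, True, False),   # false positive
    Patient(0, False, False),  # true negative
]

sens, spec = sensitivity_specificity(toy)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy cohort the rule finds 2 of 4 true cases and correctly excludes 3 of 4 non-cases. The same comparison against chart review is what yields the sensitivity and specificity figures reported in the abstract, although the study's actual cohort and feature definitions differ from this sketch.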
Keywords: computable phenotype; machine learning; pulmonary hypertension.
Conflict of interest statement
None.