Birth Defects Res. 2025 Feb;117(2):e2440.
doi: 10.1002/bdr2.2440.

A Generalized Machine Learning Model for Identifying Congenital Heart Defects (CHDs) Using ICD Codes


Haoming Shi et al. Birth Defects Res. 2025 Feb.

Abstract

Background: International Classification of Diseases (ICD) codes used to identify congenital heart defect (CHD) cases in datasets have substantial false-positive (FP) rates. Incorporating machine learning (ML) algorithms after case selection by ICD codes may improve the accuracy of CHD identification and thereby strengthen surveillance efforts.
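As a rough illustration of the two-stage design (ICD-code case selection followed by an ML classifier), the Python sketch below first selects patients with any CHD-related ICD code and then applies a model score and threshold only to those selected patients. The code prefixes, the `toy_score` function, the example codes, and the threshold are hypothetical placeholders, not the study's code list or trained model.

```python
from typing import Callable, Dict, List

# Illustrative ICD-10 prefixes (Q20-Q26); an assumption, not the study's code list.
CHD_CODE_PREFIXES = ("Q20", "Q21", "Q22", "Q23", "Q24", "Q25", "Q26")


def has_chd_code(codes: List[str]) -> bool:
    """Stage 1: select a patient if any ICD code starts with a CHD prefix."""
    return any(code.startswith(CHD_CODE_PREFIXES) for code in codes)


def two_stage_classify(
    patients: Dict[str, List[str]],
    score: Callable[[List[str]], float],
    threshold: float,
) -> Dict[str, bool]:
    """Stage 2: among ICD-selected patients, call a CHD case only if the
    model score is at or above the threshold (unselected patients are never scored)."""
    return {
        pid: has_chd_code(codes) and score(codes) >= threshold
        for pid, codes in patients.items()
    }


def toy_score(codes: List[str]) -> float:
    """Hypothetical stand-in for a trained model: share of codes that are CHD codes."""
    return sum(code.startswith(CHD_CODE_PREFIXES) for code in codes) / len(codes)


patients = {
    "a": ["Q21.1", "Q25.0"],                    # all CHD codes, score 1.0
    "b": ["Q21.1", "J18.9", "R06.0", "E11.9"],  # one CHD code of four, score 0.25
    "c": ["J18.9"],                             # no CHD code, dropped at stage 1
}
print(two_stage_classify(patients, toy_score, threshold=0.5))
# {'a': True, 'b': False, 'c': False}
```

Because the second stage only re-examines patients already selected by the first, it can remove false positives and so raise PPV, but it cannot recover true cases that lack a CHD ICD code.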

Methods: Traditional ML methods were applied to four encounter-level datasets (2010-2019) covering 3334 patients with validated diagnoses and at least one CHD ICD code. A 5-fold cross-validation approach was applied to determine the set of overlapping important features that best classified CHD cases. Combinations of training and testing data were explored to determine which approach yielded the most accurate CHD classification.
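One way to read "overlapping important features" in a 5-fold cross-validation is to rank features separately on each fold's training split and keep only those that rank highly in every fold. The sketch below does this with the standard library only. Because XGBoost is not assumed to be available, it substitutes a simple, hypothetical importance score (the absolute difference in feature prevalence between TP and FP patients); the feature names, the `top_n` cutoff, and the toy data are illustrative, not the study's.

```python
import random
from typing import List, Optional, Sequence, Set


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def prevalence_gap(rows: Sequence[Set[str]], labels: Sequence[int], feature: str) -> float:
    """Hypothetical importance score (stand-in for XGBoost importance):
    |P(feature | TP) - P(feature | FP)| on the given rows."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    p_pos = sum(feature in r for r in pos) / len(pos) if pos else 0.0
    p_neg = sum(feature in r for r in neg) / len(neg) if neg else 0.0
    return abs(p_pos - p_neg)


def overlapping_top_features(
    rows: Sequence[Set[str]], labels: Sequence[int],
    k: int = 5, top_n: int = 2, seed: int = 0,
) -> Set[str]:
    """Rank features on each fold's training split; keep those in every fold's top_n."""
    features = sorted(set().union(*rows))
    kept: Optional[Set[str]] = None
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(features, key=lambda f: (-prevalence_gap(tr_rows, tr_labels, f), f))
        top = set(ranked[:top_n])
        kept = top if kept is None else kept & top
    return kept if kept is not None else set()


# Toy CCS-style features: ccs_A and ccs_B separate TP (1) from FP (0); ccs_N is noise.
rows = ([{"ccs_A", "ccs_B", "ccs_N"}] * 5 + [{"ccs_A", "ccs_B"}] * 5
        + [{"ccs_N"}] * 5 + [set()] * 5)
labels = [1] * 10 + [0] * 10
print(sorted(overlapping_top_features(rows, labels)))  # ['ccs_A', 'ccs_B']
```

Intersecting per-fold rankings favors features whose importance is stable across resamples; in the study the ranking came from XGBoost, which would take the place of `prevalence_gap` here.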

Results: Positive predictive values (PPVs) of CHD ICD codes ranged by site from 53.2% to 84.0%. By choosing an operating point on the PPV-FN rate curve that prioritized PPV, the ML algorithm achieved a PPV of 95% (1273/1340) on the four-site dataset, with a false-negative (FN) rate of 33% (639/1912). XGBoost reduced 2105 Clinical Classification Software (CCS) features to 137 features that distinguished true-positive (TP) from false-positive (FP) CHD classifications.
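The reported metrics follow from confusion-matrix counts: PPV = TP / (TP + FP) and FN rate = FN / (TP + FN). A PPV of 1273/1340 implies 1273 TP and 67 FP, and an FN rate of 639/1912 implies 1912 true CHD cases, of which the same 1273 are TP. The sketch below reproduces those numbers and shows one hypothetical way to choose an operating point that prioritizes PPV: sweep the classifier threshold, keep thresholds whose PPV meets a floor, and pick the one with the lowest FN rate. The scores, labels, and `min_ppv` value are made up for illustration, and the abstract does not state the exact selection rule.

```python
from typing import Optional, Sequence, Tuple


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of predicted cases that are true cases."""
    return tp / (tp + fp)


def fn_rate(tp: int, fn: int) -> float:
    """False-negative rate: share of true cases the classifier misses."""
    return fn / (tp + fn)


def pick_operating_point(
    scores: Sequence[float], labels: Sequence[int], min_ppv: float
) -> Optional[Tuple[float, float, float]]:
    """Hypothetical rule: among thresholds whose PPV >= min_ppv, return the
    (threshold, PPV, FN rate) with the lowest FN rate, or None if none qualify.
    A patient is predicted positive when score >= threshold."""
    total_pos = sum(labels)
    if total_pos == 0:
        return None  # FN rate is undefined without any true cases
    best: Optional[Tuple[float, float, float]] = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if tp + fp == 0:
            continue  # no predicted positives, PPV undefined
        p = ppv(tp, fp)
        f = fn_rate(tp, total_pos - tp)
        if p >= min_ppv and (best is None or f < best[2]):
            best = (t, p, f)
    return best


# Counts reported in the abstract for the four-site dataset.
print(round(ppv(1273, 1340 - 1273), 3))   # 0.95
print(round(fn_rate(1273, 639), 3))       # 0.334

# Toy scores and labels (1 = true CHD case), for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 1]
print(pick_operating_point(scores, labels, min_ppv=0.75))  # (0.6, 0.75, 0.25)
```

Raising `min_ppv` moves the chosen threshold up the score range, trading more missed cases (a higher FN rate) for fewer false positives, which mirrors the PPV-first trade-off described above.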

Conclusion: Applying ML algorithms after case selection by CHD-related ICD codes improved the accuracy of identifying true-positive CHD cases.

Keywords: congenital heart disease; machine learning; population health.


Conflict of interest statement

Disclosures

The authors have no conflicts to declare.

