JAMIA Open. 2025 May 15;8(3):ooaf035.
doi: 10.1093/jamiaopen/ooaf035. eCollection 2025 Jun.

A phenotyping algorithm for classification of single ventricle physiology using electronic health records

Hang Xu et al. JAMIA Open. 2025.

Abstract

Objectives: Congenital heart disease (CHD) patients with single ventricle physiology (SVP) have heterogeneous characteristics that challenge cohort classification. We aim to develop a phenotyping algorithm that accurately identifies SVP patients using electronic health record (EHR) data.

Materials and methods: We used ICD-9 and ICD-10 codes for initial classification, then refined the algorithm with domain expertise, imaging reports, and progress notes. The algorithm was developed in a cohort of 1020 patients who had undergone magnetic resonance imaging (MRI) and was tested in a separate cohort of 2500 CHD patients with physician adjudication. Validation was performed in a holdout group of 22 500 CHD patients. We evaluated performance using accuracy, sensitivity, precision, and F1 score, and compared it with a published SVP algorithm applied to the same dataset.
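The first stage described above screens patients by diagnosis code. A minimal sketch of such a screen follows. The code prefixes are illustrative assumptions (standard ICD-10-CM and ICD-9-CM codes for double inlet ventricle, tricuspid atresia, and hypoplastic left heart syndrome), not the study's actual code list, and the function names are hypothetical.

```python
# Illustrative ICD screen for SVP (stage 1 of a rule-based phenotype).
# ASSUMPTION: these prefixes are examples chosen for illustration;
# the study's actual ICD code list is not given in the abstract.
SVP_ICD10_PREFIXES = (
    "Q20.4",  # double inlet ventricle
    "Q22.4",  # congenital tricuspid stenosis (includes tricuspid atresia)
    "Q23.4",  # hypoplastic left heart syndrome
)
SVP_ICD9_PREFIXES = (
    "745.3",  # common ventricle
    "746.1",  # tricuspid atresia and stenosis, congenital
    "746.7",  # hypoplastic left heart syndrome
)


def normalize_code(code: str) -> str:
    """Trim whitespace and uppercase so ' q23.4' compares equal to 'Q23.4'."""
    return code.strip().upper()


def icd_screen(codes: list[str]) -> bool:
    """Return True if any diagnosis code starts with an SVP-associated prefix.

    Assumes codes are stored in dotted form (e.g. 'Q23.4', '746.7').
    """
    prefixes = SVP_ICD10_PREFIXES + SVP_ICD9_PREFIXES
    return any(normalize_code(c).startswith(prefixes) for c in codes)


print(icd_screen(["Q21.0", "q23.4 "]))  # True
print(icd_screen(["Q21.0"]))            # False
```

A code screen like this is cheap and sensitive but, as the Results show for the published ICD-based method, imprecise on its own, which is why later stages consult imaging reports and notes.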

Results: In the 2500-patient test cohort, our algorithm based on specialty-defined features and International Classification of Diseases (ICD) codes achieved 99.24% accuracy, 94.12% precision, 85.11% sensitivity, and an F1 score of 89.39%. In contrast, the published method achieved 95.20% accuracy, 43.23% precision, 88.30% sensitivity, and an F1 score of 58.04%. In the 22 500-patient validation cohort, our algorithm achieved 93.82% precision, compared with 43.00% for the published method.
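The reported metrics follow the standard confusion-matrix definitions, with F1 the harmonic mean of precision and sensitivity. The sketch below computes them from counts. The counts used (94 SVP patients among 2500; TP = 80, FP = 5, FN = 14 for the proposed algorithm and TP = 83, FP = 109, FN = 11 for the published method) are our own back-calculation chosen to reproduce the reported percentages, not values stated in the abstract; the actual confusion matrices are shown in Figure 6E and 6F.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # SVP patients labeled SVP
    fp: int  # non-SVP patients labeled SVP
    fn: int  # SVP patients labeled non-SVP
    tn: int  # non-SVP patients labeled non-SVP


def metrics(c: Confusion) -> dict[str, float]:
    """Accuracy, precision, sensitivity (recall), and F1 = 2PR / (P + R)."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    sensitivity = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# ASSUMPTION: back-calculated counts that reproduce the reported
# percentages; they are not taken from the paper.
proposed = Confusion(tp=80, fp=5, fn=14, tn=2401)
published = Confusion(tp=83, fp=109, fn=11, tn=2297)
for name, c in (("proposed", proposed), ("published", published)):
    print(name, {k: round(100 * v, 2) for k, v in metrics(c).items()})
```

The contrast in precision comes almost entirely from false positives (5 versus 109 under these assumed counts), while sensitivity differs by only a few patients.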

Discussion and conclusions: Our automated phenotyping algorithm, combined with physician adjudication, outperforms a published method for SVP classification. It identifies false positives by cross-referencing clinical notes and detects SVP cases that were missed because of absent or erroneous ICD codes. Our integrated phenotyping algorithm showed excellent performance and could improve research and clinical care for SVP patients by automating the construction of an electronic cohort for prognostication, monitoring, and management.
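The cross-referencing described above can be pictured as reconciling two signals per patient: an ICD-based flag and evidence from clinical notes. In the sketch below, the four outcome labels and the choice to route only discordant cases to physician adjudication are illustrative assumptions, not the published decision rules (Figure 2 shows the actual workflow).

```python
from enum import Enum


class Outcome(Enum):
    CONFIRMED_SVP = "SVP code and supporting note evidence"
    POSSIBLE_FALSE_POSITIVE = "SVP code, no supporting note evidence"
    POSSIBLE_MISSED_CASE = "note evidence of SVP, no SVP code"
    NON_SVP = "neither signal present"


def reconcile(icd_flag: bool, note_evidence: bool) -> Outcome:
    """Cross-reference the ICD-based flag with evidence from clinical notes."""
    if icd_flag and note_evidence:
        return Outcome.CONFIRMED_SVP
    if icd_flag:
        # Code present but notes do not support SVP: candidate false positive.
        return Outcome.POSSIBLE_FALSE_POSITIVE
    if note_evidence:
        # Notes support SVP but the code is absent or erroneous: candidate missed case.
        return Outcome.POSSIBLE_MISSED_CASE
    return Outcome.NON_SVP


def needs_adjudication(outcome: Outcome) -> bool:
    """ASSUMPTION: only discordant cases are sent for physician review."""
    return outcome in (Outcome.POSSIBLE_FALSE_POSITIVE, Outcome.POSSIBLE_MISSED_CASE)


for icd, notes in ((True, True), (True, False), (False, True), (False, False)):
    o = reconcile(icd, notes)
    print(icd, notes, o.name, needs_adjudication(o))
```

Keeping the two signals separate makes both error types visible: a code-only screen cannot surface the missed-case branch at all.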

Keywords: cohort development; electronic health records; phenotype algorithm; single ventricle physiology.

Conflict of interest statement

All authors declare no competing interests relevant to this study.

Figures

Figure 1.
Summary of datasets used for algorithm development, testing, and validation. Each blue person tag represents 1000 patients. Abbreviations: CHD, congenital heart diseases; EHR, electronic health record.
Figure 2.
Phenotyping algorithm development. Components from the electronic health record (EHR) used for development of phenotyping algorithm are shown. The blue boxes outline the process from EHR data to SVP classification using the rule-based phenotype algorithm. The yellow boxes highlight the outputs of the SVP classification process. The pink boxes show excluded patients and those classified as non-SVP patients. The green boxes represent the SVP patients and their respective subtypes. Abbreviations: CT, computed tomography; ECHO, echocardiography; ICD, International Classification of Diseases; LVEDVi, left ventricular end-diastolic volume index (LVEDVi=LVEDV divided by body surface area); MRI, magnetic resonance imaging; SVP, single ventricle physiology.
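Figure 2 defines LVEDVi as LVEDV divided by body surface area (BSA). The caption does not say how BSA was obtained; the sketch below assumes the Mosteller formula, BSA (m²) = sqrt(height (cm) × weight (kg) / 3600), purely for illustration, and does not reproduce any threshold the algorithm may apply to LVEDVi.

```python
import math


def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2 via the Mosteller formula (an assumed choice)."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def lvedvi(lvedv_ml: float, bsa_m2: float) -> float:
    """LVEDV indexed to body surface area, in mL/m^2 (LVEDVi = LVEDV / BSA)."""
    if bsa_m2 <= 0:
        raise ValueError("body surface area must be positive")
    return lvedv_ml / bsa_m2


print(lvedvi(150.0, bsa_mosteller(180.0, 80.0)))  # 75.0
```

Indexing to BSA lets ventricular volumes be compared across patients of very different sizes, which matters in a congenital cohort spanning children and adults.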
Figure 3.
Cohort demographics and data distribution for (A) the algorithm development cohort and (B) the testing/validation cohort. The development cohort consisted of n = 1020 patients who had undergone ferumoxytol-enhanced MRI. The testing/validation cohort consisted of 25 000 randomly selected patients with congenital heart disease (CHD), who together generated 231 838 clinical notes, including echo, MRI, and CT reports and cardiology consultation notes. The distribution of patients by ethnicity, gender, age, and race is shown, together with the percentage of records containing vital signs, labs, procedures, and encounter diagnoses and the percentage of notes derived from echo, MRI, and CT reports and cardiology consultation notes.
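The 25 000-patient cohort is divided into a 2500-patient test set and a 22 500-patient holdout set. The abstract does not describe how patients were assigned; the sketch below assumes a simple seeded random split, and the function name and seed are hypothetical.

```python
import random


def split_test_holdout(patient_ids, n_test: int, seed: int = 0) -> tuple[list, list]:
    """Shuffle IDs reproducibly; return (test set of size n_test, holdout of the rest)."""
    ids = list(patient_ids)
    if not 0 <= n_test <= len(ids):
        raise ValueError("n_test must be between 0 and the number of patients")
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(ids)
    return ids[:n_test], ids[n_test:]


test_ids, holdout_ids = split_test_holdout(range(25_000), n_test=2_500, seed=42)
print(len(test_ids), len(holdout_ids))  # 2500 22500
```

Fixing the seed lets the same test and holdout membership be regenerated when the algorithm is revised, so later results remain comparable.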
Figure 4.
Summary of feature extraction used in algorithm development. The electronic health records consisted of both structured and unstructured data. Features and keywords were extracted for the development of the phenotype algorithm.
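Keyword features from unstructured notes can be extracted with case-insensitive pattern matching. The patterns below cover the three main SVP diagnoses named in Figure 5 plus a generic "single ventricle" mention; they are a hypothetical list, not the study's keyword set, and the sketch does no negation detection.

```python
import re

# Hypothetical keyword patterns; NOT the study's keyword list.
SVP_KEYWORD_PATTERNS = {
    "hlhs": re.compile(r"hypoplastic left heart|\bHLHS\b", re.IGNORECASE),
    "tricuspid_atresia": re.compile(r"tricuspid atresia", re.IGNORECASE),
    "dilv": re.compile(r"double[- ]inlet left ventricle|\bDILV\b", re.IGNORECASE),
    "single_ventricle": re.compile(r"single ventricle", re.IGNORECASE),
}


def extract_keyword_features(note_text: str) -> dict[str, bool]:
    """One boolean feature per pattern: True if it occurs anywhere in the note.

    Limitation: no negation handling, so "no evidence of HLHS" still sets hlhs=True.
    """
    return {name: bool(p.search(note_text)) for name, p in SVP_KEYWORD_PATTERNS.items()}


def aggregate_patient_features(notes: list[str]) -> dict[str, bool]:
    """OR the per-note features across all of one patient's notes."""
    combined = {name: False for name in SVP_KEYWORD_PATTERNS}
    for note in notes:
        for name, hit in extract_keyword_features(note).items():
            combined[name] = combined[name] or hit
    return combined


print(aggregate_patient_features(["Echo: tricuspid atresia.", "Single ventricle physiology."]))
```

A production pipeline would typically add negation and section handling before these flags feed a rule-based classifier.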
Figure 5.
Phenotype algorithm development cohort and performance metrics (n = 1020). (A) Distribution of SVP patients by the 3 main types of SVP diagnosis (double inlet left ventricle, hypoplastic left heart syndrome, and tricuspid atresia). (B) Overlap among the 3 main types of SVP diagnosis. (C) Confusion matrix and (D) performance metrics for the published ICD-based classification method in the development cohort. (E) Beeswarm plot of the 10 features with the greatest influence on the phenotyping algorithm in the development data set, ranked by mean absolute SHAP value. (F) Comparison of the total number of patients, actual SVP patients, and SVP patients identified by the phenotyping algorithm across Chart 1, Chart 2, and other ICD code groups. (G) Confusion matrix illustrating the performance of the proposed phenotyping algorithm. Abbreviations: CT, computed tomography; ECHO, echocardiography; ICD, International Classification of Diseases; LVEDVi, left ventricular end-diastolic volume index; MRI, magnetic resonance imaging; SHAP, SHapley Additive exPlanations; SVP, single ventricle physiology.
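Panel E ranks features by mean absolute SHAP value. Given a precomputed matrix of SHAP values (one row per patient, one column per feature; how they were computed is not stated in the caption and is assumed here to be done beforehand), the ranking step itself reduces to the sketch below. The function name is hypothetical.

```python
def rank_by_mean_abs_shap(
    shap_values: list[list[float]],
    feature_names: list[str],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Rank features by mean |SHAP| across patients and return the top_k.

    shap_values[i][j] is the SHAP value of feature j for patient i.
    """
    if not shap_values:
        raise ValueError("need at least one row of SHAP values")
    n = len(shap_values)
    scores = [
        (name, sum(abs(row[j]) for row in shap_values) / n)
        for j, name in enumerate(feature_names)
    ]
    # Highest mean absolute contribution first.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


print(rank_by_mean_abs_shap([[0.5, -2.0, 0.1], [-0.5, 0.0, 0.3]], ["a", "b", "c"]))
```

Taking the absolute value before averaging keeps features that push some patients toward SVP and others away from it from cancelling out to zero.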
Figure 6.
Test performance of the proposed phenotyping algorithm. (A) Cohort demographic distribution. (B) Percentage of records with vital signs, labs, procedures, and encounter diagnoses. (C) Percentage of clinical notes that are echo reports, cardiology consultation notes, CT reports, and MRI reports. (D) Number of SVP patients confirmed by encounter diagnosis and number classified as SVP by the phenotyping algorithm across Chart 1, Chart 2, and other groups. (E) Confusion matrix of the phenotyping algorithm in n = 2500 CHD patients. (F) Confusion matrix of the published ICD-based classification method in the same cohort. (G) Performance metrics of the phenotyping algorithm in the development cohort (n = 1020) and the test cohort (n = 2500 CHD patients), compared with the published ICD-based method.
