Multicenter Study
JAMA Cardiol. 2021 Nov 1;6(11):1285-1295.
doi: 10.1001/jamacardio.2021.2746.

Performance of a Convolutional Neural Network and Explainability Technique for 12-Lead Electrocardiogram Interpretation

J Weston Hughes et al. JAMA Cardiol.

Abstract

Importance: Millions of clinicians rely daily on automated preliminary electrocardiogram (ECG) interpretation. Critical comparisons of machine learning-based automated analysis against clinically accepted standards of care are lacking.

Objective: To use readily available 12-lead ECG data to train a convolutional neural network (CNN) that achieves high performance against clinical standards of care, and to apply an explainability technique to that CNN.

Design, setting, and participants: This cross-sectional study used data from January 1, 2003, to December 31, 2018, obtained in a commonly available 12-lead ECG format from a single tertiary care center. All patients aged 18 years or older who received ECGs at the University of California, San Francisco, were included, yielding a total of 365 009 patients. Data were analyzed from January 1, 2019, to March 2, 2021.

Exposures: A CNN was trained to predict the presence of 38 diagnostic classes in 5 categories from 12-lead ECG data. An explainability technique, LIME (Local Interpretable Model-Agnostic Explanations), was applied to the CNN to visualize the ECG segments contributing to its diagnoses.
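LIME explains a single prediction by perturbing the input, querying the model on the perturbed copies, and fitting a locally weighted linear surrogate whose coefficients score each interpretable feature. The sketch below adapts that idea to a multi-lead ECG under assumptions that are not details from the study: each lead is cut into equal-width time segments, a removed segment is replaced with zeros, closeness to the original is measured as the fraction of segments removed, and `predict_fn` is a hypothetical callable returning the CNN's probability for one diagnostic class. It fits plain weighted least squares rather than the regularized regressor used by the reference LIME library.

```python
import numpy as np

def lime_segment_importance(signal, predict_fn, n_segments=10, n_samples=500,
                            kernel_width=0.25, seed=0):
    """Estimate per-(lead, segment) importance for one ECG and one class.

    signal: array of shape (n_leads, n_timesteps).
    predict_fn: hypothetical callable mapping a batch of shape
        (batch, n_leads, n_timesteps) to class probabilities of shape (batch,).
    """
    rng = np.random.default_rng(seed)
    n_leads, n_time = signal.shape
    # Assumption: equal-width time segments; each (lead, segment) pair is one
    # interpretable feature that can be switched on (kept) or off (zeroed).
    bounds = np.linspace(0, n_time, n_segments + 1).astype(int)
    n_features = n_leads * n_segments

    # Random binary masks; row 0 keeps every segment (the unperturbed ECG).
    masks = rng.integers(0, 2, size=(n_samples, n_features))
    masks[0] = 1
    perturbed = np.repeat(signal[None, :, :], n_samples, axis=0).astype(float)
    for i in range(n_samples):
        for f in np.flatnonzero(masks[i] == 0):
            lead, seg = divmod(int(f), n_segments)
            perturbed[i, lead, bounds[seg]:bounds[seg + 1]] = 0.0

    preds = np.asarray(predict_fn(perturbed), dtype=float)

    # Weight each perturbed sample by closeness to the original ECG using an
    # exponential kernel on the fraction of segments removed (assumed distance).
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted least squares: preds ~ intercept + masks @ coef.
    X = np.hstack([np.ones((n_samples, 1)), masks])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], preds * sw, rcond=None)
    # Drop the intercept and return one importance score per (lead, segment).
    return coef[1:].reshape(n_leads, n_segments)
```

As a toy check, a `predict_fn` that reads only lead 0 between samples 60 and 80 of a 100-sample, all-ones signal split into 5 segments yields an importance of approximately 1 for lead 0, segment 3, and approximately 0 everywhere else; a visualization would then shade each segment in proportion to its score.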

Main outcomes and measures: Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were calculated for the CNN on the holdout test data set, using cardiologist clinical diagnoses as the reference. For a second validation, a consensus committee of 3 electrophysiologists provided reference diagnoses, against which the performance of the CNN, cardiologist clinical diagnoses, and MUSE (GE Healthcare) automated analysis was compared using the F1 score; AUC, sensitivity, and specificity were also calculated for the CNN against the consensus committee.
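The per-class measures named above can be computed from binary reference labels (cardiologist or consensus committee diagnoses) and continuous CNN scores. This minimal sketch assumes a single decision threshold of 0.5, a hypothetical choice, and computes AUC as the probability that a randomly chosen positive ECG scores higher than a randomly chosen negative one, with ties counted as half. The pairwise loop is quadratic and is written for clarity; a rank-based computation would be used on a data set of this size.

```python
def binary_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, F1 score, and AUC for one diagnostic class.

    labels: 0/1 reference diagnoses for each ECG.
    scores: model probabilities for the same ECGs.
    threshold: hypothetical operating point used to binarize the scores.
    """
    labels, scores = list(labels), list(scores)
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    # F1 is the harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0

    # AUC as the fraction of (positive, negative) pairs ranked correctly.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if pos and neg:
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")

    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "auc": auc}
```

Because AUC is threshold free while sensitivity, specificity, and F1 depend on the chosen operating point, the same scores can give a high AUC alongside a modest F1 score for a rare class.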

Results: A total of 992 748 ECGs from 365 009 adult patients (mean [SD] age, 56.2 [17.6] years; 183 600 women [50.3%]; and 175 277 White patients [48.0%]) were included in the analysis. In the 91 440 ECGs of the test data set, the CNN achieved an AUC of at least 0.960 for 32 of 38 classes (84.2%). Against the consensus committee diagnoses, the CNN had higher frequency-weighted mean F1 scores than both cardiologists and MUSE in all 5 categories (CNN frequency-weighted F1 score for rhythm, 0.812; conduction, 0.729; chamber diagnosis, 0.598; infarct, 0.674; and other diagnosis, 0.875). For 32 of 38 classes (84.2%), the CNN had AUCs of at least 0.910 and, compared with cardiologists, demonstrated comparable F1 scores and higher sensitivity, except for atrial fibrillation (CNN F1 score, 0.847 vs cardiologist F1 score, 0.881), junctional rhythm (0.526 vs 0.727), premature ventricular complex (0.786 vs 0.800), and Wolff-Parkinson-White (0.800 vs 0.842). Compared with MUSE, the CNN had higher F1 scores for all classes except supraventricular tachycardia (CNN F1 score, 0.696 vs MUSE F1 score, 0.714). The LIME technique highlighted physiologically relevant ECG segments.
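The category-level results are frequency-weighted means, in which each class's value counts in proportion to how often that class occurs. The sketch below assumes that a class's frequency is its number of reference-positive ECGs; the class names and numbers in any call are illustrative, not study data.

```python
def frequency_weighted_mean(per_class_values, class_counts):
    """Average per-class values (e.g., F1 scores) within one category.

    per_class_values: dict mapping class name -> metric value.
    class_counts: dict mapping class name -> number of reference-positive ECGs
        (assumed definition of class frequency).
    """
    total = sum(class_counts[c] for c in per_class_values)
    if total == 0:
        raise ValueError("at least one class must have a nonzero count")
    # Each class contributes its value multiplied by its share of the total count.
    return sum(v * class_counts[c] for c, v in per_class_values.items()) / total
```

Weighting by frequency means common classes dominate a category's score, so a weak result on a rare class moves the category mean only slightly.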

Conclusions and relevance: The results of this cross-sectional study suggest that readily available ECG data can be used to train a CNN algorithm whose performance is comparable to that of clinical cardiologists and exceeds that of MUSE automated analysis for most diagnoses, with some exceptions. When applied to the CNN, the LIME explainability technique highlights physiologically relevant ECG segments that contribute to the CNN's diagnoses.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Tison reported receiving research grants from General Electric, Janssen Pharmaceuticals, and Myokardia; personal fees from Myokardia Digital Health as an advisory group member; and being an unpaid advisor for Cardiogram. Dr Olgin reported receiving research funding from the National Institutes of Health, Samsung, and iBeat. Dr Avram reported receiving grants from Fonds de recherche en Santé du Québec and personal fees from Novartis Canada outside the submitted work. Dr Sittler reported being a cofounder and shareholder at Color Health. Dr Gonzalez reported receiving grants from the National Science Foundation in the form of the RISE Expedition Award during the conduct of the study. No other disclosures were reported.

Figures

Figure 1. Diagram of Study Electrocardiogram (ECG) Data Sets
ᵃThe sampled training data set was randomly sampled from the training data set (eMethods in the Supplement) to address class imbalance. Consensus committee data set individuals were not in other data sets. Blue boxes indicate data sets used for training; yellow boxes indicate data sets used for validation. UCSF indicates University of California, San Francisco.
Figure 2. Co-occurrence Matrices, Frequency-Weighted Mean F1 Scores, and Sensitivities for the Convolutional Neural Network (CNN)
Co-occurrence matrices for both (A) cardiologist-confirmed and (B) CNN-predicted rhythm diagnoses. Counts of co-occurring diagnosis pairs are shown, with totals on the diagonal. C, Mean F1 scores vs the consensus committee diagnosis. D, Mean sensitivity vs the consensus committee diagnosis. AUC indicates frequency-weighted area under the receiver operating characteristic curve; cardiologist dx, cardiologist clinical diagnosis; MUSE, electrocardiogram interpretation database management system by GE Healthcare; NA, not available. ᵃF1 scores averaged by class frequencies. ᵇSpecificity is fixed at the frequency-weighted average cardiologist clinical diagnosis specificity for each class; sensitivities are reported at this fixed specificity. MUSE sensitivity and specificity cannot be adjusted and are therefore reported in eTable 3 in the Supplement. ᶜSensitivity averaged by class frequencies.
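Reporting sensitivity at a fixed specificity means choosing, for each class, a CNN score threshold whose specificity on reference-negative ECGs reaches a target (here, the cardiologists' specificity), then measuring sensitivity at that threshold. The sketch below picks the lowest threshold that meets the target, which maximizes sensitivity; this selection rule is an assumption, and the study's exact procedure may differ.

```python
import math

def sensitivity_at_specificity(pos_scores, neg_scores, target_specificity):
    """Return (threshold, sensitivity, specificity) at the lowest threshold
    whose specificity reaches target_specificity.

    Scores strictly above the threshold are called positive.
    """
    if not 0 < target_specificity <= 1:
        raise ValueError("target_specificity must be in (0, 1]")
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative ECG")
    neg_sorted = sorted(neg_scores)
    n_neg = len(neg_sorted)
    # Minimum number of negatives that must score at or below the threshold;
    # the small epsilon guards against products landing just above an integer.
    k = max(1, math.ceil(target_specificity * n_neg - 1e-9))
    threshold = neg_sorted[k - 1]
    # Ties at the threshold can push specificity slightly above the target.
    specificity = sum(s <= threshold for s in neg_scores) / n_neg
    sensitivity = sum(s > threshold for s in pos_scores) / len(pos_scores)
    return threshold, sensitivity, specificity
```

Matching specificity first puts the CNN and the cardiologists at the same false-positive rate, so the remaining difference in sensitivity is a like-for-like comparison.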
Figure 3. Examples of an Artificial Intelligence Explainability Technique Applied to Electrocardiograms (ECGs)
The Local Interpretable Model-Agnostic Explanations (LIME) explainability technique highlights ECG segments important to the convolutional neural network (CNN) for each diagnosis. Segments of greater importance are shown in greater color intensity. For each example, all leads with LIME-highlighted segments are shown, along with the CNN’s confidence score. Many physiologically relevant ECG features were highlighted: in Wolff-Parkinson-White, the QRS “delta-wave” of preexcitation; in right ventricular hypertrophy, the R-prime in V1; and in inferior infarcts, the Q-wave in the inferior leads III and aVF. For the unipolar limb leads (H), a indicates augmented; F, foot; L, left arm; R, right arm; and V, vector.