Sci Data. 2020 May 25;7(1):154.
doi: 10.1038/s41597-020-0495-6.

PTB-XL, a large publicly available electrocardiography dataset

Patrick Wagner et al. Sci Data. 2020.

Abstract

Electrocardiography (ECG) is a key non-invasive diagnostic tool for cardiovascular diseases that is increasingly supported by algorithms based on machine learning. Major obstacles to the development of automatic ECG interpretation algorithms are the lack of both public datasets and well-defined benchmarking procedures that allow comparisons of different algorithms. To address these issues, we put forward PTB-XL, the largest freely accessible clinical 12-lead ECG-waveform dataset to date, comprising 21837 records of 10 seconds length from 18885 patients. The ECG-waveform data was annotated by up to two cardiologists as a multi-label dataset, where diagnostic labels were further aggregated into superclasses and subclasses. The dataset covers a broad range of diagnostic classes including, in particular, a large fraction of healthy records. Combined with additional metadata on demographics, additional diagnostic statements, diagnosis likelihoods, and manually annotated signal properties, as well as suggested folds for splitting training and test sets, the dataset becomes a rich resource for the development and evaluation of automatic ECG interpretation algorithms.
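
The aggregation of multi-label diagnostic statements into superclasses, optionally filtered by diagnosis likelihood, can be sketched as follows. This is a minimal illustration only: the statement codes, the dictionary-of-likelihoods format, the `SUPERCLASS_OF` mapping and the function name `aggregate_superclasses` are assumptions made for this example and are not taken from the dataset's own lookup tables.

```python
import ast

# Hypothetical mapping from individual diagnostic statements to superclasses.
# A real analysis would read this mapping from the dataset's metadata instead.
SUPERCLASS_OF = {
    "NORM": "NORM",
    "IMI": "MI",
    "AMI": "MI",
    "LVH": "HYP",
}


def aggregate_superclasses(statements, min_likelihood=0.0):
    """Return the sorted superclasses for one record.

    `statements` maps statement codes to likelihoods (assumed 0 to 100) and
    may be given either as a dict or as the string form of such a dict.
    """
    if isinstance(statements, str):
        statements = ast.literal_eval(statements)
    found = set()
    for code, likelihood in statements.items():
        # Codes without a superclass (e.g. rhythm statements) are skipped.
        if code in SUPERCLASS_OF and likelihood >= min_likelihood:
            found.add(SUPERCLASS_OF[code])
    return sorted(found)


record = "{'IMI': 80.0, 'LVH': 50.0, 'SR': 0.0}"
print(aggregate_superclasses(record))                       # ['HYP', 'MI']
print(aggregate_superclasses(record, min_likelihood=60.0))  # ['MI']
```

Because a record can carry several statements, the result is a list rather than a single class, which matches the multi-label framing of the annotations.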

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Graphical summary of the PTB-XL dataset in terms of diagnostic superclasses and subclasses; see Table 5 for definitions of the acronyms used.
Fig. 2
Overview of populated columns in ptbxl_database.csv. Each entry corresponds to a row in the table in temporal order from top to bottom. Black pixels indicate existing values; missing values remain white.
Fig. 3
Demographic overview of patients in PTB-XL.
Fig. 4
Venn diagram illustrating the assignment of the given SCP ECG statements to the three categories diagnostic, form, and rhythm.
Fig. 5
Distribution of diagnostic subclasses for given diagnostic superclasses.
Fig. 6
Distribution of ECG statements, sex and age across the ten stratified folds. The ninth and tenth folds have particularly high label quality and are intended to be used as validation and test sets.
Fig. 7
Example Python code for loading data and labels, using the suggested folds and the aggregation of diagnostic labels.
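
The suggested use of the folds (ninth fold for validation, tenth fold for testing, the remaining folds for training) can be sketched in plain Python. This is not the code shown in Fig. 7; it is a minimal sketch in which the record layout, the field names `ecg_id` and `strat_fold`, and the helper `split_by_fold` are assumptions made for illustration.

```python
def split_by_fold(records, val_fold=9, test_fold=10, fold_key="strat_fold"):
    """Split records into train, validation and test lists by fold number.

    `records` is an iterable of dicts; `fold_key` names the field holding the
    fold index (assumed here to run from 1 to 10).
    """
    train, val, test = [], [], []
    for rec in records:
        fold = rec[fold_key]
        if fold == test_fold:
            test.append(rec)
        elif fold == val_fold:
            val.append(rec)
        else:
            train.append(rec)
    return train, val, test


# Toy metadata: 20 records, two per fold (hypothetical values).
toy = [{"ecg_id": i, "strat_fold": (i % 10) + 1} for i in range(20)]
train, val, test = split_by_fold(toy)
print(len(train), len(val), len(test))  # 16 2 2
```

Keeping the split keyed on a stored fold number, rather than reshuffling, makes results comparable across studies that adopt the same folds.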
