Circ Cardiovasc Qual Outcomes. 2016 Nov;9(6):649-658.
doi: 10.1161/CIRCOUTCOMES.116.002797. Epub 2016 Nov 8.

Early Detection of Heart Failure Using Electronic Health Records: Practical Implications for Time Before Diagnosis, Data Diversity, Data Quantity, and Data Density

Kenney Ng et al. Circ Cardiovasc Qual Outcomes. 2016 Nov.

Abstract

Background: Using electronic health records data to predict events and onset of diseases is increasingly common. However, relatively little is known about the tradeoffs between data requirements and model utility.

Methods and results: We examined the performance of machine learning models trained to detect prediagnostic heart failure in primary care patients using longitudinal electronic health records data. Model performance was assessed in relation to data requirements defined by the prediction window length (time before clinical diagnosis), the observation window length (duration of observation before the prediction window), the number of different data domains (data diversity), the number of patient records in the training data set (data quantity), and the density of patient encounters (data density). A total of 1684 incident heart failure cases and 13 525 sex, age-category, and clinic matched controls were used for modeling. Model performance improved as (1) the prediction window length decreased, especially when <2 years; (2) the observation window length increased, leveling off after 2 years; (3) the training data set size increased, leveling off after 4000 patients; (4) more diverse data types were used, with diagnosis, medication order, and hospitalization data being, in that order, the most important combination; and (5) data were confined to patients who had ≥10 phone or face-to-face encounters in 2 years.
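The window definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the use of day counts, and the half-open interval convention are assumptions made for illustration. It assumes the prediction window ends at the diagnosis date, the observation window ends where the prediction window begins (the index date), and data density is the count of encounters falling inside the observation window.

```python
from datetime import date, timedelta

def index_date(diagnosis_date, prediction_days):
    """Start of the prediction window (assumed to be the index date)."""
    return diagnosis_date - timedelta(days=prediction_days)

def observation_window(diagnosis_date, prediction_days, observation_days):
    """Return (start, end) of the observation window.

    Assumption: the window is half-open, [start, end), and ends at the
    index date, so no data from the prediction window is used.
    """
    end = index_date(diagnosis_date, prediction_days)
    start = end - timedelta(days=observation_days)
    return start, end

def encounters_in_window(encounter_dates, start, end):
    """Keep only encounters that fall inside [start, end)."""
    return [d for d in encounter_dates if start <= d < end]

def meets_density(encounter_dates, start, end, min_encounters=10):
    """Hypothetical density filter: at least `min_encounters` in the window."""
    return len(encounters_in_window(encounter_dates, start, end)) >= min_encounters

# Example: 1-year prediction window, 2-year observation window (in days).
dx = date(2010, 1, 1)
start, end = observation_window(dx, prediction_days=365, observation_days=730)

# Twelve hypothetical encounters spaced 60 days apart from the window start,
# plus one encounter inside the prediction window that must be excluded.
encounters = [start + timedelta(days=60 * i) for i in range(12)]
encounters.append(dx - timedelta(days=30))

print(len(encounters_in_window(encounters, start, end)))
print(meets_density(encounters, start, end))
```

For controls, the same relative windows would be computed from the matched case's dates, as the caption of Figure 1 indicates; the threshold of 10 encounters mirrors finding (5) in the abstract.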

Conclusions: These empirical findings suggest possible guidelines for the minimum amount and type of data needed to train effective disease onset predictive models using longitudinal electronic health records data.

Keywords: diagnosis; electronic health records; heart failure; prevention and control; risk factors.

Figures

Figure 1. Relation of prediction window, observation window for use of data and the index and diagnosis dates for cases and the same relative times for controls.
Figure 2. Prediction performance for ungrouped versus grouped representations for the diagnoses, medications and hospitalizations data types.
Figure 3. Prediction performance for (A) individual and (B) combined data types.
Figure 4. Prediction performance for different prediction window lengths.
Figure 5. Prediction performance for different observation window lengths.
Figure 6. Prediction performance as a function of training set size.
Figure 7. Distribution of the number of encounters in the 2-year observation window for cases and controls.
Figure 8. Prediction performance as a function of data density and data size.
