AMIA Annu Symp Proc. 2012;2012:901-10.
Epub 2012 Nov 3.

Combining knowledge and data driven insights for identifying risk factors using electronic health records

Jimeng Sun et al. AMIA Annu Symp Proc. 2012.

Abstract

Background: The ability to identify the risk factors related to an adverse condition, e.g., a heart failure (HF) diagnosis, is very important for improving care quality and reducing cost. Existing approaches for risk factor identification are either knowledge driven (from guidelines or the literature) or data driven (from observational data). No existing method provides a model that effectively combines expert knowledge with data-driven insight for risk factor identification.

Methods: We present a systematic approach that augments known, knowledge-based risk factors with additional potential risk factors derived from data. The core of our approach is a sparse regression model whose regularization terms correspond to both knowledge-driven and data-driven risk factors.
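
The abstract does not state the objective function, so the following is only a minimal Python sketch of one way such a model could look, not the authors' formulation. It assumes a logistic regression fit by proximal gradient descent with a per-feature L1 penalty: knowledge-based features get a zero penalty so they are always retained, while data-driven candidates are penalized and survive only if they add predictive signal. The function names, the toy data, and the penalty values are all hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sparse_logistic(X, y, penalties, step=0.1, iterations=2000):
    """Fit logistic regression with a separate L1 penalty per feature.

    penalties[j] == 0 leaves feature j unpenalized (assumed here for
    knowledge-based factors); larger values push weight j to exactly zero.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b  # the intercept is never penalized
        w = [soft_threshold(w[j] - step * grad_w[j], step * penalties[j])
             for j in range(d)]
    return w, b


# Hypothetical toy data. Column 0: a knowledge-based factor.
# Column 1: an informative data-driven candidate. Column 2: an uninformative one.
X = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 0, 1],
     [0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 1, 0, 0, 0]

# A very strong penalty on the data-driven columns keeps only the knowledge factor.
w_strong, _ = fit_sparse_logistic(X, y, [0.0, 10.0, 10.0])
# A mild penalty lets the informative data-driven candidate enter the model.
w_mild, _ = fit_sparse_logistic(X, y, [0.0, 0.01, 0.01])
selected = [j for j, wj in enumerate(w_mild) if wj != 0.0]
print("weights (strong penalty):", w_strong)
print("selected features (mild penalty):", selected)
```

In this sketch the penalty on the data-driven columns controls how many of them are added to the knowledge-based set, which is one simple way to trade sparsity against predictive power.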

Results: The approach is validated on a large dataset containing 4,644 heart failure cases and 45,981 controls. The outpatient electronic health records (EHRs) for these patients include diagnoses, medications, and lab results from 2003 to 2010. We demonstrate that the proposed method can identify complementary risk factors that are not among the known factors and can better predict the onset of HF. We quantitatively compare different sets of risk factors for predicting the onset of HF, using the Area Under the ROC Curve (AUC) as the performance metric. The combined knowledge-based and data-driven risk factors significantly outperform knowledge-based risk factors alone. Furthermore, a cardiologist confirmed that the additional risk factors are clinically meaningful.
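
To make the evaluation metric concrete, here is a small Python sketch of how AUC can be computed and used to compare two sets of risk scores. It relies on the standard rank interpretation of AUC: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The labels and scores below are invented for illustration and are not results from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise case/control comparisons."""
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0   # case ranked above control
            elif case_score == control_score:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))


# Hypothetical example: 1 = HF case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
# Invented risk scores from a model using knowledge-based factors only ...
knowledge_only = [0.8, 0.4, 0.3, 0.5, 0.2, 0.1]
# ... and from a model using knowledge-based plus data-driven factors.
combined = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]

print("knowledge only:", auc(labels, knowledge_only))
print("combined:      ", auc(labels, combined))
```

The pairwise loop is quadratic in the number of patients, so for a cohort of the size described above a rank-based implementation from a statistics library would be the practical choice.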

Conclusion: We present a systematic framework for combining knowledge and data driven insights for risk factor identification. We demonstrate the power of this framework in the context of predicting onset of HF, where our approach can successfully identify intuitive and predictive risk factors beyond a set of known HF risk factors.

Figures

Figure 1: System Overview for Risk Factor Combination
Figure 2: AUC significantly improves as complementary data-driven risk factors are added to the existing knowledge-based risk factors. A significant AUC increase occurs when the first 50 data-driven features are added.
