AMIA Annu Symp Proc. 2018 Dec 5;2018:740-749.
eCollection 2018.

Scalable Electronic Phenotyping For Studying Patient Comorbidities

Albee Y Ling et al. AMIA Annu Symp Proc.

Abstract

Over 75 million Americans have multiple concurrent chronic conditions, and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity phenotypes without requiring manually created training sets. First, we generated myocardial infarction (MI) and type-2 diabetes (T2DM) patient cohorts using ICD9-based imperfectly labeled samples, upon which LASSO logistic regression models were trained. Second, we assessed the effects of training sample size, inclusion of physician input, and inclusion of clinical text features on model performance. Using ICD9 codes as our labeling heuristic, we achieved performance comparable to that of models created using keywords as the labeling heuristic. We found that expert input and larger training sample sizes could compensate for the lack of clinical text-derived features. However, our best-performing model included clinical text as features and used a large training sample size.
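The first step described above, training a LASSO logistic regression on imperfect ICD9-derived labels, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the use of the ICD9 prefix 410 (acute myocardial infarction) as the labeling rule, the two hypothetical binary features, the synthetic toy cohort, the hyperparameters, and the choice of proximal gradient descent as the solver.

```python
import math

def noisy_label(icd9_codes, case_prefixes=("410",)):
    """Return 1 if any ICD9 code starts with a case prefix, else 0 (imperfect label)."""
    return int(any(code.startswith(case_prefixes) for code in icd9_codes))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip small values to zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, epochs=300):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Synthetic toy cohort (hypothetical): each patient has a list of ICD9 codes and
# two binary features, [informative, unrelated]. The labeling code itself is
# deliberately kept out of the feature vector.
patients = []
for is_case in (0, 1):
    codes = ["410.91"] if is_case else ["401.9"]
    for j in range(10):
        informative = is_case if j else 1 - is_case  # agrees with the label 90% of the time
        for unrelated in (0, 1):  # balanced across cases and controls
            patients.extend([(codes, [informative, unrelated])] * 5)

X = [features for _, features in patients]
y = [noisy_label(codes) for codes, _ in patients]
weights, bias = fit_lasso_logistic(X, y)
print(weights, bias)
```

On this toy cohort the L1 penalty keeps a positive weight on the informative feature and leaves the weight of the unrelated feature at zero, which is the feature-selection behavior that motivates using LASSO when the candidate feature set is large.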

Figures

Figure 1.
Full Model Framework. ICD9 codes were used to generate noisy labels, which were used as training examples in a LASSO logistic regression. The model labeled each patient as a case or control, which was validated against a gold standard clinician-reviewed dataset.
Figure 2.
Schematic showing the range of trade-offs that were explored to maximize model performance while minimizing model complexity.
Figure 3.
Model accuracy as a function of increasing training sample size. Red corresponds to performance of the model with clinical text as features, and blue corresponds to performance of the model without clinical text as features.
Figure 4.
Model iteration using feature anchoring and feature removal.
Figure 5.
Models without clinical text were evaluated to assess whether increasing the training sample size would allow for comparable performance in models with and without clinical text as features. When anchoring was added to the model with a large training sample size (N=5,000) and no clinical text, it performed as well as a model using clinical text as features.
