Scalable Electronic Phenotyping For Studying Patient Comorbidities

Albee Y Ling¹, Emily Alsentzer¹, Josephine Chen¹, Juan M Banda², Suzanne Tamang³, Evan Minty¹,²

Affiliations

¹ Biomedical Informatics Training Program, Stanford University, Stanford, CA.
² Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA.
³ Department of Biomedical Data Science, Stanford University, Stanford, CA.

PMID: 30815116
PMCID: PMC6371288

Albee Y Ling et al. AMIA Annu Symp Proc. 2018 Dec 5;2018:740-749. eCollection 2018.

Abstract

Over 75 million Americans have multiple concurrent chronic conditions, and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity phenotypes without requiring manually created training sets. First, we generated myocardial infarction (MI) and type 2 diabetes mellitus (T2DM) patient cohorts using ICD9-based, imperfectly labeled samples, on which LASSO logistic regression models were trained. Second, we assessed the effects of training sample size, physician input, and clinical text features on model performance. Using ICD9 codes as our labeling heuristic, we achieved performance comparable to that of models created using keywords as the labeling heuristic. We found that expert input and larger training samples could compensate for the lack of features derived from clinical text. However, our best-performing model combined clinical text features with a large training sample.
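The two-step idea in the abstract (heuristic labels first, then a sparse classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the patient records are synthetic, the feature names and the helper names `noisy_label` and `fit_lasso_logistic` are hypothetical, the ICD9 prefix `410` (acute myocardial infarction) is used only as an example heuristic, and a plain proximal-gradient solver stands in for whatever LASSO solver the study used.

```python
import math
import random


def noisy_label(icd9_codes, prefix="410"):
    """Heuristic (noisy) label: 1 if any ICD9 code starts with the prefix."""
    return int(any(code.startswith(prefix) for code in icd9_codes))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    The intercept b is left unpenalized; lam is the L1 strength (assumed value).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Synthetic patients: (ICD9 codes, [feature_a, feature_b, unrelated_feature]).
# The labeling code itself is deliberately NOT used as a model feature.
random.seed(0)
patients = []
for i in range(200):
    is_case = i % 2 == 0
    coded = is_case and random.random() < 0.9   # about 10% of true cases go uncoded
    codes = ["410.9"] if coded else ["401.9"]
    features = [
        int(random.random() < (0.9 if is_case else 0.1)),
        int(random.random() < (0.8 if is_case else 0.2)),
        int(random.random() < 0.5),
    ]
    patients.append((codes, features))

X = [features for _, features in patients]
y = [noisy_label(codes) for codes, _ in patients]
w, b = fit_lasso_logistic(X, y)
print("weights:", [round(wj, 3) for wj in w], "intercept:", round(b, 3))
```

The L1 penalty tends to shrink weights of uninformative features toward zero, which is why it suits a setting with many candidate features and imperfect labels.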

Figures

**Figure 1.**
Full Model Framework. ICD9 codes were used to generate noisy labels, which were used as training examples in a LASSO logistic regression. The model labeled each patient as a case or control, which was validated against a gold standard clinician-reviewed dataset.

**Figure 2.**
Schematic showing the range of trade-offs that were explored to maximize model performance while minimizing model complexity.

**Figure 3.**
Model accuracy as a function of increasing training sample size. Red corresponds to performance of the model with clinical text as features, and blue corresponds to performance of the model without clinical text as features.

**Figure 4.**
Model iteration using feature anchoring and feature removal.

**Figure 5.**
Models without clinical text were evaluated to assess whether increasing the training sample size would allow for comparable performance in models with and without clinical text as features. When anchoring was added to the model with large training sample size (N=5,000) without clinical text, it could perform as well as a model using clinical text as features.
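The validation step described in Figure 1, and the accuracy comparisons in Figures 3 and 5, amount to scoring model labels against clinician-reviewed labels. A minimal sketch of that scoring is shown here; the function names and the toy label vectors are hypothetical and do not come from the study.

```python
def confusion_counts(predicted, gold):
    """Count true/false positives and negatives for binary labels (1 = case)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)
    fp = sum(1 for p, g in pairs if p == 1 and g == 0)
    tn = sum(1 for p, g in pairs if p == 0 and g == 0)
    fn = sum(1 for p, g in pairs if p == 0 and g == 1)
    return tp, fp, tn, fn


def evaluate(predicted, gold):
    """Return accuracy, PPV, sensitivity, and specificity.

    A metric is None when its denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(predicted, gold)
    total = tp + fp + tn + fn

    def ratio(num, den):
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, total),
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy example: model output versus a (hypothetical) clinician-reviewed gold standard.
model_labels = [1, 1, 0, 0, 1, 0, 1, 0]
gold_labels = [1, 0, 0, 0, 1, 1, 1, 0]
print(evaluate(model_labels, gold_labels))
```

Reporting PPV and sensitivity alongside accuracy helps when cases are rare, since accuracy alone can look high for a model that misses most cases.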
