AMIA Annu Symp Proc. 2018 Dec 5;2018:340-347. eCollection 2018.

Learning to Identify Rare Disease Patients from Electronic Health Records


Rich Colbaugh et al. AMIA Annu Symp Proc.

Abstract

There is increasing interest in developing prediction models capable of identifying rare disease patients in population-scale databases such as electronic health records (EHRs). Deriving these models is challenging for many reasons, perhaps the most important being the limited number of patients with 'gold standard' confirmed diagnoses from which to learn. This paper presents a novel cascade learning methodology that induces accurate prediction models from noisy 'silver standard' labeled data: patients provisionally labeled as positive for the target disease on the basis of unconfirmed evidence. The algorithm combines unsupervised feature selection, supervised ensemble learning, and unsupervised clustering to enable robust learning from noisy labels. The efficacy of the approach is illustrated through a case study involving the detection of lipodystrophy patients in a country-scale database of EHRs. The case study demonstrates that our algorithm outperforms state-of-the-art prediction techniques and permits the discovery of previously undiagnosed patients in large EHR databases.
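The abstract names the three stages of the cascade (unsupervised feature selection, supervised ensemble learning, unsupervised clustering) but not their internals. The sketch below is a hypothetical illustration of how such a cascade could be wired together on binary EHR code indicators; it is not the paper's Algorithm CP. Each stage uses a stand-in technique chosen for brevity: variance ranking for feature selection, bagged Bernoulli naive Bayes as the ensemble trained on silver labels, and one-dimensional 2-means on the ensemble scores to assign final labels. All function names and parameters (`k`, `n_models`, `seed`) are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Row = Sequence[int]  # one patient: 0/1 indicators for EHR codes (assumed encoding)


def select_features(X: Sequence[Row], k: int) -> List[int]:
    """Unsupervised stage: keep the k columns with the highest variance (labels unused)."""
    n = len(X)
    ranked = []
    for j in range(len(X[0])):
        p = sum(row[j] for row in X) / n
        ranked.append((p * (1 - p), j))  # variance of a 0/1 feature
    ranked.sort(reverse=True)
    return sorted(j for _, j in ranked[:k])


def train_nb(X: Sequence[Row], y: Sequence[int], cols: List[int]) -> Dict[int, Tuple[float, List[float]]]:
    """Bernoulli naive Bayes with Laplace smoothing on the selected columns."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log((len(rows) + 1) / (len(X) + 2))
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in cols]
        model[c] = (log_prior, probs)
    return model


def nb_log_odds(model: Dict[int, Tuple[float, List[float]]], x: Row, cols: List[int]) -> float:
    """Log-odds that patient x is positive under one naive Bayes model."""
    logp = {}
    for c, (log_prior, probs) in model.items():
        s = log_prior
        for p, j in zip(probs, cols):
            s += math.log(p if x[j] else 1 - p)
        logp[c] = s
    return logp[1] - logp[0]


def ensemble_scores(X: Sequence[Row], silver: Sequence[int], cols: List[int],
                    n_models: int = 15, seed: int = 0) -> List[float]:
    """Supervised stage: average log-odds of models trained on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    totals = [0.0] * n
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of patients
        model = train_nb([X[i] for i in idx], [silver[i] for i in idx], cols)
        for i, x in enumerate(X):
            totals[i] += nb_log_odds(model, x, cols)
    return [t / n_models for t in totals]


def two_means(scores: List[float], iters: int = 50) -> List[int]:
    """Unsupervised stage: 1-D 2-means; the high-score cluster is labeled positive."""
    lo, hi = min(scores), max(scores)
    for _ in range(iters):
        high = [s for s in scores if abs(s - hi) < abs(s - lo)]
        low = [s for s in scores if abs(s - hi) >= abs(s - lo)]
        if not high or not low:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return [1 if abs(s - hi) < abs(s - lo) else 0 for s in scores]


def cascade_predict(X: Sequence[Row], silver: Sequence[int], k: int = 3,
                    n_models: int = 15, seed: int = 0) -> List[int]:
    """Hypothetical cascade: select features, score with an ensemble, then cluster."""
    cols = select_features(X, k)
    scores = ensemble_scores(X, silver, cols, n_models, seed)
    return two_means(scores)


if __name__ == "__main__":
    truth = [1] * 8 + [0] * 12
    # Toy data: columns 0-2 mark true patients; column 3 is a rare code; column 4 never occurs.
    X = [[t, t, t, int(i == 19), 0] for i, t in enumerate(truth)]
    silver = list(truth)
    silver[0] = 0  # a true patient missing from the silver standard
    silver[8] = 1  # a non-patient provisionally labeled positive
    print(cascade_predict(X, silver) == truth)
```

In this design the final labels come from the clustering stage rather than from the silver labels themselves, so on the toy data the two mislabeled patients are reassigned according to the cluster their ensemble scores fall into. Whether the paper's algorithm corrects label noise in the same way is not stated in the abstract.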


Figures

Figure 1.
Results for LD prediction test. The plot compares accuracy and AUC obtained with four prediction models: kopcke (blue), miotto (cyan), a simple baseline model (yellow), and Algorithm CP (burnt umber); error bars indicate ±2 standard errors.
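Figure 1 reports accuracy and AUC with error bars of ±2 standard errors. The caption does not say how the standard errors were obtained; assuming they come from repeated evaluation runs (for example, several random train/test splits), a minimal sketch of the three quantities follows. AUC is computed here as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a random positive > score of a random negative)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_pm_2se(values: Sequence[float]) -> Tuple[float, float, float]:
    """Mean over runs and the interval mean ± 2 standard errors (sample std / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased sample variance
    se = math.sqrt(var / n)
    return mean, mean - 2 * se, mean + 2 * se


if __name__ == "__main__":
    # Toy labels and scores, for illustration only.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Applying `mean_pm_2se` separately to the per-run accuracies and per-run AUCs of each model would give one bar and one error bar per model, matching the layout the caption describes.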
