J Am Med Inform Assoc. 2019 Jan 1;26(1):61-65.
doi: 10.1093/jamia/ocy154.

Automated and flexible identification of complex disease: building a model for systemic lupus erythematosus using noisy labeling


Sara G Murray et al. J Am Med Inform Assoc.

Abstract

Accurate and efficient identification of complex chronic conditions in the electronic health record (EHR) is an important but challenging task that has historically relied on tedious clinician review and oversimplification of the disease. Here we adapt methods that allow for automated "noisy labeling" of positive and negative controls to create a "silver standard" for machine learning to automate identification of systemic lupus erythematosus (SLE). Our final model, which includes both structured data and text processing of clinical notes, outperformed all existing algorithms for SLE (AUC 0.97). In addition, we demonstrate how the probabilistic outputs of this model can be adapted to various clinical needs, selecting high thresholds when specificity is the priority and lower thresholds when a more inclusive patient population is desired. Deploying a similar methodology to other complex diseases has the potential to dramatically simplify the landscape of population identification in the EHR.
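The abstract does not spell out the rules used to assign noisy labels. The sketch below only illustrates the general idea under purely hypothetical rules: the `Patient` fields, the `MIN_POSITIVE_CODES` cut-off, and the functions `noisy_label` and `build_silver_standard` are invented for illustration and are not the authors' criteria. Patients with strong, redundant evidence become noisy positives, patients with no evidence become noisy negatives, and everyone in between is left out of the silver standard.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical cut-off: the abstract does not state the actual labeling rules.
MIN_POSITIVE_CODES = 4


@dataclass
class Patient:
    patient_id: str
    sle_code_count: int     # hypothetical: encounters carrying an SLE diagnosis code
    sle_note_mentions: int  # hypothetical: clinical notes mentioning lupus


def noisy_label(p: Patient) -> Optional[int]:
    """Return 1 (noisy positive), 0 (noisy negative), or None (left unlabeled)."""
    # Strong, redundant evidence of SLE: treat as a positive control.
    if p.sle_code_count >= MIN_POSITIVE_CODES and p.sle_note_mentions > 0:
        return 1
    # No SLE evidence at all: treat as a negative control.
    if p.sle_code_count == 0 and p.sle_note_mentions == 0:
        return 0
    # Ambiguous patients are excluded from the silver standard rather than guessed.
    return None


def build_silver_standard(patients: List[Patient]) -> List[Tuple[Patient, int]]:
    """Keep only the patients the rules can label; these become training data."""
    silver = []
    for p in patients:
        label = noisy_label(p)
        if label is not None:
            silver.append((p, label))
    return silver
```

A classifier trained on such a silver standard can then score every patient, including the ambiguous ones the rules skipped. In anchor-style noisy labeling, the features used to assign labels are commonly withheld from the classifier's inputs so it cannot simply relearn the rule; this sketch leaves that step out.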

MeSH terms: Electronic Health Records, Machine Learning, Lupus Erythematosus, Phenotype, Algorithms.


Figures

Figure 1.
Overview of methodology for training and test set development.
Figure 2.
Algorithmic probabilities assigned to the test set. Data are shown in 0.05-wide probability bins and labeled according to how each case was classified by clinician expert review. Patients were classified as “not SLE,” “possible SLE” (diagnostic uncertainty or lack of documentation), “probable SLE” (features of the disease present but not meeting full criteria), or “definite SLE” (meeting the American College of Rheumatology [ACR] criteria).
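The binned view in Figure 2 amounts to a tally of review categories per probability bin. The sketch below reproduces that bookkeeping under stated assumptions: bins are half-open intervals keyed by their lower edge, a probability of exactly 1.0 is folded into the top bin, and the function name `tally_by_bin` is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence

BIN_WIDTH = 0.05
CATEGORIES = ("not SLE", "possible SLE", "probable SLE", "definite SLE")


def tally_by_bin(
    probs: Sequence[float], categories: Sequence[str], width: float = BIN_WIDTH
) -> Dict[float, Counter]:
    """Count clinician-review categories per probability bin, keyed by bin lower edge."""
    if len(probs) != len(categories):
        raise ValueError("probs and categories must have the same length")
    n_bins = round(1 / width)
    counts: Dict[float, Counter] = defaultdict(Counter)
    for prob, cat in zip(probs, categories):
        if cat not in CATEGORIES:
            raise ValueError(f"unknown category: {cat!r}")
        # The small epsilon keeps a value sitting on a bin edge from slipping into the
        # bin below through floating-point error; 1.0 is folded into the top bin.
        idx = min(math.floor(prob * n_bins + 1e-9), n_bins - 1)
        # Rounding to 2 places assumes a bin width with at most two decimals.
        counts[round(idx / n_bins, 2)][cat] += 1
    return dict(sorted(counts.items()))
```

Plotting each bin's counter as a stacked bar would give a chart in the style of Figure 2.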
Figure 3.
ROC curves for “strict” and “inclusive” definitions of SLE. A) ROC curve for the strict definition of SLE shows an AUC of 0.97. B) ROC curve for the inclusive definition of SLE shows an AUC of 0.94.
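Figure 3 scores the same probabilities against two label definitions, and the abstract describes picking a high threshold when specificity matters and a lower one for a more inclusive population. The sketch below shows both ideas in plain Python. Assumptions: "strict" is taken to mean definite SLE only and "inclusive" to mean probable or definite SLE (the caption does not say); candidate thresholds are restricted to the observed scores; and AUC is computed as the probability that a random positive outscores a random negative (the Mann-Whitney form, equal to the area under the empirical ROC curve). All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Assumed groupings of the clinician-review categories from Figure 2.
STRICT: Set[str] = {"definite SLE"}
INCLUSIVE: Set[str] = {"probable SLE", "definite SLE"}


def binarize(categories: Sequence[str], positive: Set[str]) -> List[int]:
    return [1 if c in positive else 0 for c in categories]


def auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the chance a random positive scores above a random negative (ties count half)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(probs: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called SLE."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative")
    return tp / (tp + fn), tn / (tn + fp)


def lowest_threshold_for_specificity(probs, labels, min_spec: float) -> Optional[float]:
    """Most inclusive observed cut-off that still meets a specificity target."""
    # Specificity can only rise as the threshold rises, so the first hit is the lowest.
    for t in sorted(set(probs)):
        if sens_spec(probs, labels, t)[1] >= min_spec:
            return t
    return None  # target not reachable with any observed score as the cut-off


def highest_threshold_for_sensitivity(probs, labels, min_sens: float) -> Optional[float]:
    """Strictest observed cut-off that still meets a sensitivity target."""
    # Sensitivity can only fall as the threshold rises, so scan from the top down.
    for t in sorted(set(probs), reverse=True):
        if sens_spec(probs, labels, t)[0] >= min_sens:
            return t
    return None
```

Because a probable-SLE case can score below some not-SLE cases, the inclusive definition can yield a lower AUC than the strict one on the same scores, which is the pattern Figure 3 reports (0.94 versus 0.97).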

