Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2012 Jun;19(e1):e162-9.
doi: 10.1136/amiajnl-2011-000583. Epub 2012 Feb 28.

Portability of an algorithm to identify rheumatoid arthritis in electronic health records

Affiliations

Portability of an algorithm to identify rheumatoid arthritis in electronic health records

Robert J Carroll et al. J Am Med Inform Assoc. 2012 Jun.

Abstract

Objectives: Electronic health records (EHR) can allow for the generation of large cohorts of individuals with given diseases for clinical and genomic research. A rate-limiting step is the development of electronic phenotype selection algorithms to find such cohorts. This study evaluated the portability of a published phenotype algorithm to identify rheumatoid arthritis (RA) patients from EHR records at three institutions with different EHR systems.

Materials and methods: Physicians reviewed charts from three institutions to identify patients with RA. Each institution compiled attributes from various sources in the EHR, including codified data and clinical narratives, which were searched using one of two natural language processing (NLP) systems. The performance of the published model was compared with locally retrained models.

Results: Applying the previously published model from Partners Healthcare to datasets from Northwestern and Vanderbilt Universities, the area under the receiver operating characteristic curve was found to be 92% for Northwestern and 95% for Vanderbilt, compared with 97% at Partners. Retraining the model improved the average sensitivity at a specificity of 97% to 72% from the original 65%. Both the original logistic regression models and locally retrained models were superior to simple billing code count thresholds.

Discussion: These results show that a previously published algorithm for RA is portable to two external hospitals using different EHR systems, different NLP systems, and different target NLP vocabularies. Retraining the algorithm primarily increased the sensitivity at each site.

Conclusion: Electronic phenotype algorithms allow rapid identification of case populations in multiple sites with little retraining.

PubMed Disclaimer

Conflict of interest statement

Competing interests: None.

Figures

Figure 1
Figure 1
Algorithm overview. EMR, electronic medical record; ICD-9, International Classification of Diseases, version 9 CM; NLP, natural language processing.
Figure 2
Figure 2
Evaluation flowchart. EHR, electronic health record; ICD-9, International Classification of Diseases, version 9 CM.
Figure 3
Figure 3
Receiver operating characteristic curves for each test set. The vertical line represents the 97% specificity cut-off used in this study. The test performance at Partners, Northwestern, and Vanderbilt are found in (a), (b), and (c), respectively.

References

    1. Ritchie MD, Denny JC, Crawford DC, et al. Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record. Am J Hum Genet 2010;86:560–72 - PMC - PubMed
    1. Kurreeman F, Liao K, Chibnik L, et al. Genetic basis of autoantibody positive and negative rheumatoid arthritis risk in a multi-ethnic cohort derived from electronic health records. Am J Hum Genet 2011;88:57–69 - PMC - PubMed
    1. Kullo IJ, Ding K, Jouni H, et al. A genome-wide association study of red blood cell traits using the electronic medical record. PLoS ONE 2010;5:pii: e13011 - PMC - PubMed
    1. Denny JC, Ritchie MD, Crawford DC, et al. Identification of genomic predictors of atrioventricular conduction: using electronic medical records as a tool for genome science. Circulation 2010;122:2016–21 - PMC - PubMed
    1. Kohane IS. Using electronic health records to drive discovery in disease genomics. Nat Rev Genet 2011;12:417–28 - PubMed

Publication types