Portability of an algorithm to identify rheumatoid arthritis in electronic health records
- PMID: 22374935
- PMCID: PMC3392871
- DOI: 10.1136/amiajnl-2011-000583
Abstract
Objectives: Electronic health records (EHRs) can allow for the generation of large cohorts of individuals with given diseases for clinical and genomic research. A rate-limiting step is the development of electronic phenotype selection algorithms to find such cohorts. This study evaluated the portability of a published phenotype algorithm to identify rheumatoid arthritis (RA) patients from EHRs at three institutions with different EHR systems.
Materials and methods: Physicians reviewed charts from three institutions to identify patients with RA. Each institution compiled attributes from various sources in the EHR, including codified data and clinical narratives, which were searched using one of two natural language processing (NLP) systems. The performance of the published model was compared with locally retrained models.
Results: When the previously published model from Partners Healthcare was applied to datasets from Northwestern and Vanderbilt Universities, the area under the receiver operating characteristic curve was 92% for Northwestern and 95% for Vanderbilt, compared with 97% at Partners. At a specificity of 97%, retraining the model improved the average sensitivity from the original 65% to 72%. Both the original logistic regression models and locally retrained models were superior to simple billing code count thresholds.
Discussion: These results show that a previously published algorithm for RA is portable to two external hospitals using different EHR systems, different NLP systems, and different target NLP vocabularies. Retraining the algorithm primarily increased the sensitivity at each site.
Conclusion: Electronic phenotype algorithms allow rapid identification of case populations in multiple sites with little retraining.
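The two metrics reported in the Results, area under the receiver operating characteristic curve (AUC) and sensitivity at a fixed specificity, can be illustrated with a minimal Python sketch. This is not the study's code: the function names `auc` and `sensitivity_at_specificity` are hypothetical, the scores stand in for any model output (for example, a logistic regression probability or a billing code count), and the toy data are invented purely for illustration.

```python
from typing import Sequence


def _split(scores: Sequence[float], labels: Sequence[int]):
    """Separate scores of chart-confirmed cases (label 1) from non-cases (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    return pos, neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of case/non-case pairs ranked correctly (ties count half)."""
    pos, neg = _split(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    scores: Sequence[float],
    labels: Sequence[int],
    target_specificity: float = 0.97,
) -> float:
    """Best sensitivity over thresholds whose specificity meets the target.

    A patient is classified as a case when score >= threshold.
    """
    pos, neg = _split(scores, labels)
    best = 0.0  # a threshold above every score gives specificity 1, sensitivity 0
    for t in sorted(set(scores)):
        specificity = sum(n < t for n in neg) / len(neg)
        if specificity >= target_specificity:
            sensitivity = sum(p >= t for p in pos) / len(pos)
            best = max(best, sensitivity)
    return best


# Invented toy data: four cases and four non-cases (NOT from the study).
toy_scores = [0.95, 0.90, 0.80, 0.40, 0.70, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(auc(toy_scores, toy_labels))
print(sensitivity_at_specificity(toy_scores, toy_labels, 0.97))
```

Fixing the specificity before comparing sensitivities, as the Results do, places the original model, a retrained model, and a simple count threshold at the same false positive rate, so any difference in sensitivity reflects the scoring method rather than the choice of cut-off.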
Grants and funding
- R01 AR057108/AR/NIAMS NIH HHS/United States
- UL1 RR024975/RR/NCRR NIH HHS/United States
- R01 GM079330/GM/NIGMS NIH HHS/United States
- K08 AR060257/AR/NIAMS NIH HHS/United States
- R01 AR059648/AR/NIAMS NIH HHS/United States
- T15 LM007450/LM/NLM NIH HHS/United States
- UL1 RR025741/RR/NCRR NIH HHS/United States
- U01 GM092691/GM/NIGMS NIH HHS/United States
- R01 AR056768/AR/NIAMS NIH HHS/United States
- U54 LM008748/LM/NLM NIH HHS/United States
- R01 LM010685/LM/NLM NIH HHS/United States
- R01 AR055240/AR/NIAMS NIH HHS/United States
