J Healthc Inform Res. 2017 Jun;1(1):1-18.
doi: 10.1007/s41666-017-0005-6. Epub 2017 Jun 8.

An Interoperable Similarity-based Cohort Identification Method Using the OMOP Common Data Model version 5.0

Shreya Chakrabarti et al. J Healthc Inform Res. 2017 Jun.

Abstract

Cohort identification for clinical studies tends to be laborious, time-consuming, and expensive. Developing automated or semi-automated methods for cohort identification is one of the "holy grails" of biomedical informatics. We propose a high-throughput, similarity-based cohort identification algorithm that applies numerical abstractions to Electronic Health Records (EHR) data. We implement this algorithm using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), so that sites using this standardized EHR data representation can adopt the algorithm with minimal local implementation effort. We validate its performance on a retrospective cohort identification task for six clinical trials conducted at the Columbia University Medical Center. Our algorithm achieves an average Area Under the Curve (AUC) of 0.966 and an average Precision at 5 of 0.983. This interoperable method shows promise for efficient cohort identification in EHR databases. We discuss suitable applications of our method, its limitations, and warranted future work.

Keywords: Case-based Reasoning (CBR); Cohort Identification; Electronic Health Records (EHR); Observational Medical Outcomes Partnership (OMOP); Phenotype; Similarity-based.

Conflict of interest statement

Conflict of Interest: None.

Figures

Fig. 1
Populations and sub-populations associated with a cohort identification task: a similarity-based cohort identification algorithm uses the seed patient population D to identify new cases in sub-populations R2 and R3
Fig. 2
Method for building the target patient for a cohort identification task using the summarized EHR traits for n previously identified cases and using it to rank patients in the CDW based on similarity to the target patient; this figure shows the summarized and normalized feature vectors for different patients as well as for the target patient
Fig. 3
Sensitivity versus specificity plots for the six trials, plotted for various thresholds on the cosine distance from the target patient TP (t in Equation (2) varied from 0 to 100, in steps of 1)
Fig. 4
An example of the similarity-based ranked list of test patients obtained using the target patient representation derived for Trial 1; the term 'dist' here refers to the cosine distance. Patients marked with a green tick are deemed similar to the target patient, and those marked with a red cross are deemed dissimilar
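
The ranking idea described in the captions for Fig. 2 and Fig. 4 can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how EHR traits are summarized or normalized, so the sketch assumes that each patient is already a numeric feature vector, that normalization means scaling to unit length, and that the target patient is the mean of the normalized seed vectors. All function names, patient identifiers, and feature values below are hypothetical.

```python
import math


def normalize(vec):
    """Scale a feature vector to unit length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_target_patient(seed_vectors):
    """Average the normalized vectors of previously identified cases.

    Averaging is an assumption; the paper may summarize seeds differently.
    """
    normalized = [normalize(v) for v in seed_vectors]
    n = len(normalized)
    return [sum(column) / n for column in zip(*normalized)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_patients(target, patients):
    """Sort (patient_id, dist) pairs from most to least similar."""
    scored = [(pid, cosine_distance(target, vec)) for pid, vec in patients.items()]
    return sorted(scored, key=lambda pair: pair[1])


def precision_at_k(ranked, true_cases, k=5):
    """Fraction of the top-k ranked patients that are known cases."""
    top_ids = [pid for pid, _ in ranked[:k]]
    return sum(1 for pid in top_ids if pid in true_cases) / k


# Toy data: three hypothetical features per patient.
seed_cases = [[3, 0, 1], [2, 0, 2]]
test_patients = {
    "p1": [4, 0, 2],
    "p2": [0, 5, 0],
    "p3": [1, 0, 1],
    "p4": [0, 3, 1],
}

target = build_target_patient(seed_cases)
ranked = rank_patients(target, test_patients)
for pid, dist in ranked:
    print(f"{pid}  dist={dist:.3f}")
```

In this toy data, patients whose features overlap with the seed cases sort to the top of the list. A cutoff on `dist` would then separate patients deemed similar from those deemed dissimilar, as in Fig. 4; the cutoff value itself is a tuning choice that this sketch does not fix.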
