Learn Health Syst. 2021 Oct 14;6(1):e10293.
doi: 10.1002/lrh2.10293. eCollection 2022 Jan.

Developing real-world evidence from real-world data: Transforming raw data into analytical datasets

Lisa Bastarache et al. Learn Health Syst.

Abstract

Development of evidence-based practice requires practice-based evidence, which can be acquired through analysis of real-world data from electronic health records (EHRs). The EHR contains volumes of information about patients (physical measurements, diagnoses, exposures, and markers of health behavior) that can be used to create algorithms for risk stratification or to gain insight into associations between exposures, interventions, and outcomes. But to transform real-world data into reliable real-world evidence, one must not only choose the correct analytical methods but also understand the quality, detail, provenance, and organization of the underlying source data, and address the differences in these characteristics across sites when conducting analyses that span institutions. This manuscript explores the idiosyncrasies inherent in the capture, formatting, and standardization of EHR data and discusses the clinical domain and informatics competencies required to transform raw, real-world clinical data into high-quality, fit-for-purpose analytical data sets used to generate real-world evidence.

Keywords: data science; real‐world data; real‐world evidence.

Conflict of interest statement

None of the authors has any conflict of interest to report, and none received funding in support of the development of this manuscript.

Figures

FIGURE 1
Semantic shift in International Classification of Diseases (ICD) codes causes a shift in prevalence. The National Center for Health Statistics (NCHS) and the Centers for Medicare and Medicaid Services (CMS) periodically add new codes and update guidance on the use of existing ICD‐CM codes, which can radically shift coding practices at an institution. Below is an example of the change in prevalence of codes relating to shock in a 200‐bed hospital.
FIGURE 2
Venn diagrams of overlap of suggestive diagnoses, medications, and laboratory results for type 2 diabetes at two different institutions. The different ratios of overlap of data elements for the diabetes computable phenotype suggest algorithm behavior is different between the two sites. These site‐level differences in the proportions of patients with different markers of disease may be accompanied by differences in other characteristics that may impact the performance of predictive algorithms developed at one site and applied at another
FIGURE 3
Differences in mean values of common laboratory results measured in the inpatient vs outpatient setting
FIGURE 4
Trends in laboratory test ordering over time. Panel A shows the percentage of individuals with a vitamin D measurement among all patients with at least one laboratory measurement in that year. Panel B shows the same for low‐density lipoprotein (LDL) cholesterol
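The quantity plotted in Figure 4 can be sketched in a few lines of Python. This is only an illustration of the calculation the caption describes, not the authors' code: the function name `percent_tested_by_year`, the `(patient_id, year, test_name)` record layout, and the test labels are assumptions made for this example.

```python
from collections import defaultdict


def percent_tested_by_year(records, test_name):
    """Percentage of patients with `test_name` among patients with any lab that year.

    `records` is an iterable of (patient_id, year, test_name) tuples.
    This layout is hypothetical; real EHR extracts will differ.
    """
    any_lab = defaultdict(set)  # year -> patients with at least one lab result
    target = defaultdict(set)   # year -> patients with the test of interest
    for patient_id, year, name in records:
        any_lab[year].add(patient_id)
        if name == test_name:
            target[year].add(patient_id)
    # The denominator is per-year, so a patient counts once per year at most.
    return {
        year: 100.0 * len(target[year]) / len(patients)
        for year, patients in sorted(any_lab.items())
    }


# Toy data, invented purely to exercise the function.
toy_records = [
    ("a", 2010, "vitamin_d"),
    ("b", 2010, "ldl"),
    ("a", 2011, "ldl"),
    ("b", 2011, "ldl"),
]
vitamin_d_trend = percent_tested_by_year(toy_records, "vitamin_d")
```

Counting distinct patients rather than result rows keeps a heavily tested patient from inflating the percentage in a given year.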
FIGURE 5
A selection of LOINC terms used to specify laboratory results of pH
FIGURE 6
Flow chart for identifying type 2 diabetes in PheKB. Developing a computable phenotype for diabetes is illustrative of many of the issues highlighted in this article. Diabetes can be either type 1 or type 2. While type 1 diabetics always require insulin, type 2 diabetics sometimes require insulin. Some patients, especially those who receive insulin, may have accumulated evidence for the diagnosis of both type 1 and type 2 diabetes over time, so identifying the type of diabetes a patient has from diagnosis codes may be challenging. PheKB provides an algorithm for type 2 diabetes that excludes patients who have ever had a diagnosis of type 1 diabetes. That decision likely increases the positive predictive value of the phenotype but lowers the sensitivity. The flow diagram implies that the diagnosis of type 2 diabetes requires the combination of a type 2 diagnosis plus an abnormal laboratory test result or a type 2 diagnosis plus a suggestive medication or two diagnoses of type 2 diabetes. While this is a single definition, it allows for multiple paths for a diagnosis that could be differentially present at different sites. This definition is one of many that could be developed based on the specific data source and use case
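The logic summarized in the Figure 6 caption can be written as a small rule set. The sketch below is a simplification based only on that caption; it is not the PheKB algorithm itself, which has more detail than the caption conveys. The class `PatientEvidence`, its field names, and the path labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatientEvidence:
    """Per-patient summary counts; the field names are assumptions for this sketch."""
    t1dm_dx_count: int = 0          # number of type 1 diabetes diagnosis codes
    t2dm_dx_count: int = 0          # number of type 2 diabetes diagnosis codes
    has_abnormal_lab: bool = False  # any abnormal diabetes-related lab result
    has_suggestive_med: bool = False  # any medication suggestive of type 2 diabetes


def evidence_paths(p: PatientEvidence) -> list[str]:
    """Return every path by which the patient qualifies as a type 2 case."""
    # Any type 1 diagnosis excludes the patient. Per the caption, this likely
    # raises positive predictive value at the cost of sensitivity.
    if p.t1dm_dx_count > 0:
        return []
    paths = []
    if p.t2dm_dx_count >= 1 and p.has_abnormal_lab:
        paths.append("dx+lab")
    if p.t2dm_dx_count >= 1 and p.has_suggestive_med:
        paths.append("dx+med")
    if p.t2dm_dx_count >= 2:
        paths.append("2dx")
    return paths


def is_t2dm_case(p: PatientEvidence) -> bool:
    """A patient is a case if at least one evidence path is satisfied."""
    return bool(evidence_paths(p))
```

Returning the satisfied paths, rather than a bare yes or no, makes it possible to tabulate which route each site's cases take, which is one way to check whether a single definition behaves differently across institutions.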
