Case Reports
2017 Dec 15;5(3):4.
doi: 10.5334/egems.196.

Data Cleaning in the Evaluation of a Multi-Site Intervention Project

Gavin Welch et al. EGEMS (Wash DC).

Abstract

Context: The High Value Healthcare Collaborative (HVHC) sepsis project was a two-year multi-site project where Member health care delivery systems worked on improving sepsis care using a dissemination & implementation framework designed by HVHC. As part of the project evaluation, participating Members provided 5 data submissions over the project period. Members created data files using a uniform specification, but the data sources and methods used to create the data sets differed. Extensive data cleaning was necessary to get a data set usable for the evaluation analysis.

Case description: HVHC was the coordinating center for the project and received and cleaned all data submissions. Each submission underwent three successively more detailed levels of checking by HVHC. The most detailed level evaluated validity by comparing values within each Member over time and between Members. For a subset of episodes, Member-submitted data were compared to matched Medicare claims data.
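The within-Member comparison over time described above can be illustrated with a minimal sketch. It is not the procedure HVHC used; the function name `flag_quarterly_shifts`, the tuple layout of the input, and the 0.25 threshold are all hypothetical choices made for illustration. The sketch computes the proportion of episodes with some binary indicator per Member and quarter, then flags quarters that depart from that Member's median proportion.

```python
from collections import defaultdict
from statistics import median


def flag_quarterly_shifts(episodes, threshold=0.25):
    """Flag (member, quarter, rate) where the rate departs from the member's median.

    episodes: iterable of (member, quarter, indicator) tuples, indicator is 0 or 1.
    threshold: hypothetical absolute difference in proportion that triggers a flag.
    """
    # (member, quarter) -> [episodes with indicator set, all episodes]
    counts = defaultdict(lambda: [0, 0])
    for member, quarter, indicator in episodes:
        counts[(member, quarter)][0] += indicator
        counts[(member, quarter)][1] += 1

    # member -> {quarter: proportion of episodes with indicator set}
    rates = defaultdict(dict)
    for (member, quarter), (events, total) in counts.items():
        rates[member][quarter] = events / total

    flagged = []
    for member, by_quarter in rates.items():
        typical = median(by_quarter.values())
        for quarter, rate in sorted(by_quarter.items()):
            if abs(rate - typical) > threshold:
                flagged.append((member, quarter, rate))
    return flagged
```

A flagged quarter is only a prompt for follow-up with the people who built the data set. A real change in practice and a change in how the variable was extracted look the same in this summary.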

Findings: Inconsistencies in data submissions, particularly for length-of-stay variables, were common in early submissions and decreased with subsequent submissions. Multiple resubmissions were sometimes required to get clean data. Data checking also uncovered a systematic difference in the way Medicare and some Members defined an intensive care unit stay.
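As an illustration of a length-of-stay consistency check, the sketch below classifies an episode by comparing ICU length of stay with total length of stay. The abstract does not state the exact rule HVHC applied, so the rule here is an assumption: a value is treated as invalid when it is negative or when ICU LOS exceeds total LOS. The function name `classify_icu_los` and the use of `None` for missing values are also hypothetical.

```python
def classify_icu_los(total_los, icu_los):
    """Classify one episode's ICU length of stay as 'valid', 'invalid', or 'missing'.

    Assumed rule (not taken from the source): ICU LOS cannot be negative
    and cannot exceed the total LOS of the same episode.
    """
    if total_los is None or icu_los is None:
        return "missing"
    if total_los < 0 or icu_los < 0:
        return "invalid"
    if icu_los > total_los:
        return "invalid"
    return "valid"


def summarize(episodes):
    """Count classifications over an iterable of (total_los, icu_los) pairs."""
    summary = {"valid": 0, "invalid": 0, "missing": 0}
    for total_los, icu_los in episodes:
        summary[classify_icu_los(total_los, icu_los)] += 1
    return summary
```

Running `summarize` on each submission and each resubmission gives a simple count to track across rounds, which matters because a fix can introduce new errors.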

Conclusions: Data checking is critical for ensuring valid analytic results in projects that use electronic health record data. It is important to budget sufficient resources for data checking. Interim data submissions and checks help find anomalies early. Data resubmissions should also be checked, as fixes can introduce new errors. Communicating with those responsible for creating the data set provides critical information.

Keywords: data completeness; data error; data quality; data validity; electronic health record; routinely collected health data.


Figures

Figure 1. Comparison of the relationship between total length of stay (LOS) and LOS in the intensive care unit (ICU LOS) showing valid and invalid ICU LOS.
Figure 2. Density plots of intensive care unit length of stay (LOS) for selected Members and project quarters showing anomalies for Member 2.
Figure 3. Proportion (95 percent confidence interval) of reported episodes with complete 3-hour sepsis bundle by quarter showing potential anomalies in reporting for Member 3.

