BMC Med Res Methodol. 2021 Apr 2;21(1):63.
doi: 10.1186/s12874-021-01252-7.

Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R


Carsten Oliver Schmidt et al., BMC Med Res Methodol.

Abstract

Background: No standards exist for the handling and reporting of data quality in health research. This work introduces a data quality framework for observational health research data collections with supporting software implementations to facilitate harmonized data quality assessments.

Methods: Developments were guided by the evaluation of an existing data quality framework and literature reviews. Functions for the computation of data quality indicators were written in R. The concept and implementations are illustrated based on data from the population-based Study of Health in Pomerania (SHIP).

Results: The data quality framework comprises 34 data quality indicators. These target four aspects of data quality: compliance with pre-specified structural and technical requirements (integrity); presence of data values (completeness); inadmissible or uncertain data values and contradictions (consistency); and unexpected distributions and associations (accuracy). R functions calculate data quality metrics from the provided study data and metadata, and the results are compiled into R Markdown reports. Guidance on the concept and tools is available through a dedicated website.
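To illustrate what "calculating data quality metrics from study data and metadata" can mean in practice, here is a minimal Python sketch of two simple indicators: completeness (share of observed values, with missingness broken down by documented reason) and inadmissible values (violations of hard range limits, one aspect of consistency). It is not the dataquieR API, which is written in R. The `VariableMetadata` structure, the variable `sbp`, the limits 60 to 260, and the missing codes 99997/99998 are all hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VariableMetadata:
    """Hypothetical per-variable metadata (not dataquieR's metadata format)."""
    name: str
    hard_min: float
    hard_max: float
    # Hypothetical mapping of numeric missing-value codes to missingness reasons.
    missing_codes: dict = field(default_factory=dict)


def assess_variable(values, meta: VariableMetadata) -> dict:
    """Compute simple completeness and range-violation indicators for one variable."""
    n = len(values)
    missing_reasons = Counter()
    n_observed = 0
    n_out_of_range = 0
    for v in values:
        if v is None:
            # No value and no code explaining why it is missing.
            missing_reasons["system missing"] += 1
        elif v in meta.missing_codes:
            # Coded missing value: the metadata states the reason.
            missing_reasons[meta.missing_codes[v]] += 1
        else:
            n_observed += 1
            # Values outside the hard limits are treated as inadmissible.
            if not (meta.hard_min <= v <= meta.hard_max):
                n_out_of_range += 1
    return {
        "n": n,
        "completeness": n_observed / n if n else float("nan"),
        "missing_by_reason": dict(missing_reasons),
        "range_violations": n_out_of_range,
        "range_violation_rate": (
            n_out_of_range / n_observed if n_observed else float("nan")
        ),
    }


# Hypothetical example: systolic blood pressure readings with coded missings.
sbp_meta = VariableMetadata(
    name="sbp",
    hard_min=60,
    hard_max=260,
    missing_codes={99997: "refused", 99998: "device failure"},
)
values = [120, 135, None, 99997, 300, 142, 99998, 55]
report = assess_variable(values, sbp_meta)
print(report)
```

Keeping limits and missing codes in metadata rather than in the code means that the same function can be reused across variables and studies. This is the general motivation for driving indicator computation from metadata.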

Conclusions: The presented data quality framework is the first of its kind for observational health research data collections that links a formal concept to implementations in R. The framework and tools facilitate harmonized data quality assessments in pursuit of transparent and reproducible research. Application scenarios include data quality monitoring while a study is being conducted and initial data analysis before substantive scientific analyses begin; the developments are also relevant beyond research.

Keywords: Data quality; Data quality indicators; Data quality monitoring; Initial data analysis; Observational health studies; R.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. Data quality concept overview
Fig. 2. Key terms related to data structures
Fig. 3. Example results from the R package dataquieR applied to SHIP data. 1: A heatmap-like plot illustrating the applicability of data quality implementations, based on an assessment of metadata and study data properties. 2: Histogram with range violations highlighted. 3: Missing values across different reasons for missing data. 4: Margins plot illustrating observer effects.
