Big Data Soc. 2017 Dec 5;4(2):2053951717745678.
doi: 10.1177/2053951717745678.

Challenges in administrative data linkage for research

Katie Harron et al. Big Data Soc. 2017.

Abstract

Linkage of population-based administrative data is a valuable tool for combining detailed individual-level information from different sources for research. While not a substitute for classical studies based on primary data collection, analyses of linked administrative data can answer questions that require large sample sizes or detailed data on hard-to-reach populations, and generate evidence with a high level of external validity and applicability for policy making. There are unique challenges in the appropriate research use of linked administrative data, for example with respect to bias from linkage errors where records cannot be linked or are linked together incorrectly. For confidentiality and other reasons, the separation of data linkage processes and analysis of linked data is generally regarded as best practice. However, the 'black box' of data linkage can make it difficult for researchers to judge the reliability of the resulting linked data for their required purposes. This article aims to provide an overview of challenges in linking administrative data for research. We aim to increase understanding of the implications of (i) the data linkage environment and privacy preservation; (ii) the linkage process itself (including data preparation, and deterministic and probabilistic linkage methods) and (iii) linkage quality and potential bias in linked data. We draw on examples from a number of countries to illustrate a range of approaches for data linkage in different contexts.

Keywords: Data linkage; data accuracy; administrative data; epidemiological studies; measurement error; record linkage; selection bias.
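
As a rough illustration of the difference between the deterministic and probabilistic linkage methods mentioned in the abstract, the sketch below compares two records under both approaches. It is not taken from the article: the field names, the m- and u-probabilities, the threshold, and the example records are all hypothetical, and the match weight follows the commonly used Fellegi-Sunter formulation, which may differ from the methods the authors describe.

```python
import math

# Hypothetical agreement probabilities per identifier (illustrative values only):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "dob": (0.98, 0.005),
    "postcode": (0.80, 0.02),
}


def deterministic_match(a, b, fields=FIELD_PARAMS):
    """Deterministic rule: link only if every identifier agrees exactly."""
    return all(a[f] == b[f] for f in fields)


def match_weight(a, b, params=FIELD_PARAMS):
    """Fellegi-Sunter style weight: sum of log2 likelihood ratios over fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def probabilistic_match(a, b, threshold=5.0):
    """Probabilistic rule: link if the weight reaches an assumed threshold."""
    return match_weight(a, b) >= threshold


# Made-up records: same person, but the postcode changed between sources.
hospital = {"surname": "smith", "dob": "1980-02-14", "postcode": "ab1 2cd"}
census = {"surname": "smith", "dob": "1980-02-14", "postcode": "ab9 9zz"}
stranger = {"surname": "jones", "dob": "1975-07-01", "postcode": "xy3 4zz"}

print(deterministic_match(hospital, census))  # exact rule misses the link
print(probabilistic_match(hospital, census))  # weight-based rule accepts it
print(probabilistic_match(hospital, stranger))
```

Under these assumed values, the exact-match rule produces a missed link when one identifier has changed, while the weight-based rule tolerates the disagreement. Lowering the threshold would recover more true links at the cost of more false links, which is one way the linkage errors described in the abstract can arise.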

Conflict of interest statement

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
