Review
Commun Med (Lond). 2025 Mar 4;5(1):58.
doi: 10.1038/s43856-025-00769-y.

Data linkage multiplies research insights across diverse healthcare sectors

T S Karin Eisinger-Mathason et al. Commun Med (Lond).

Abstract

In all fields of study, as well as in government and commerce, high-quality data enables informed decision-making. Linking data from disparate sources multiplies the opportunities for novel insights and evidence-based decision-making across an increasingly large range of administrative, clinical, research, and population health use cases. In recent years, novel methods, including privacy-preserving record linkage methods, have emerged. However, regardless of the method, successful data linkage depends heavily on data quality and completeness, and it must be balanced against the increased risk of re-identification in the resulting linked data. Opportunities for the future include sharing tools for responsible linkage across silos, enhancing data to improve quality and completeness, and ensuring that linkage draws on inclusive and representative datasets, so that individual privacy is balanced with representation in research and novel discoveries. Here we provide a brief overview of the history and current state of data linkage, highlight the opportunities created by linked population data across critical research sectors, and describe the technology and policies that govern its usage.


Conflict of interest statement

Competing interests: All authors are employed by or are external consultants at Datavant Inc., a commercial entity that produces data linkage technology.

Figures

Fig. 1. Real World Data (RWD) reaches scale and completeness for linkage.
Early RWD were limited to commercial claims, which were digitized earlier than hospital and medical records. Federal funding after 2010 incentivized hospitals to switch to electronic record keeping. Simultaneously, genomic tools for identifying genetic mutations and altered gene expression became available, producing records from consented patients. These records were digital from their inception and, though most frequently de-identified, have the potential to be linked with other matched records. Recent years have brought an explosion of consumer and patient data from internet applications (e.g., weight-loss tools), wearable devices (e.g., Apple Watches, glucose monitors), and other digital records. Together, these linked data constitute a massive digital fingerprint for participating individuals.
Fig. 2. Multiple types of Real World Data (RWD) can be linked to facilitate diverse research and commercial activities.
Diverse data inputs can be linked together to inform a variety of sectors; a series of examples is featured here. Mortality data can be linked to insurance claims and medical costs to derive outcomes for clinical trials, rare diseases, and other studies, which can then inform funding and policy decisions. Similarly, purchasing and behavioral data can be linked to support Health Economics and Outcomes Research (HEOR) studies. These are simplified examples of data-driven research, commercial, and policy decision-making.
Fig. 3. Privacy Preserving Record Linkage (PPRL). Data linkage allows multiple records from diverse datasets to be linked.
An individual can have personal data recorded in many different datasets. For example, in the healthcare system a patient may appear in electronic health records (EHR), insurance claims, and mortality records. To be used in a manner that protects personally identifiable information (PII) and maintains privacy, these datasets undergo normalization, identity resolution, and matching. Once records are matched and assigned a unique identifier or hash, the data can be queried in a variety of ways.
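The normalize, hash, and match flow in Fig. 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of the authors or any specific vendor: it assumes tokens are derived with a keyed hash (HMAC-SHA256) using a secret shared among data holders, and the field set, normalization rules, `SECRET_KEY`, and the helpers `clean_name`, `make_token`, `tokenize`, and `link` are all hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret shared by the participating data holders; a real deployment
# would manage this key securely rather than hard-coding it.
SECRET_KEY = b"example-shared-secret"


def clean_name(s: str) -> str:
    """Strip accents and non-letters, then uppercase (illustrative rules)."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if c.isalpha()).upper()


def make_token(first: str, last: str, dob: str, sex: str) -> str:
    """Normalize identifying fields and derive a keyed hash (HMAC-SHA256)."""
    normalized = "|".join(
        [clean_name(first), clean_name(last), dob.strip(), sex.strip().upper()[:1]]
    )
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records: list[dict]) -> dict[str, dict]:
    """Replace the PII in each record with a token, keeping only the payload.

    Records that normalize to the same token are merged (a simplification).
    """
    out: dict[str, dict] = {}
    for r in records:
        token = make_token(r["first"], r["last"], r["dob"], r["sex"])
        out.setdefault(token, {}).update(r["payload"])
    return out


def link(*datasets: dict[str, dict]) -> dict[str, dict]:
    """Inner-join de-identified datasets on tokens present in all of them."""
    common = set(datasets[0]).intersection(*datasets[1:])
    return {t: {k: v for d in datasets for k, v in d[t].items()} for t in common}


ehr = tokenize([
    {"first": "José", "last": "García", "dob": "1980-02-14", "sex": "M",
     "payload": {"diagnosis": "E11.9"}},
    {"first": "Ana", "last": "Lopez", "dob": "1975-06-01", "sex": "F",
     "payload": {"diagnosis": "I10"}},
])
claims = tokenize([
    {"first": "JOSE", "last": "Garcia", "dob": "1980-02-14", "sex": "male",
     "payload": {"paid_usd": 120.0}},
])
print(list(link(ehr, claims).values()))
# prints [{'diagnosis': 'E11.9', 'paid_usd': 120.0}]
```

Exact token matching links only records whose normalized fields agree exactly: a misspelled surname ("Lopes" vs "Lopez") yields an unrelated token, which is one reason identity resolution (Fig. 4a) matters alongside hashing.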
Fig. 4. Examples of record matching and de-identification approaches.
a Individual records, even for the same person, may differ enough to introduce linkage challenges. Here, multiple datasets containing records for one individual are compared. For successful Privacy Preserving Record Linkage (PPRL) of the two datasets followed by de-identification, identity resolution is performed to confirm that the relevant records capture the same individual.
b Identical information captured in slightly different formats must be normalized to a uniform format, after which the necessary personally identifiable information (PII) is removed for general encryption. The normalized and encrypted data can then be linked and further modified for specific uses.
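As a hedged sketch of panel b, the code below normalizes the same name and birth date recorded in different formats, removes the PII fields, and replaces them with a keyed hash so that the two records can still be linked. The accepted date formats, the name rules, the PII field list, the key, and the functions `normalize_date`, `normalize_name`, and `deidentify` are all hypothetical; the sketch illustrates the general normalize-then-encrypt idea rather than the specific procedure in the figure.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical settings for illustration only. Slashed dates are assumed to be
# US-style month/day/year, and %b assumes English month abbreviations (C locale).
SECRET_KEY = b"example-shared-secret"
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%Y%m%d")
PII_FIELDS = {"name", "dob", "phone"}


def normalize_date(raw: str) -> str:
    """Convert any supported date format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_name(raw: str) -> str:
    """Reorder 'Last, First', uppercase, and drop punctuation (illustrative rules)."""
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        raw = f"{first} {last}"
    words = ("".join(c for c in w if c.isalpha()) for w in raw.upper().split())
    return " ".join(w for w in words if w)


def deidentify(record: dict) -> dict:
    """Drop PII fields and add a keyed-hash token built from the normalized PII."""
    key = f"{normalize_name(record['name'])}|{normalize_date(record['dob'])}"
    token = hmac.new(SECRET_KEY, key.encode("utf-8"), hashlib.sha256).hexdigest()
    kept = {k: v for k, v in record.items() if k not in PII_FIELDS}
    kept["token"] = token
    return kept


hospital = {"name": "Smith, John", "dob": "03/07/1962", "phone": "555-0100",
            "encounter": "ED visit"}
registry = {"name": "JOHN SMITH", "dob": "7 Mar 1962", "cause_of_death": "I21.9"}
a, b = deidentify(hospital), deidentify(registry)
print(a["token"] == b["token"])  # prints True
print(sorted(a))  # prints ['encounter', 'token']
```

Without the name reordering and date parsing, the two inputs would hash to unrelated tokens, so the normalization rules, including how ambiguous formats are read, have to be agreed on by every data holder before tokens are generated.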
