Meta-Analysis
J Am Med Inform Assoc. 2021 Nov 25;28(12):2582-2592.
doi: 10.1093/jamia/ocab187.

ATLAS: an automated association test using probabilistically linked health records with application to genetic studies

Harrison G Zhang et al. J Am Med Inform Assoc.

Abstract

Objective: Large amounts of health data are becoming available for biomedical research. Synthesizing information across databases may capture a more comprehensive picture of patient health and enable novel research studies. When no gold-standard mapping between patient records is available, researchers may probabilistically link records from separate databases and analyze the linked data. However, previous inference methods for linked data are restricted to certain linkage settings and have low power. Here, we present ATLAS, an automated, flexible, and robust association testing algorithm for probabilistically linked data.

Materials and methods: Missing variables are imputed at various linkage thresholds using a weighted average method that propagates uncertainty from the probabilistic linkage. Effect sizes are then estimated using a generalized linear model. Finally, ATLAS conducts the threshold combination test, which optimally combines the P values obtained from data imputed at the different thresholds using Fisher's method and perturbation resampling.
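A minimal Python sketch of the imputation and estimation steps follows. It assumes, hypothetically, that the linkage output is a matrix of match probabilities between each record in database A and the candidate records in database B, and that one variable observed only in B is imputed into A. The function names, this matrix layout, and the use of a Gaussian GLM with identity link (ordinary least squares) with a normal-approximation Wald test are illustrative choices, not the ATLAS implementation.

```python
import math
import numpy as np


def weighted_average_impute(match_probs, b_values, threshold):
    """Impute a B-only variable into A at one linkage threshold.

    Hypothetical input layout: match_probs[i, j] is the probability that
    record i in A matches record j in B; b_values[j] is the variable in B.
    Records with no candidate at or above the threshold stay unlinked (NaN).
    """
    # Drop candidate links whose probability falls below the threshold.
    w = np.where(match_probs >= threshold, match_probs, 0.0)
    totals = w.sum(axis=1)
    imputed = np.full(match_probs.shape[0], np.nan)
    linked = totals > 0
    # Probability-weighted average over the remaining candidates, so an
    # uncertain link contributes less than a confident one.
    imputed[linked] = (w[linked] @ b_values) / totals[linked]
    return imputed


def wald_test_slope(x, y):
    """Assumed simplification: Gaussian GLM (OLS) for y ~ 1 + x.

    Fitted only on records that received an imputed x. Returns the slope
    and a two-sided P value from a normal-approximation Wald test.
    """
    keep = ~np.isnan(x)
    x, y = x[keep], y[keep]
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of beta
    z = beta[1] / math.sqrt(cov[1, 1])
    return float(beta[1]), math.erfc(abs(z) / math.sqrt(2.0))
```

For example, with match probabilities [[0.9, 0.1, 0.0], [0.4, 0.4, 0.2], [0.0, 0.6, 0.7]] and B values [1, 2, 3], a threshold of 0.5 imputes 1.0 for record 0, leaves record 1 unlinked, and imputes (0.6 × 2 + 0.7 × 3) / 1.3 ≈ 2.54 for record 2; lowering the threshold to 0.1 links record 1 with value 1.8. Raising the threshold trades fewer false matches for fewer linked records, which is why results are computed at several thresholds rather than one.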

Results: In simulations, ATLAS controls type I error and exhibits higher power than previous methods. In a real-world genetic association study, a meta-analysis combining ATLAS-enabled analyses of a linked cohort with analyses of an existing cohort yielded additional significant associations between a rheumatoid arthritis genetic risk score and laboratory biomarkers.

Discussion: Weighted average imputation is robust to false matches and increases the contribution of true matches, mitigating bias induced by linkage error. The threshold combination test avoids arbitrarily choosing a threshold to declare a match, thereby automating analyses of linked data while preserving power.
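The threshold combination test can be sketched as follows, under stated assumptions. P values computed on the same records at different thresholds are correlated, so the Fisher statistic T = -2 Σ_k log p_k need not follow the chi-square distribution with 2K degrees of freedom that holds for independent tests. This sketch calibrates T with a generic multiplier (perturbation) scheme: per-record score contributions for each threshold, assumed to be computed under the null model, are multiplied by shared standard normal weights, which preserves the correlation across thresholds. The input layout, the normal weights, and the function names are hypothetical; the exact ATLAS resampling scheme may differ.

```python
import math
import numpy as np

# Elementwise complementary error function returning float arrays.
_erfc = np.vectorize(math.erfc, otypes=[float])


def two_sided_p(z):
    # Two-sided normal P value: P(|N(0, 1)| >= |z|) = erfc(|z| / sqrt(2)).
    return _erfc(np.abs(z) / math.sqrt(2.0))


def fisher_statistic(pvals, axis=-1):
    # Fisher's method: T = -2 * sum_k log(p_k); clipping avoids log(0).
    p = np.clip(pvals, 1e-300, 1.0)
    return -2.0 * np.sum(np.log(p), axis=axis)


def threshold_combination_test(scores, n_perturb=2000, seed=0):
    """Combine per-threshold tests with Fisher's method, calibrated by perturbation.

    Hypothetical input: scores is an (n_records, n_thresholds) array of
    per-record score contributions for the effect of interest, one column
    per linkage threshold, assumed computed under the null model.
    Returns (resampling P value, observed Fisher statistic).
    """
    scores = np.asarray(scores, dtype=float)
    scale = np.sqrt(np.sum(scores ** 2, axis=0))   # per-threshold standard error
    z_obs = scores.sum(axis=0) / scale             # observed score statistics
    t_obs = fisher_statistic(two_sided_p(z_obs))

    # One set of N(0, 1) weights per resample, shared across thresholds,
    # so perturbed statistics keep the cross-threshold correlation.
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((n_perturb, scores.shape[0]))
    z_star = (weights @ scores) / scale            # (n_perturb, n_thresholds)
    t_star = fisher_statistic(two_sided_p(z_star), axis=1)

    # Add-one correction keeps the resampling P value strictly positive.
    p_value = (1 + np.sum(t_star >= t_obs)) / (n_perturb + 1)
    return float(p_value), float(t_obs)
```

Because each perturbed statistic is, conditional on the data, exactly standard normal with the observed cross-threshold correlation, the resampled T values approximate the null distribution of the combined statistic without choosing any single threshold.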

Conclusion: ATLAS promises to enable novel and powerful research studies using linked data to capitalize on all available data sources.

Keywords: biorepositories; electronic health records; genetic association studies; perturbation resampling; record linkage.

Figures

Figure 1.
Schematic of the proposed ATLAS algorithm.
Figure 2.
Schematic of the real-world rheumatoid arthritis (RA) genetic association studies conducted using the MGB Biobank and the Crimson-linked data.
Figure 3.
Comparison of type I error rates of ATLAS and the Han et al estimators in simulation settings with different noise levels and average numbers of codes per patient record (simulations under H0). ATLAS type I error rates are reported for several single cutoff thresholds and for the ATLAS threshold combination test.
Figure 4.
Comparison of empirical power of ATLAS and the Han et al estimators in simulation settings with different effect sizes and average numbers of codes per patient record (simulations under H1). Results were generated using the best-match imputation method. ATLAS power is reported for several single cutoff thresholds and for the ATLAS threshold combination test.
Figure 5.
Comparison of empirical power of ATLAS and the Han et al estimators in the presence of false matches between databases (simulations under H1). Simulated databases report on average 16 codes per patient record.
Figure 6.
Log-transformed P values from the genetic association study using only RA patients with previously available genotype data at the MGB Biobank, and after incorporating additional RA patients with genotype data through the Crimson-linked cohort.
