Meta-Analysis
Nat Commun. 2023 Sep 5;14(1):5419.
doi: 10.1038/s41467-023-41185-x.

Demonstrating paths for unlocking the value of cloud genomics through cross cohort analysis

Nicole Deflaux et al. Nat Commun.

Abstract

Recently, large-scale genomic projects such as All of Us and the UK Biobank have introduced a new research paradigm where data are stored centrally in cloud-based Trusted Research Environments (TREs). To characterize the advantages and drawbacks of different TRE attributes in facilitating cross-cohort analysis, we conduct a Genome-Wide Association Study of standard lipid measures using two approaches: meta-analysis and pooled analysis. Comparison of full summary data from both approaches with an external study shows strong correlation of known loci with lipid levels (R² ~ 83-97%). Importantly, 90 variants meet the significance threshold only in the meta-analysis and 64 variants are significant only in pooled analysis, with approximately 20% of variants in each of those groups being most prevalent in non-European, non-Asian ancestry individuals. These findings have important implications, as technical and policy choices lead to cross-cohort analyses generating similar, but not identical results, particularly for non-European ancestral populations.
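
For readers unfamiliar with how a meta-analysis combines per-cohort results, the following is a minimal sketch of fixed-effect inverse-variance weighting, one common way to merge GWAS summary statistics. It is an illustration only: the abstract does not state which meta-analysis method or software the authors used, the function name `ivw_meta` is hypothetical, the effect sizes are made-up numbers, and the 5e-8 threshold is the conventional genome-wide cutoff rather than a value taken from this page.

```python
import math


def ivw_meta(betas, ses):
    """Combine per-cohort effect estimates for one variant.

    betas: per-cohort effect sizes (hypothetical values)
    ses:   matching standard errors
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    # Each cohort is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical estimates for the same variant in two cohorts.
beta, se, z, p = ivw_meta(betas=[0.10, 0.14], ses=[0.02, 0.04])

GENOME_WIDE_ALPHA = 5e-8  # conventional threshold, assumed here
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} significant={p < GENOME_WIDE_ALPHA}")
```

A pooled analysis, by contrast, would fit a single association model to the merged individual-level data from both cohorts, which requires moving or co-locating that data rather than exchanging summary statistics.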

Conflict of interest statement

P.N. reports investigator-initiated grants from Amgen, Apple, AstraZeneca, Boston Scientific, and Novartis; personal fees from Apple, AstraZeneca, Blackstone Life Sciences, Foresite Labs, Novartis, and Roche/Genentech; is a co-founder of TenSixteen Bio; is a shareholder of geneXwell and TenSixteen Bio; and reports spousal employment at Vertex, all unrelated to the present work. A.G.B. is a co-founder and shareholder of TenSixteen Bio, unrelated to the present work. N.D. and D.G. are employees of Verily Life Sciences and may own stock as part of the standard compensation package. A.P. serves as a Google Ventures (GV) venture partner and holds an equity interest in certain of GV’s affiliated investment funds. A.P. has also received funding from Verily, MSFT, Intel, IBM, Bayer, Pfizer, AstraZeneca, and Biogen. The remaining authors declare no competing interests.

Figures

Fig. 1. Outline of steps in the meta- and pooled analyses for All of Us and UK Biobank cross-cohort analysis.
Researchers analyzing data across TREs, using either meta-analysis or a pooled approach, must negotiate policy requirements and technical hurdles. Bold outline is used for computational steps where data merging occurs. Top: Computational steps involved in meta-analysis, many of which are duplicated. Bottom: Computational steps involved in pooled analysis, where each distinct step is performed only once. All of Us, the All of Us logo, and “The Future of Health Begins with You” are service marks of the U.S. Department of Health and Human Services.
Fig. 2. Flow diagram highlighting the number of variants and sequenced samples retained at each stage of the meta- and pooled analyses.
WGS whole genome sequencing, WES whole exome sequencing, MAC minor allele count.
Fig. 3. GWAS phenotype and results.
a Participant LDL-C levels for each cohort, before (left) and after (right) adjusting for statin use. The black center line denotes the median value (50th percentile), while the boxes contain the 25th to 75th percentiles of data. The black whiskers mark the 5th and 95th percentiles, and values beyond these upper and lower bounds are considered outliers, marked with black dots. Note that a few very high outliers were filtered to improve readability of the plot. b Meta-analysis results for LDL-C GWAS on merged exonic variants. c Pooled results for LDL-C GWAS on merged exonic variants. Both replicate known gene associations.
Fig. 4. Scientific differences in pooled and meta-analyses.
a Examination of variants included only in the pooled analysis. b Comparison of lipid GWAS results against two previously published reference datasets: Hindy and Selvaraj. HDL high-density lipoprotein cholesterol, LDL low-density lipoprotein cholesterol, TC total cholesterol, TG triglycerides. c Bar chart of ancestry proportions across all methods with the variant results meeting genome-wide significance superimposed. Here, AFR, AMR, EAS, NFE, and SAS indicate African, American, East Asian, Non-Finnish European, and South Asian ancestry groups, respectively.
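
The box-plot convention described for Fig. 3a (median line, box spanning the 25th to 75th percentiles, whiskers at the 5th and 95th percentiles, points beyond the whiskers drawn as outliers) can be sketched as follows. This is an illustration under stated assumptions, not the authors' plotting code: the function name `box_summary` is hypothetical, the input values are synthetic, and the percentile interpolation method (`"inclusive"`) is an assumed choice that the caption does not specify.

```python
from statistics import quantiles


def box_summary(values):
    """Summarize values using the percentile convention of Fig. 3a."""
    # n=20 yields cut points at every 5th percentile (5, 10, ..., 95).
    cuts = quantiles(values, n=20, method="inclusive")
    p5, p25, p50, p75, p95 = cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]
    return {
        "median": p50,
        "box": (p25, p75),
        "whiskers": (p5, p95),
        # Values outside the whiskers are reported as outliers.
        "outliers": [v for v in values if v < p5 or v > p95],
    }


# Synthetic measurements, not participant data.
summary = box_summary(list(range(0, 101)))
print(summary["median"], summary["box"], summary["whiskers"], len(summary["outliers"]))
```

Note that many plotting libraries default to whiskers at 1.5 times the interquartile range, so reproducing a figure like this one generally means setting the whisker percentiles explicitly.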
