Meta-Analysis
Nat Genet. 2023 Jan;55(1):154-164.
doi: 10.1038/s41588-022-01225-6. Epub 2022 Dec 23.

Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies

Xihao Li, Corbin Quick, Hufeng Zhou, Sheila M Gaynor, Yaowu Liu, Han Chen, Margaret Sunitha Selvaraj, Ryan Sun, Rounak Dey, Donna K Arnett, Lawrence F Bielak, Joshua C Bis, John Blangero, Eric Boerwinkle, Donald W Bowden, Jennifer A Brody, Brian E Cade, Adolfo Correa, L Adrienne Cupples, Joanne E Curran, Paul S de Vries, Ravindranath Duggirala, Barry I Freedman, Harald H H Göring, Xiuqing Guo, Jeffrey Haessler, Rita R Kalyani, Charles Kooperberg, Brian G Kral, Leslie A Lange, Ani Manichaikul, Lisa W Martin, Stephen T McGarvey, Braxton D Mitchell, May E Montasser, Alanna C Morrison, Take Naseri, Jeffrey R O'Connell, Nicholette D Palmer, Patricia A Peyser, Bruce M Psaty, Laura M Raffield, Susan Redline, Alexander P Reiner, Muagututi'a Sefuiva Reupena, Kenneth M Rice, Stephen S Rich, Colleen M Sitlani, Jennifer A Smith, Kent D Taylor, Ramachandran S Vasan, Cristen J Willer, James G Wilson, Lisa R Yanek, Wei Zhao, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Jerome I Rotter, Pradeep Natarajan, Gina M Peloso, Zilin Li, Xihong Lin
Xihao Li et al. Nat Genet. 2023 Jan.

Abstract

Meta-analysis of whole genome sequencing/whole exome sequencing (WGS/WES) studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Existing rare variant meta-analysis approaches are not scalable to biobank-scale WGS data. Here we present MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale WGS/WES studies. MetaSTAAR accounts for relatedness and population structure, can analyze both quantitative and dichotomous traits and boosts the power of rare variant tests by incorporating multiple variant functional annotations. Through meta-analysis of four lipid traits in 30,138 ancestrally diverse samples from 14 studies of the Trans-Omics for Precision Medicine (TOPMed) Program, we show that MetaSTAAR performs rare variant meta-analysis at scale and produces results comparable to those obtained using pooled data. Additionally, we identified several conditionally significant rare variant associations with lipid traits. We further demonstrate that MetaSTAAR is scalable to biobank-scale cohorts through meta-analysis of TOPMed WGS data and UK Biobank WES data of ~200,000 samples.
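STAAR-type omnibus tests such as STAAR-O combine P values from several rare variant tests (burden, SKAT and ACAT-V), each evaluated under multiple annotation-based weights, using the Cauchy combination (ACAT) method. The Python sketch below illustrates only that final combination step, assuming the component P values and their weights are already available. The function name `cauchy_combination` and the equal default weights are illustrative assumptions, not part of the MetaSTAAR software.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine P values with the Cauchy combination (ACAT) method (illustrative sketch).

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum to 1.
    Under the null, T is approximately standard Cauchy, so P = 0.5 - arctan(T) / pi.
    """
    if weights is None:
        weights = [1.0] * len(p_values)  # assumption: equal weights by default
    total = sum(weights)
    t = 0.0
    for w, p in zip(weights, p_values):
        if p < 1e-15:
            # For tiny p, tan((0.5 - p) * pi) ~ 1 / (p * pi); avoids precision loss.
            t += (w / total) / (p * math.pi)
        else:
            t += (w / total) * math.tan((0.5 - p) * math.pi)
    if t > 1e15:
        # Tail approximation of the Cauchy survival function for very large T.
        return 1.0 / (t * math.pi)
    return 0.5 - math.atan(t) / math.pi
```

Because the Cauchy tail is heavy, a single very small component P value dominates the combined result (for example, combining 1e-10 with 0.9 gives roughly 2e-10), which is why this combination keeps power when only one test or weighting scheme captures the signal.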

Conflict of interest statement

S.M.G. is now an employee of Regeneron Genetics Center. For B.D.M.: the Amish Research Program receives partial support from Regeneron Pharmaceuticals. M.E.M. reports a grant from Regeneron Pharmaceuticals unrelated to the present work. B.M.P. serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. L.M.R. is a consultant for the TOPMed Administrative Coordinating Center (through Westat). For S.R.: Jazz Pharma, Eli Lilly and Apnimed, unrelated to the present work. The spouse of C.J.W. works at Regeneron Pharmaceuticals. P.N. reports investigator-initiated grants from Amgen, Apple, AstraZeneca, Boston Scientific and Novartis; personal fees from Apple, AstraZeneca, Blackstone Life Sciences, Foresite Labs, Novartis and Roche/Genentech; and spousal employment at Vertex. P.N. is also a co-founder of TenSixteen Bio and a shareholder of geneXwell and TenSixteen Bio, all unrelated to the present work. X. Lin is a consultant to AbbVie Pharmaceuticals and Verily Life Sciences. The remaining authors declare no competing interests.

Figures

Extended Data Fig. 1 | Quantile-quantile plots for gene-centric unconditional meta-analysis of lipid traits LDL-C, HDL-C, TG and TC using TOPMed WGS data (n = 30,138).
MetaSTAAR-O is a two-sided test. Different symbols represent MetaSTAAR-O P values for different functional categories of individual genes (putative loss-of-function, missense, synonymous, promoter and enhancer). The promoter and enhancer of a gene are defined as the promoter region and the GeneHancer region, respectively, that overlap with CAGE sites for that gene (Methods). Four lipid traits were analyzed using MetaSTAAR-O: LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; TG, triglycerides; and TC, total cholesterol.
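A quantile-quantile plot compares observed −log10(P) values against those expected under a uniform null distribution. The sketch below computes the plotted coordinates, assuming the common (i − 0.5)/n plotting position; the caption does not state which convention the figure uses, and `qq_points` is a hypothetical helper name.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a QQ plot under the uniform null."""
    n = len(p_values)
    points = []
    # Sort ascending so the smallest observed P pairs with the smallest expected P.
    for i, p in enumerate(sorted(p_values), start=1):
        expected = (i - 0.5) / n  # assumed plotting position for the i-th uniform order statistic
        points.append((-math.log10(expected), -math.log10(p)))
    return points
```

Points lying on the diagonal indicate well-calibrated P values; upward departure only in the extreme tail is the pattern expected when a few variant sets carry true associations.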
Extended Data Fig. 2 | Manhattan plots for gene-centric unconditional meta-analysis of lipid traits LDL-C, HDL-C, TG and TC using TOPMed WGS data (n = 30,138).
The horizontal line indicates the genome-wide MetaSTAAR-O P value threshold of 5.00 × 10⁻⁷. The significance threshold is defined by Bonferroni correction for multiple comparisons (0.05/(20,000 × 5) = 5.00 × 10⁻⁷). MetaSTAAR-O is a two-sided test. Different symbols represent MetaSTAAR-O P values for different functional categories of individual genes (putative loss-of-function, missense, synonymous, promoter and enhancer). The promoter and enhancer of a gene are defined as the promoter region and the GeneHancer region, respectively, that overlap with CAGE sites for that gene (Methods). Four lipid traits were analyzed using MetaSTAAR-O: LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; TG, triglycerides; and TC, total cholesterol.
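The genome-wide threshold follows from dividing the family-wise error rate of 0.05 by the number of tests: roughly 20,000 protein-coding genes times five functional categories. A minimal Python check of that arithmetic (the helper name is illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


n_genes = 20_000
n_categories = 5  # putative loss-of-function, missense, synonymous, promoter, enhancer
threshold = bonferroni_threshold(0.05, n_genes * n_categories)
print(f"{threshold:.2e}")  # 5.00e-07
```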
Extended Data Fig. 3 | Scatterplots comparing gene-centric unconditional meta-analysis P values from MetaSTAAR-O with STAAR-O from the joint analysis of pooled individual-level data (STAAR-O-Pooled) of lipid traits LDL-C, HDL-C, TG and TC using TOPMed WGS data (n = 30,138).
Each dot represents a functional category of a gene, with the x-axis showing −log10(P) from STAAR-O-Pooled and the y-axis showing −log10(P) from MetaSTAAR-O (n = 30,138). The horizontal and vertical lines indicate the genome-wide P value threshold of 5.00 × 10⁻⁷. The significance threshold is defined by Bonferroni correction for multiple comparisons (0.05/(20,000 × 5) = 5.00 × 10⁻⁷). Both MetaSTAAR and STAAR are two-sided tests. LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; TG, triglycerides; TC, total cholesterol.
Extended Data Fig. 4 | Scatterplot of P values comparing MetaSTAAR-O to Burden-MS, SKAT-MS and ACAT-V-MS (MS is short for MetaSTAAR) for quantitative and dichotomous traits when 15% of rare variants are causal.
In each simulation replicate, a 2-kb region was randomly selected as the signal region. Within each signal region, causal variants were randomly selected according to a multiple logistic model, such that on average 15% of the variants in the signal region were causal. The effect size of causal variant j was β_j = c_0|log10(MAF_j)|, with c_0 = 0.07 for quantitative traits and c_0 = 0.11 for dichotomous traits. All causal variants had positive effect sizes. Power was estimated as the proportion of P values less than α = 10⁻⁷ based on 10⁴ replicates. Burden-MS, SKAT-MS, ACAT-V-MS and MetaSTAAR-O are two-sided tests. Five studies, each with a sample size of 10,000, were included in the meta-analysis.
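Two quantities in this simulation design are simple to reproduce: the MAF-dependent effect size β_j = c_0|log10(MAF_j)|, which gives rarer variants larger effects, and empirical power as the fraction of replicates with P < α. The sketch below covers only those two formulas; the logistic model that selects causal variants, the phenotype generation and the tests themselves are not reproduced, and both function names are hypothetical.

```python
import math


def effect_size(maf, c0):
    """beta_j = c0 * |log10(MAF_j)|: effect grows as the variant gets rarer."""
    return c0 * abs(math.log10(maf))


def empirical_power(p_values, alpha=1e-7):
    """Fraction of simulation replicates whose P value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# A variant with MAF = 0.001 under the quantitative-trait setting (c0 = 0.07)
# gets beta = 0.07 * 3 = 0.21; with MAF = 0.01 it would get 0.14.
beta_rare = effect_size(0.001, 0.07)
```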
Figure 1 | MetaSTAAR workflow.
a, For each study, the MetaSTAAR input data are prepared: genotypes, phenotypes, covariates and a sparse genetic relatedness matrix. b, For each study, MetaSTAARWorker generates summary statistics, including individual variant score statistics, sparse weighted LD matrices and low-rank projection matrices that account for covariate effects. c, All rare variants in the merged variant list are functionally annotated, and two types of variant sets are defined: gene-centric analysis, which groups variants into functional genomic elements for each protein-coding gene, and genetic region analysis, which uses agnostic sliding windows. d, MetaSTAAR-O P values are obtained for all variant sets defined in c. e, Conditional MetaSTAAR-O P values, adjusted for known variants, are obtained and reported for all significant variant sets from d.
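The key idea behind steps b and d is that per-study score statistics and their covariance can be summed across studies, so no individual-level data need to be shared. The sketch below shows that generic score-based meta-analysis for a single weighted burden test only. It assumes each study already supplies the covariance of its score statistics as a dense matrix; MetaSTAAR instead stores sparse weighted LD matrices plus low-rank projection matrices, and its SKAT, ACAT-V, annotation weighting and conditional steps are not shown. The function name and inputs are hypothetical.

```python
import math


def meta_burden_pvalue(scores, covariances, weights):
    """Weighted burden test from per-study summary statistics (illustrative sketch).

    scores: per-study score vectors, one float per variant
    covariances: per-study m x m covariance matrices of those scores (nested lists)
    weights: per-variant weights w_j
    """
    m = len(weights)
    # Sum score vectors and covariance matrices across studies.
    u = [sum(s[j] for s in scores) for j in range(m)]
    v = [[sum(c[i][j] for c in covariances) for j in range(m)] for i in range(m)]
    numerator = sum(w * x for w, x in zip(weights, u))  # w^T U
    variance = sum(weights[i] * v[i][j] * weights[j]
                   for i in range(m) for j in range(m))  # w^T V w
    if variance <= 0:
        raise ValueError("burden statistic has non-positive variance")
    q = numerator * numerator / variance  # ~ chi-square with 1 df under the null
    # Upper tail of chi-square(1): P(|Z| > sqrt(q)) = erfc(sqrt(q / 2)).
    return math.erfc(math.sqrt(q / 2.0))
```

Splitting the same data across studies leaves the result unchanged: two studies with scores 1.0 and 0.96 and variances 0.5 each give the same P value (about 0.05) as one study with score 1.96 and variance 1.0.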
