Nat Genet. 2020 Sep;52(9):969-983.
doi: 10.1038/s41588-020-0676-4. Epub 2020 Aug 24.

Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale

Xihao Li, Zilin Li, Hufeng Zhou, Sheila M Gaynor, Yaowu Liu, Han Chen, Ryan Sun, Rounak Dey, Donna K Arnett, Stella Aslibekyan, Christie M Ballantyne, Lawrence F Bielak, John Blangero, Eric Boerwinkle, Donald W Bowden, Jai G Broome, Matthew P Conomos, Adolfo Correa, L Adrienne Cupples, Joanne E Curran, Barry I Freedman, Xiuqing Guo, George Hindy, Marguerite R Irvin, Sharon L R Kardia, Sekar Kathiresan, Alyna T Khan, Charles L Kooperberg, Cathy C Laurie, X Shirley Liu, Michael C Mahaney, Ani W Manichaikul, Lisa W Martin, Rasika A Mathias, Stephen T McGarvey, Braxton D Mitchell, May E Montasser, Jill E Moore, Alanna C Morrison, Jeffrey R O'Connell, Nicholette D Palmer, Akhil Pampana, Juan M Peralta, Patricia A Peyser, Bruce M Psaty, Susan Redline, Kenneth M Rice, Stephen S Rich, Jennifer A Smith, Hemant K Tiwari, Michael Y Tsai, Ramachandran S Vasan, Fei Fei Wang, Daniel E Weeks, Zhiping Weng, James G Wilson, Lisa R Yanek, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Benjamin M Neale, Shamil R Sunyaev, Gonçalo R Abecasis, Jerome I Rotter, Cristen J Willer, Gina M Peloso, Pradeep Natarajan, Xihong Lin

Xihao Li et al. Nat Genet. 2020 Sep.

Abstract

Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.

Figures

Figure 1 | STAAR workflow.
a, Prepare the input data for STAAR, including genotypes, phenotypes, covariates, and a (sparse) genetic relatedness matrix. b, Annotate all variants in the genome and calculate the annotation principal components for different classes of variant function. c, Define two types of variant sets: gene-centric analysis, which groups variants into functional genomic elements for each protein-coding gene, and genetic region analysis, which uses agnostic sliding windows. d, Compute STAAR statistics for each variant set. e, Obtain STAAR-O P-values for all variant sets defined in c and report significant findings.
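
Step e reports a single omnibus P-value (STAAR-O) per variant set. One common way to fold several per-test P-values into one omnibus value is the Cauchy combination rule, the idea behind ACAT. The sketch below assumes that rule with equal weights by default. The function name `cauchy_combine` and the example P-values are illustrative, and the caption itself does not spell out the combination formula, so treat this as an assumption rather than the authors' implementation.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine P-values with the Cauchy combination rule (assumed here).

    Each P-value must lie strictly between 0 and 1.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalised to sum to 1.
    t = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Under the null, T is approximately standard Cauchy, so the
    # combined P-value is the upper tail of a standard Cauchy at T.
    return 0.5 - math.atan(t) / math.pi


# Hypothetical P-values from three tests of the same variant set.
combined = cauchy_combine([1e-6, 0.5, 0.5])
print(combined)
```

A single very small P-value dominates the combined result, which is why this rule suits settings where only some annotation weightings are informative.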
Figure 2 | Correlation heatmap of functional annotation scores.
The figure shows pairwise correlations between 76 individual and integrative functional annotations using variants from the pooled samples of lipid traits in the TOPMed data. The cells in the visualization are colored by Pearson’s correlation coefficient values with deeper colors indicating higher positive (red) or negative (blue) correlations. Each annotation principal component (aPC) is the first PC calculated from the set of individual functional annotations that measure similar biological function. These aPCs are then transformed into the PHRED-scaled scores for each variant across the genome (Online Methods).
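
The aPC construction described in this caption can be sketched as follows: take the first principal component of a block of related annotation scores, then convert it to a PHRED-style score. This is a minimal sketch under stated assumptions. The function names, the sign convention for the PC, and the rank-based formula `-10 * log10(rank / M)` (rank 1 for the highest value, M variants in total) are assumptions made for illustration; the Online Methods of the paper are the authoritative source.

```python
import numpy as np


def annotation_pc(scores):
    """First PC of a (variants x annotations) matrix of related scores.

    Assumes no annotation column is constant (non-zero standard deviation).
    """
    x = np.asarray(scores, dtype=float)
    # Standardise each annotation so no single scale dominates the PC.
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    # The first right-singular vector holds the PC1 loadings.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    loadings = vt[0]
    # A PC's sign is arbitrary; orient it so the loadings sum is positive.
    if loadings.sum() < 0:
        loadings = -loadings
    return x @ loadings


def phred_scale(values):
    """Rank-based PHRED-style transform (assumed form): -10*log10(rank/M)."""
    v = np.asarray(values, dtype=float)
    m = v.size
    # Rank 1 goes to the largest value; ties are broken by input order.
    order = np.argsort(-v, kind="stable")
    ranks = np.empty(m)
    ranks[order] = np.arange(1, m + 1)
    return -10.0 * np.log10(ranks / m)


# Toy data: three noisy annotations that measure one latent function.
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
toy = np.column_stack([latent + 0.3 * rng.normal(size=1000) for _ in range(3)])
apc_phred = phred_scale(annotation_pc(toy))
```

With this scaling, a score of 10 corresponds to the top 10% of variants and a score of 20 to the top 1%.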
Figure 3 | Genetic region (2-kb sliding window) unconditional analysis results of LDL-C in discovery phase using the TOPMed cohort.
a, Manhattan plot showing the associations of 2.66 million 2-kb sliding windows for LDL-C versus −log₁₀(P-value) of STAAR-O. The horizontal line indicates a genome-wide P-value threshold of 1.88 × 10⁻⁸ (n = 12,316). b, Quantile-quantile plot of 2-kb sliding window STAAR-O P-values for LDL-C (n = 12,316). c, Genetic landscape of the windows significantly associated with LDL-C that are located in the 150-kb region on chromosome 19. Four statistical tests were compared: Burden, SKAT, ACAT-V and STAAR-O. A dot indicates that the sliding window at this location is significant using the statistical test that the color of the dot represents (n = 12,316). d, Scatterplot of P-values for the 2-kb sliding windows comparing STAAR-O with Burden, SKAT and ACAT-V tests. Each dot represents a sliding window, with the x-axis showing the −log₁₀(P-value) of the conventional test and the y-axis showing the −log₁₀(P-value) of STAAR-O (n = 12,316).
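
The genome-wide threshold in panel a is numerically consistent with a Bonferroni correction of 0.05 across the 2.66 million windows tested. The short check below assumes that interpretation and uses the rounded window count from the caption, so the exact count used in the study may differ slightly. The function name `bonferroni_threshold` is illustrative.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Rounded count of 2-kb sliding windows taken from the caption.
threshold = bonferroni_threshold(0.05, 2_660_000)
print(f"{threshold:.2e}")                  # 1.88e-08
print(round(-math.log10(threshold), 2))    # 7.73, height of the line on the plot
```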
