Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale

Xihao Li^#¹, Zilin Li^#¹, Hufeng Zhou¹, Sheila M Gaynor¹, Yaowu Liu², Han Chen^{3,4}, Ryan Sun⁵, Rounak Dey¹, Donna K Arnett⁶, Stella Aslibekyan⁷, Christie M Ballantyne⁸, Lawrence F Bielak⁹, John Blangero¹⁰, Eric Boerwinkle^{3,11}, Donald W Bowden¹², Jai G Broome¹³, Matthew P Conomos¹⁴, Adolfo Correa¹⁵, L Adrienne Cupples^{16,17}, Joanne E Curran¹⁰, Barry I Freedman¹⁸, Xiuqing Guo¹⁹, George Hindy²⁰, Marguerite R Irvin⁷, Sharon L R Kardia⁹, Sekar Kathiresan^{21,22,23}, Alyna T Khan¹⁴, Charles L Kooperberg²⁴, Cathy C Laurie¹⁴, X Shirley Liu^{25,26}, Michael C Mahaney¹⁰, Ani W Manichaikul²⁷, Lisa W Martin²⁸, Rasika A Mathias²⁹, Stephen T McGarvey³⁰, Braxton D Mitchell^{31,32}, May E Montasser³³, Jill E Moore³⁴, Alanna C Morrison³, Jeffrey R O'Connell³¹, Nicholette D Palmer¹², Akhil Pampana^{35,36}, Juan M Peralta¹⁰, Patricia A Peyser⁹, Bruce M Psaty^{37,38}, Susan Redline^{39,40,41}, Kenneth M Rice¹⁴, Stephen S Rich²⁷, Jennifer A Smith^{9,42}, Hemant K Tiwari⁴³, Michael Y Tsai⁴⁴, Ramachandran S Vasan^{17,45}, Fei Fei Wang¹⁴, Daniel E Weeks⁴⁶, Zhiping Weng³⁴, James G Wilson^{47,48}, Lisa R Yanek²⁹; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; TOPMed Lipids Working Group; Benjamin M Neale^{35,49,50}, Shamil R Sunyaev^{35,51,52}, Gonçalo R Abecasis^{53,54}, Jerome I Rotter¹⁹, Cristen J Willer^{55,56,57}, Gina M Peloso¹⁶, Pradeep Natarajan^{23,35,36}, Xihong Lin^{58,59,60}

PMID: 32839606
PMCID: PMC7483769
DOI: 10.1038/s41588-020-0676-4

Xihao Li et al. Nat Genet. 2020 Sep;52(9):969-983. doi: 10.1038/s41588-020-0676-4. Epub 2020 Aug 24.

Abstract

Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.

Figures

**Figure 1 | STAAR workflow.**
a, Prepare the input data of STAAR, including genotypes, phenotypes, covariates and the (sparse) genetic relatedness matrix. b, Annotate all variants in the genome and calculate the annotation principal components for different classes of variant function. c, Define two types of variant-sets: gene-centric analysis, which groups variants into functional genomic elements for each protein-coding gene, and genetic region analysis, which uses agnostic sliding windows. d, Estimate STAAR statistics for each variant-set. e, Obtain STAAR-O P-values for all variant-sets defined in c and report significant findings.
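Step e reports a single omnibus P-value per variant-set, but the caption does not say how the P-values from individual tests are merged. One widely used option, and the idea behind the ACAT-V test named in Figure 3, is the Cauchy combination. The sketch below illustrates that technique only; the function name `cauchy_combine` and the default of equal weights are assumptions made for illustration, not details taken from this record.

```python
import math
from typing import Optional, Sequence


def cauchy_combine(p_values: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Combine P-values with the Cauchy combination rule (illustrative helper).

    Each P-value is mapped to a standard Cauchy quantile, the quantiles are
    averaged with the given weights, and the average is mapped back to a
    P-value through the Cauchy tail probability.
    """
    if weights is None:
        weights = [1.0] * len(p_values)  # assumption: equal weights
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    total = sum(weights)
    # Weighted average of Cauchy-transformed P-values.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for p, w in zip(p_values, weights)) / total
    # Upper-tail probability of a standard Cauchy variable at `stat`.
    return 0.5 - math.atan(stat) / math.pi


# One strong signal among three tests still yields a small combined P-value.
combined = cauchy_combine([1e-6, 0.5, 0.5])
```

A property worth noting: the combined value is driven by the smallest inputs, so one strongly significant component test is not diluted much by several null ones.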

**Figure 2 | Correlation heatmap of functional annotation scores.**
The figure shows pairwise correlations between 76 individual and integrative functional annotations using variants from the pooled samples of lipid traits in the TOPMed data. The cells in the visualization are colored by Pearson’s correlation coefficient values with deeper colors indicating higher positive (red) or negative (blue) correlations. Each annotation principal component (aPC) is the first PC calculated from the set of individual functional annotations that measure similar biological function. These aPCs are then transformed into the PHRED-scaled scores for each variant across the genome (Online Methods).
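The aPC construction described in this caption (first principal component of a group of related annotations, followed by a PHRED-style rescaling) can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: the function names, the column standardization, the sign convention, and the rank-based formula $-10 \log_{10}(\mathrm{rank}/M)$ are all assumptions, since the exact definition sits in the Online Methods rather than in this record.

```python
import numpy as np


def first_annotation_pc(annotations: np.ndarray) -> np.ndarray:
    """First-PC score per variant for a (variants x annotations) matrix."""
    # Standardize columns so that no annotation dominates through its scale.
    scaled = (annotations - annotations.mean(axis=0)) / annotations.std(axis=0)
    # Rows of vt are the principal-component loadings of the scaled matrix.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    scores = scaled @ vt[0]
    # A PC's sign is arbitrary; orient it so that larger scores go with
    # larger average annotation values (assumption for this sketch).
    if np.corrcoef(scores, scaled.mean(axis=1))[0, 1] < 0:
        scores = -scores
    return scores


def phred_scale(scores: np.ndarray) -> np.ndarray:
    """Rank-based PHRED-style score: -10 * log10(rank / M), rank 1 = highest."""
    m = scores.size
    order = np.argsort(-scores)          # indices from highest to lowest score
    ranks = np.empty(m)
    ranks[order] = np.arange(1, m + 1)   # rank 1 goes to the highest score
    return -10.0 * np.log10(ranks / m)


# Simulated example: four noisy annotations that track one latent function.
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
annotations = np.column_stack(
    [latent + 0.3 * rng.normal(size=1000) for _ in range(4)]
)
apc = phred_scale(first_annotation_pc(annotations))
```

Under this scaling, a variant in the top 1% of the distribution scores at least 20 and one in the top 0.1% scores at least 30, which puts differently distributed annotation groups on a common scale.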

**Figure 3 | Genetic region (2-kb sliding window) unconditional analysis results of LDL-C in discovery phase using the TOPMed cohort.**
a, Manhattan plot showing the associations of 2.66 million 2-kb sliding windows for LDL-C versus $-\log_{10}(P\text{-value})$ of STAAR-O. The horizontal line indicates a genome-wide P-value threshold of $1.88 \times 10^{-8}$ (n = 12,316). b, Quantile-quantile plot of 2-kb sliding window STAAR-O P-values for LDL-C (n = 12,316). c, Genetic landscape of the windows significantly associated with LDL-C that are located in the 150-kb region on chromosome 19. Four statistical tests were compared: Burden, SKAT, ACAT-V and STAAR-O. A dot indicates that the sliding window at this location is significant using the statistical test that the color of the dot represents (n = 12,316). d, Scatterplot of P-values for the 2-kb sliding windows comparing STAAR-O with Burden, SKAT and ACAT-V tests. Each dot represents a sliding window, with the x-axis showing the $-\log_{10}(P\text{-value})$ of the conventional test and the y-axis showing the $-\log_{10}(P\text{-value})$ of STAAR-O (n = 12,316).
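The threshold in panel a matches a Bonferroni correction at level 0.05 over the number of windows tested. The caption does not name the correction, so treating it as Bonferroni is an assumption that happens to fit the reported numbers; the window count is also the rounded figure from the caption. A short check of the arithmetic:

```python
import math

ALPHA = 0.05            # assumed family-wise error rate
N_WINDOWS = 2_660_000   # rounded count of 2-kb sliding windows from the caption

# Bonferroni: divide the error rate by the number of tests.
threshold = ALPHA / N_WINDOWS

# Height of the horizontal significance line on the Manhattan plot.
line_height = -math.log10(threshold)

print(f"threshold   = {threshold:.2e}")
print(f"-log10(thr) = {line_height:.2f}")
```

With these inputs the threshold rounds to $1.88 \times 10^{-8}$, which places the significance line near 7.73 on the $-\log_{10}$ scale.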
