Meta-Analysis
Nat Commun. 2020 Jan 28;11(1):542.
doi: 10.1038/s41467-020-14288-y.

Genome-wide rare variant analysis for thousands of phenotypes in over 70,000 exomes from two cohorts

Elizabeth T Cirulli et al. Nat Commun. 2020.

Abstract

Understanding the impact of rare variants is essential to understanding human health. We analyze rare (minor allele frequency [MAF] < 0.1%) variants against 4264 phenotypes in 49,960 exome-sequenced individuals from the UK Biobank and 1934 phenotypes (1821 overlapping with UK Biobank) in 21,866 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. After applying our rare-variant-tailored methodology to reduce test-statistic inflation, we identify 64 statistically significant gene-based associations in our meta-analysis of the two cohorts and 37 for phenotypes available in only one cohort. Singletons (variants observed in a single individual) contribute substantially to our results, and the vast majority of the associations could not have been identified with a genotyping chip. Our results are available for interactive browsing in a webapp (https://ukb.research.helix.com). This comprehensive analysis illustrates the biological value of large, deeply phenotyped cohorts of unselected populations coupled with next-generation sequencing (NGS) data.
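As a minimal sketch of the rare-variant filter described above, the snippet below computes a MAF from an alternate-allele count, keeps variants under the 0.1% cutoff, and flags singletons. It assumes biallelic sites, diploid individuals, and a simple dictionary of per-variant allele counts; the function names and toy counts are hypothetical and are not the authors' pipeline.

```python
from typing import Dict, List


def minor_allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """MAF at a biallelic site, assuming diploid individuals (2 alleles each)."""
    alt_freq = alt_allele_count / (2 * n_individuals)
    return min(alt_freq, 1.0 - alt_freq)


def rare_variants(alt_counts: Dict[str, int], n_individuals: int,
                  maf_cutoff: float = 0.001) -> List[str]:
    """IDs of observed variants whose MAF is strictly below the cutoff (0.1% by default)."""
    return [vid for vid, ac in alt_counts.items()
            if ac > 0 and minor_allele_frequency(ac, n_individuals) < maf_cutoff]


def singletons(alt_counts: Dict[str, int]) -> List[str]:
    """IDs of variants seen on exactly one allele, i.e. in one heterozygous carrier."""
    return [vid for vid, ac in alt_counts.items() if ac == 1]


# Hypothetical allele counts in a cohort the size of the UKB exome set (49,960 people).
counts = {"v1": 1, "v2": 99, "v3": 100, "v4": 5000}
print(rare_variants(counts, 49_960))  # ['v1', 'v2']
print(singletons(counts))             # ['v1']
```

With 99,920 alleles in this toy cohort, the 0.1% cutoff means an alternate allele can be observed at most 99 times and still count as rare.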

Conflict of interest statement

E.T.C., S.W., N.L.W., F.T., D.M.F., E.S., M.I. and J.T.L. are employees of Helix. R.W.R., G.E., W.J.M., K.A.S. and J.J.G. declare no competing interests.

Figures

Fig. 1. Gene-based collapsing analysis.
a First, variants in each gene are identified by sequencing. b Variants that are predicted to be damaging (rare variants annotated as likely to affect gene function, such as coding variants) are then selected for analysis as qualifying variants. c Finally, the number of cases carrying a qualifying variant in each gene is compared with the number of controls carrying one, producing one statistical result per gene instead of one per variant.
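Panels a to c can be sketched as follows, assuming each individual is already labelled as a case or control and each gene has a precomputed set of individuals carrying at least one qualifying variant (`carriers_by_gene` is a hypothetical input). The study's main analysis is a linear mixed model (see Fig. 3); this sketch instead uses Fisher's exact test, the simpler test the study applies to unrelated individuals, so it illustrates the collapsing idea rather than reproducing the authors' implementation.

```python
from math import comb
from typing import Dict, Set, Tuple

# (case carriers, case non-carriers, control carriers, control non-carriers)
Table = Tuple[int, int, int, int]


def collapse(carriers_by_gene: Dict[str, Set[str]],
             cases: Set[str], controls: Set[str]) -> Dict[str, Table]:
    """Collapse each gene to one 2x2 table: does a person carry >= 1 qualifying variant?"""
    tables = {}
    for gene, carriers in carriers_by_gene.items():
        case_carriers = len(carriers & cases)
        control_carriers = len(carriers & controls)
        tables[gene] = (case_carriers, len(cases) - case_carriers,
                        control_carriers, len(controls) - control_carriers)
    return tables


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the row-1 (case) individuals.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum over all tables with the same margins that are no more probable than the observed one.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, p)


tables = collapse({"GENE_X": {"p1", "p2", "p5"}},
                  cases={"p1", "p2", "p3"}, controls={"p4", "p5", "p6"})
print(tables)  # {'GENE_X': (2, 1, 1, 2)}
print({gene: fisher_exact_two_sided(*t) for gene, t in tables.items()})
```

Because every gene yields a single table, the number of tests equals the number of genes rather than the number of variants, which is the point of panel c.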
Fig. 2. Histogram of the number of qualifying variants per gene in the European UKB cohort.
a Number of qualifying coding variants per gene. Eleven genes with >500 variants were excluded from the plot. The median number of variants per gene is 34 (range 1 to 2833). b Number of qualifying coding variants per coding nucleotide of each gene. Sixteen genes with values >0.2 were excluded from the plot. The median number of variants per nucleotide is 0.027 (range 0.0001 to 0.991). c Number of qualifying loss-of-function (LoF) variants per gene. Six genes with >50 variants were excluded from the plot. The median number of variants per gene is six (range 1 to 178). d Number of qualifying LoF variants per coding nucleotide of each gene. Nine genes with values >0.05 were excluded from the plot. The median number of variants per nucleotide is 0.005 (range 9.5 × 10⁻⁵ to 0.25). Plots for all ancestries and the HNP cohort can be found in Supplementary Fig. 1.
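The two summary measures in Fig. 2 (variants per gene and variants per coding nucleotide) can be sketched as below, assuming one gene label per qualifying variant and a table of coding lengths in nucleotides. The gene names, lengths, and counts are hypothetical toy values.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable


def variants_per_gene(qualifying_variant_genes: Iterable[str]) -> Dict[str, int]:
    """Count qualifying variants per gene, given one gene label per variant."""
    return dict(Counter(qualifying_variant_genes))


def variants_per_coding_nt(counts: Dict[str, int],
                           coding_length_nt: Dict[str, int]) -> Dict[str, float]:
    """Normalize each gene's count by its coding length in nucleotides."""
    return {gene: n / coding_length_nt[gene] for gene, n in counts.items()}


# Hypothetical toy data: gene labels for qualifying variants and coding lengths (nt).
labels = ["A"] * 34 + ["B"] * 10 + ["C"] * 60
lengths = {"A": 1000, "B": 2000, "C": 1500}

counts = variants_per_gene(labels)
density = variants_per_coding_nt(counts, lengths)
print(median(counts.values()))   # 34
print(median(density.values()))  # 0.034
```

The exclusions listed in the caption appear to affect only what is plotted: the reported range for panel a reaches 2833, above the 500-variant plotting cutoff, so the medians and ranges include the excluded genes.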
Fig. 3. Overlaid QQ plots for the coding model with the phenotype atrial fibrillation.
This phenotype has a 1:22 case:control ratio. Shown are the results for a linear mixed model (LMM) meta-analysis of all European-ancestry individuals with no minimum number of variant carriers required (black), with at least ten case carriers observed (red), and with at least ten case carriers expected in the case group based on the overall frequency (cyan), as well as a Fisher’s exact test (FET) of unrelated European-ancestry individuals with all genes included (blue). The cyan condition (at least ten expected case carriers) is the requirement we set for our main analysis results. The one significant association is TTN, known from previous studies to be involved in phenotypes related to atrial fibrillation. This association is significant in the LMM analysis (meta-analysis p < 3.4 × 10⁻¹⁰), but it is difficult to distinguish from test-statistic inflation without the ten-expected-case-carriers cutoff (cyan). There is no inflation in the FET of unrelated individuals, but the TTN association is not significant in that analysis.
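The difference between the observed (red) and expected (cyan) carrier rules can be sketched as follows. This assumes that "expected" means the total number of carriers of a gene's qualifying variants multiplied by the fraction of the sample that are cases; the function names and cohort sizes are hypothetical, chosen only to match the 1:22 ratio.

```python
from typing import Tuple


def expected_case_carriers(case_carriers: int, control_carriers: int,
                           n_cases: int, n_controls: int) -> float:
    """Case carriers expected if carriers occurred in cases at the overall carrier frequency."""
    total_carriers = case_carriers + control_carriers
    # (total_carriers / n_total) * n_cases, with one final division to limit rounding error
    return total_carriers * n_cases / (n_cases + n_controls)


def passes_filters(case_carriers: int, control_carriers: int, n_cases: int,
                   n_controls: int, minimum: int = 10) -> Tuple[bool, bool]:
    """Return (observed >= minimum, expected >= minimum): the red and cyan rules."""
    observed_ok = case_carriers >= minimum
    expected_ok = expected_case_carriers(case_carriers, control_carriers,
                                         n_cases, n_controls) >= minimum
    return observed_ok, expected_ok


# Hypothetical 1:22 case:control split, as for atrial fibrillation here.
N_CASES, N_CONTROLS = 1_000, 22_000
print(passes_filters(5, 225, N_CASES, N_CONTROLS))  # (False, True)
print(passes_filters(12, 20, N_CASES, N_CONTROLS))  # (True, False)
```

At a 1:22 ratio, a gene needs about 230 carriers in total before ten case carriers are expected. The expected-carrier rule depends only on that total, not on how the carriers split between cases and controls.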
Fig. 4. Distribution of effects of rare variants in select genes in the UKB cohort.
a SLC2A9 protein and urate levels. The legend shows the gene, its associated phenotype, and the effect size (beta). The effect size is computed from the gene-based collapsing model, in which individuals were coded as either having or not having a qualifying variant. A positive value indicates that variant carriers have, on average, higher values for the phenotype, while a negative value indicates that variant carriers have lower values. The amino acid positions are shown on the x-axis, with the PFAM domain highlighted. The y-axis displays the beta of each individual variant, with negative values shown below and positive values above the horizontal axis. Variants are indicated according to their consequence as shown and labelled according to their amino acid change or splice site variation. The number inside the circle is the number of people carrying that variant. Darker lines connecting the variants to the gene and darker-filled shapes indicate more significant p values for the association. b Membrane topology plot of SLC2A9 showing variants with positive (green) and negative (pink) effect sizes on urate levels. SLC2A9 (Glut9) reabsorbs urate in the proximal tubules of the kidneys. Variants that disrupt the transmembrane regions or lower gene expression are known to be associated with hypouricemia. Here, 88% of the variants with negative betas (associated with lower urate levels) lie in or directly adjacent to a predicted transmembrane region, compared with only 55% of the variants with positive betas. c GFI1B protein and mean platelet volume. Consistent with the literature, variants in the zinc finger domains are associated with increased platelet volumes, but we observe that some variants between zinc fingers 3 and 4 may have an effect in the opposite direction. d ASGR1 protein and alkaline phosphatase levels. In addition to the known effects of LoF variants, we show that missense variants also play a role. Plots of the other significantly associated genes are included in Supplementary Fig. 3.
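Two quantities from Fig. 4 can be sketched as below: the collapsing-model beta for a 0/1 carrier indicator, and the share of variants in or next to a transmembrane (TM) region from panel b. This is an unadjusted least-squares sketch that omits any covariates or relatedness modelling the study may have used. The `flank` window standing in for "directly adjacent" is an assumption, and all names and values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple


def collapsing_beta(phenotype: Dict[str, float], carriers: Set[str]) -> float:
    """Unadjusted effect size for a 0/1 carrier indicator.

    An ordinary least-squares slope on a binary regressor equals the difference
    in group means, so a positive value means carriers have higher values.
    """
    carrier_vals = [v for pid, v in phenotype.items() if pid in carriers]
    noncarrier_vals = [v for pid, v in phenotype.items() if pid not in carriers]
    return mean(carrier_vals) - mean(noncarrier_vals)


def fraction_in_or_near_tm(positions: List[int], tm_regions: List[Tuple[int, int]],
                           flank: int = 1) -> float:
    """Share of variant residue positions inside, or within `flank` residues of, a TM region."""
    def near_tm(pos: int) -> bool:
        return any(start - flank <= pos <= end + flank for start, end in tm_regions)
    return sum(near_tm(p) for p in positions) / len(positions)


# Hypothetical toy values
urate = {"p1": 5.0, "p2": 7.0, "p3": 3.0, "p4": 1.0}
print(collapsing_beta(urate, {"p1", "p2"}))                            # 4.0
print(fraction_in_or_near_tm([10, 25, 51, 100], [(20, 40), (52, 70)]))  # 0.5
```

Passing the carriers of a single variant instead of a whole gene gives a per-variant beta of the kind plotted on the y-axis in panel a.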
