Cell Genom. 2022 Aug 15;2(9):100168.
doi: 10.1016/j.xgen.2022.100168. eCollection 2022 Sep 14.

Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes

Konrad J Karczewski et al. Cell Genom. 2022.

Abstract

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.

Keywords: GWAS; PheWAS; biobanks; exome sequencing; rare variant association studies; rare variants.

Conflict of interest statement

K.J.K. is a consultant for Vor Biopharma. B.M.R.-G., X.Z., F.R., S.E., A.J.G., M.R., J.W., H.J., and J.W.D. are employees of AbbVie, Inc. M.R. is an employee of and owns stock in AbbVie, Inc. E.A.T., D.S., P.G.B., and H.R. are employees of Biogen and hold stocks/stock options in Biogen. H.I.K., X.C., X.H., and M.R.M. are employees of Pfizer. D.K. holds stock in the private company TriNetX, LLC. D.S.P. was an employee of Genomics plc. All the analyses reported in this paper were performed as part of D.S.P.’s previous employment at the Massachusetts General Hospital and Broad Institute. N.A.W. owns stock in Pfizer. L.D.G. receives funding from Intel and Illumina. D.G.M. is a founder with equity of Goldfinch Bio and serves as a paid advisor to GSK, Variant Bio, Insitro, and Foresite Labs. H.L.R. is a member of the scientific advisory board at Genome Medical. A.A.P. is a Venture Partner at GV. He has received consulting fees from Novartis and receives funding from Bayer, IBM, Microsoft, Alphabet, Intel, GSK, Pfizer, and Illumina. M.J.D. is a founder of Maze Therapeutics. B.M.N. is a member of the scientific advisory board at Deep Genomics and RBNC Therapeutics, a member of the scientific advisory committee at Milken, and a consultant for Camp4 Therapeutics, Merck, and Biogen.

Figures

Graphical abstract
Figure 1
Quality control (QC) of rare-variant association tests. (A–C) The number of phenotypes (A), variants (B), and groups (i.e., gene-annotation pairs; C) before and after QC. (D and E) After QC, the number of variants (D) and genes (E) are broken down by annotation and frequency bin (alternate allele frequency [AF] for variants, cumulative AF [CAF] for genes).
Figure 2
Rare-variant association testing is enhanced by group tests. (A and B) For each ICD chapter, we show a Manhattan plot depicting the distribution of p values for all single-variant (A) and SKAT-O gene-based (B) associations, where for each variant/gene, the minimum p value across phenotypes within each category is shown. (C and D) The number of gene-level associations per phenotype is shown as a bar plot, broken down by trait type (left) and normalized within each trait type (right), separated by phenotype category (C) or functional annotation (D). Single-variant tests are grouped into genes, and a gene counts as “significant by variant” if it has at least one associated variant. These are shown alongside group tests (“significant by gene”) and genes where an association is found for both group and single-variant tests.
Figure 3
The influence of variant AF and functional annotation in exome-association testing. (A and B) The proportion of single variants (A) and genes (B) with at least one significant hit is shown broken down by AF (A) or CAF (B) category, each shown below the plot, broken down by functional annotation. (C and D) This metric is also plotted by the proportion expressed across transcripts for splice variants (C) and ClinVar pathogenicity status (D). Error bars represent 95% confidence intervals.
Figure 4
The effect of gene function on the landscape of rare-variant associations. The proportion of gene-annotation pairs with at least one association (SKAT-O p < 2.5 × 10⁻⁷) is shown for a number of gene categories, each compared with a background set of genes matched on CAF. Error bars represent 95% confidence intervals. Asterisks denote a significant difference between the background and test sets (∗p < 0.05 and ∗∗p < 0.001, respectively).
Figure 5
Refined association between SCRIB and white-matter integrity of tapetum. The Genebass browser provides views of the full dataset, including all quality control metrics and association statistics. (A) The summary of association information between pLoF variants in SCRIB with mean orientation dispersion (OD) index in tapetum on fractional anisotropy (FA) skeleton (from diffusion magnetic resonance imaging [dMRI] data). (B) A rare-variant Manhattan plot of 8 rare pLoF variants is shown. (C) Details for the component variants are shown in a table, including their functional consequence (CSQ), a detailed protein-coding annotation (HGVSp), and the association p value and beta, as well as frequency information (AC, allele count; Hom, number of homozygotes; AN, allele number; AF, allele frequency). Each component pLoF variant in SCRIB has a positive beta value, and in aggregate, these variants show an association at p = 6 × 10⁻¹⁵ (A). (D) A Manhattan plot of a previous GWAS of FA averaged across brain regions (top), body of corpus callosum (middle), and splenium of corpus callosum (bottom). Horizontal dashed line indicates a GWAS genome-wide significance threshold (5 × 10⁻⁸), and vertical line indicates the location of SCRIB.
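
The frequency fields and significance thresholds named in the captions above can be related with a short Python sketch. It is only an illustration, not the authors' pipeline or the Genebass data model: the `VariantRow` class, its field names, and the example counts are hypothetical. The only values taken from the captions are the two thresholds (2.5 × 10⁻⁷ for SKAT-O in Figure 4, 5 × 10⁻⁸ for GWAS in Figure 5D) and the aggregate p value of 6 × 10⁻¹⁵. The captions do not state which threshold was applied to the SCRIB result, so the final comparison is an assumption made for demonstration.

```python
from dataclasses import dataclass

# Thresholds quoted in the figure captions.
SKAT_O_THRESHOLD = 2.5e-7  # gene-based SKAT-O threshold (Figure 4)
GWAS_THRESHOLD = 5e-8      # GWAS genome-wide significance line (Figure 5D)


@dataclass
class VariantRow:
    """One row of a hypothetical variant table (field names are illustrative)."""
    ac: int   # AC: allele count
    an: int   # AN: allele number (total alleles genotyped)
    hom: int  # Hom: number of homozygotes


def allele_frequency(row: VariantRow) -> float:
    """AF = AC / AN; returns 0.0 when no alleles were genotyped."""
    return row.ac / row.an if row.an > 0 else 0.0


def is_significant(p_value: float, threshold: float) -> bool:
    """True when the p value falls strictly below the threshold."""
    return p_value < threshold


# Made-up counts for illustration only; AN assumes two alleles for each
# of 394,841 individuals with no missing genotypes.
example = VariantRow(ac=12, an=2 * 394_841, hom=0)
print(f"AF = {allele_frequency(example):.2e}")

# Aggregate p value from the Figure 5 caption, compared against the
# SKAT-O threshold from Figure 4 (an assumption, see the note above).
print(is_significant(6e-15, SKAT_O_THRESHOLD))
```

Keeping the thresholds as named constants makes it explicit which cutoff a given comparison uses, since the gene-based and GWAS thresholds differ.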
