Nature. 2020 Oct;586(7831):749-756.
doi: 10.1038/s41586-020-2853-0. Epub 2020 Oct 21.

Exome sequencing and characterization of 49,960 individuals in the UK Biobank

Cristopher V Van Hout et al. Nature. 2020 Oct.

Abstract

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world [1]. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.

Conflict of interest statement

C.V.V.H., J.D.B., D.L., C.G.-J., S.K., B.Y., N.B., A.H.L., C.O., A.M., J.S., C.S., A.H., E.M., L.B., A.L., X.B., S.O., J.P., L.H., A.L.B., A.Y., K.P., M.J., W.J.S., G.D.Y., A.E., G.C., A.R.S., S.B., M.C., J.G.R., J.M., J.D.O., G.R.A., A.B. and the spouse of C.J.W. are current or former employees and/or stockholders of Regeneron Genetics Center or Regeneron Pharmaceuticals. J.P. is a current employee of DNAnexus and C.S. of Hasso Plattner Institute, but work was conducted while employed by the Regeneron Genetics Center. I.T., J.D.H., A.K.P., L.C., M.R.N., J.W., R.A.S. and L.Y.-A. are current or former employees and/or stockholders of GlaxoSmithKline. I.T. is a current employee of AstraZeneca, J.D.H. of Foresite Labs, L.C. of BioMarin and M.R.N. of Deerfield, but work was conducted while employed by GlaxoSmithKline. The other authors declare no competing interests.

Figures

Fig. 1. Predicted number of genes in carriers of heterozygous LOF variants in around 500,000 whole-exome sequences from existing WES data.
The number of autosomal genes with at least 1, 5, 10, and so on, carriers of heterozygous LOF variants that passed Goldilocks quality control (see Supplementary Methods), had genotype missingness of <10% and Hardy–Weinberg equilibrium P > 10^−15 increases with sample size. UKB participants of European ancestry with WES data (n = 46,911) were downsampled at random to the number of individuals specified on the x axis. The number of genes containing at least the indicated count of carriers of heterozygous LOF variants with MAF < 1%, as indicated in the legend, is plotted on the y axis. The number of autosomal genes is 18,574 in this gene model. The blue dashed line indicates the predicted number of genes (18,273) with at least 1 carrier of a heterozygous LOF variant in 500,000 exomes. Solid curves connect the observed number of genes; dashed curves connect predicted counts from a β-binomial mixture model (see Supplementary Methods).
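
The downsampling step described in the Fig. 1 legend can be illustrated with a minimal Python sketch. It is not the authors' code: the gene names, participant identifiers and function names below are hypothetical toy values, and only the observed (solid-curve) counts are covered. The β-binomial mixture prediction is not reproduced here.

```python
import random


def genes_with_min_carriers(carriers_by_gene, sampled_ids, thresholds):
    """For each threshold t, count genes with at least t LOF carriers
    among the sampled individuals."""
    sampled = set(sampled_ids)
    counts = {t: 0 for t in thresholds}
    for carriers in carriers_by_gene.values():
        n_carriers = len(sampled & set(carriers))
        for t in thresholds:
            if n_carriers >= t:
                counts[t] += 1
    return counts


def downsample_curve(carriers_by_gene, all_ids, sample_sizes, thresholds, seed=0):
    """Draw a random subset of individuals at each sample size and
    record the gene counts per carrier threshold."""
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    curve = {}
    for size in sample_sizes:
        subset = rng.sample(all_ids, size)
        curve[size] = genes_with_min_carriers(carriers_by_gene, subset, thresholds)
    return curve


# Hypothetical toy input: gene -> set of participant IDs carrying a
# heterozygous LOF variant in that gene (assumed already QC-filtered).
toy_carriers = {
    "GENE_A": {1, 2, 3, 4, 5},
    "GENE_B": {2, 6},
    "GENE_C": set(),
}
toy_ids = list(range(1, 11))

curve = downsample_curve(toy_carriers, toy_ids, [2, 5, 10], [1, 5])
for size, counts in curve.items():
    print(size, counts)
```

Plotting each threshold's count against sample size would give curves of the same kind as the solid lines in the figure, with the count at any threshold never exceeding the total number of genes.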

References

    1. Sudlow C, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779.
    2. Bycroft C, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209.
    3. Tyrrell J, et al. Height, body mass index, and socioeconomic status: Mendelian randomisation study in UK Biobank. Br. Med. J. 2016;352:i582.
    4. Lyall DM, et al. Association of body mass index with cardiometabolic disease in the UK Biobank: a Mendelian randomization study. JAMA Cardiol. 2017;2:882–889.
    5. Abul-Husn NS, et al. A protein-truncating HSD17B13 variant and protection from chronic liver disease. N. Engl. J. Med. 2018;378:1096–1106.
