Nature. 2025 Sep;645(8081):692-701.
doi: 10.1038/s41586-025-09272-9. Epub 2025 Aug 6.

Whole-genome sequencing of 490,640 UK Biobank participants

UK Biobank Whole-Genome Sequencing Consortium. Nature. 2025 Sep.

Abstract

Whole-genome sequencing provides an unbiased and complete view of the human genome and enables the discovery of genetic variation without the technical limitations of other genotyping technologies. Here we report on whole-genome sequencing of 490,640 UK Biobank participants, building on a previous genotyping effort¹. This advance deepens our understanding of how genetics associates with disease biology and further enhances the value of this open resource for the study of human biology and health. Coupling this dataset with rich phenotypic data, we surveyed within- and cross-ancestry genomic associations and identified novel genetic and clinical insights. Although most associations with disease traits were primarily observed in individuals of European ancestries, strong or novel signals were also identified in individuals of African and Asian ancestries. With the improved ability to accurately genotype structural variants and exonic variation in both coding and UTR sequences, we strengthened and revealed novel insights relative to whole-exome sequencing²,³ analyses. This dataset, representing a large collection of whole-genome sequencing data that is available to the UK Biobank research community, will enable advances in our understanding of the human genome, facilitate the discovery of diagnostics and therapeutics with higher efficacy and improved safety profiles, and enable precision medicine strategies with the potential to improve global health.

Conflict of interest statement

Competing interests: K.C., E. Wheeler, K. Kundu, F.H., Q.W., O.S.B., R.S.D., S.V.V.D., C.H., K. Lythgow, P.H.M., K.M., J. Mitchell, S.O., A.O’N., K.R.S., H.T., M.P., R.M., S. Wasilewski and S.P. are or have been employees or contractors of AstraZeneca during the time of this research and may own stock or stock options. J. Liu, Y.L., J. Sandhuria, T.G.R., L. Howe, C.R., D.L., P.A., M.P., D.S., Y.A., J.W., M.D., T.J., J. Davitte, E.I., R.S. and A. Cortes are or have been employees of GSK during the time of this research and may own stock or stock options. B.V.H., H.P.E., K.H.S.M., H. Hauswedell, O.E., A.S., N.G., S. Snorradottir, M.O.U., G.P., M.T.H., A.O., B.O.J., S.K., B.D.S., O.A.S., D. Beyter, G. Holley, V.T., A.G., P.I.O., F.Z., M.A., S.T.S., B. Sigurdsson, S.A.G., G.T.S., G.H.H., G.S., U.S., D.N.M., S.S., K. Kristinsson, E.S., G.T., F.J., P.M., I.J., T.R., H. Holm, H.S., J.S., D.F.G., O.T.M., G.M., U.T., A.H., H.J., P.S. and K.S. are or have been employees of Amgen deCODE genetics and may own stock or stock options. L. Hou, J. Molineros, Y.Z., A.H.L., E.H.B., E.M., A.D.T., G.A.-A., B.M., K.Y.H., J.X., S.N., A.K., S.X., B.F., T.M., T.H. and S.L. are employees and/or stockholders of Janssen Research & Development. R.M., Z.H. and O.S.-T. are employees and/or stockholders of Illumina. The other authors declare no competing interests.

Figures

Fig. 1. Variant call sets.
a, The density (counts) of the per-individual number of variants split up by the five populations considered in this study from the GraphTyper call sets. Panels show number of SNPs, indels, singleton SNPs and indels, combined number of SV insertions and duplications and SV deletions. b, The length of SV deletions discovered in this study, split by the frequency of the variant. Data are represented as box plots; the middle line represents the median, the lower and upper part of the red box plot correspond to the first and third quartiles, and the upper whisker extends from the 75th percentile to the 95th percentile. n indicates the number of SV deletions per frequency bin. c, The number of variants discovered split by variant class (duplication, insertion and deletion). d, The size of insertions and deletions discovered shown in range from 50 bp up to 1,000 bp, 10,000 bp and 100,000 bp.
Fig. 2. UpSet plot of GWS associations across ancestries.
Ancestry labels are sorted by number of GWS associations in each set: meta-analysis (Meta), NFE, SAS, AFR, ASJ and EAS.
Fig. 3. Observed number of genes in carriers of heterozygous pLoF, P or LP variants in WGS and WES.
The number of autosomal genes (y axis) with at least 1, 25, 50 and 100 heterozygous carriers, as a function of the number of individuals (x axis), up to the total of 452,728 participants with both WES and WGS data.
Fig. 4. UTR-based collapsing analysis.
Miami plot of UTR-based rare-variant PheWAS associations for 687 binary (top) and 64 quantitative (bottom) phenotypes across all 6 collapsing models. Significant 5′, 3′ and 5′ and 3′ combined associations are represented in different colours. The top significant binary associations and the significant quantitative associations with P value < 1 × 10^−30 are labelled. P values are unadjusted and are from Fisher’s exact two-sided tests (for binary traits) and linear regression (for quantitative traits).
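As an illustration of the test named in this caption, the following is a minimal textbook sketch of a two-sided Fisher's exact test on a 2 × 2 table. It is not the study's pipeline. The layout of the table (carriers versus non-carriers of qualifying variants, cases versus controls), the function name and the counts in the usage lines are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Assumed (hypothetical) layout:
        a = cases carrying a qualifying variant,    b = cases not carrying one
        c = controls carrying a qualifying variant, d = controls not carrying one
    The P value is the sum of the hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of seeing x carriers among cases, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so that ties are not lost to rounding.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts, for illustration only (not taken from the study).
p_value = fisher_exact_two_sided(12, 488, 30, 9470)
print(f"two-sided P = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free. For tables on the scale of a biobank, a log-space implementation would be the more practical choice.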
Extended Data Fig. 1. Graphical summary: framework followed in this UK Biobank study.
Participants’ samples were collected from UK Biobank and underwent whole-genome sequencing as described in the Supplementary Information. Sequencing data were analyzed with two distinct bioinformatic pipelines, generating the GraphTyper and DRAGEN datasets, followed by variant calling of SNPs, indels and structural variants (SVs). Participants were assigned to one of five ancestry groups for association analysis of genetic variants for a series of disease endpoints and quantitative traits. Cross-ancestry meta-analysis was then performed. The UK Biobank logo is reproduced with permission.
Extended Data Fig. 2. Effect of sample size on variant number.
Number of variants in the UK Biobank DRAGEN aggregated variant dataset (release 2 PASS variants) in different allele frequency ranges as the number of samples increases from 1,000 to 490,541 (based on random downsampling). Variant alleles are collected from all autosomes, sex chromosomes, mitochondria and ALT contigs.
Extended Data Fig. 3. Regional plot for HBB-HBE1 locus associated with Hemolytic anemias (ICD10: D55-59) and Thalassaemia (ICD10: D56) in NFE, AFR, SAS populations.
NFE: non-Finnish European; AFR: African; SAS: South Asian; EAS: East Asian. GWAS for D56 was not conducted in the AFR population due to a sample size of fewer than 200 cases; therefore, no LocusZoom plot is available for D56 in AFR. rs11549407 (no LD estimation for this rare variant) MAF: 0.005% in NFE, 0 in SAS, 0.003% in AFR; rs33915217 MAF: 0.00008% in NFE, 0.41% in SAS, 0.004% in AFR; rs334 MAF: 0.004% in NFE, 0.089% in SAS, 6.26% in AFR. P values are uncorrected and are from two-sided tests performed with approximate Firth logistic regression.
Extended Data Fig. 4. The change in Phred scores (−10*log10[p-values]) between the WGS and WES analyses for 12,963,003 binary genotype-phenotype associations (filled circle) and 1,167,322 quantitative associations (empty circle) stratified by chapter.
For gene–phenotype associations that appear in multiple collapsing models, we display only those with the lowest P value within each dataset. The green circles indicate associations that were not significant in the WES analysis but were significant in the WGS analysis. The orange circles represent associations that were significant in the WES analysis but were not significant in the WGS analysis. The y axis is capped at ΔPhred = 60 (and −60), equivalent to a change in P value by a factor of 10^6.
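The Phred transformation in this caption can be worked through in a short sketch. The formula −10 × log10(P) and the cap of ±60 come from the caption; the function names and the sign convention (WGS minus WES) are assumptions made for illustration.

```python
import math


def phred(p_value: float) -> float:
    """Phred-scaled P value: -10 * log10(p)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return -10.0 * math.log10(p_value)


def delta_phred(p_wgs: float, p_wes: float, cap: float = 60.0) -> float:
    """Change in Phred score, clipped to [-cap, cap].

    Assumed sign convention: positive values mean the association has a
    smaller P value in the WGS analysis than in the WES analysis.
    """
    delta = phred(p_wgs) - phred(p_wes)
    return max(-cap, min(cap, delta))


# A 10^6-fold drop in P value is a Phred change of 60, which is the cap.
print(delta_phred(1e-14, 1e-8))
```

Because each factor of 10 in the P value shifts the Phred score by 10, any association whose P value changes by more than six orders of magnitude is drawn at the cap.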

References

    1. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
    2. Szustakowski, J. D. et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat. Genet. 53, 942–948 (2021).
    3. Van Hout, C. V. et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).
    4. Miller, K. L. et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci. 19, 1523–1536 (2016).
    5. Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
