[Preprint]. 2024 Sep 8:2024.09.06.611689.
doi: 10.1101/2024.09.06.611689.

A blended genome and exome sequencing method captures genetic variation in an unbiased, high-quality, and cost-effective manner

Toni A Boltz, Benjamin B Chu, Calwing Liao, Julia M Sealock, Robert Ye, Lerato Majara, Jack M Fu, Susan Service, Lingyu Zhan, Sarah E Medland, Sinéad B Chapman, Simone Rubinacci, Matthew DeFelice, Jonna L Grimsby, Tamrat Abebe, Melkam Alemayehu, Fred K Ashaba, Elizabeth G Atkinson, Tim Bigdeli, Amanda B Bradway, Harrison Brand, Lori B Chibnik, Abebaw Fekadu, Michael Gatzen, Bizu Gelaye, Stella Gichuru, Marissa L Gildea, Toni C Hill, Hailiang Huang, Kalyn M Hubbard, Wilfred E Injera, Roxanne James, Moses Joloba, Christopher Kachulis, Phillip R Kalmbach, Rogers Kamulegeya, Gabriel Kigen, Soyeon Kim, Nastassja Koen, Edith K Kwobah, Joseph Kyebuzibwa, Seungmo Lee, Niall J Lennon, Penelope A Lind, Esteban A Lopera-Maya, Johnstone Makale, Serghei Mangul, Justin McMahon, Pierre Mowlem, Henry Musinguzi, Rehema M Mwema, Noeline Nakasujja, Carter P Newman, Lethukuthula L Nkambule, Conor R O'Neil, Ana Maria Olivares, Catherine M Olsen, Linnet Ongeri, Sophie J Parsa, Adele Pretorius, Raj Ramesar, Faye L Reagan, Chiara Sabatti, Jacquelyn A Schneider, Welelta Shiferaw, Anne Stevenson, Erik Stricker, Rocky E Stroud 2nd, Jessie Tang, David Whiteman, Mary T Yohannes, Mingrui Yu, Kai Yuan, NeuroGAP-Psychosis, Dickens Akena, Lukoye Atwoli, Symon M Kariuki, Karestan C Koenen, Charles R J C Newton, Dan J Stein, Solomon Teferra, Zukiswa Zingela, Carlos N Pato, Michele T Pato, Carlos Lopez-Jaramillo, Nelson Freimer, Roel A Ophoff, Loes M Olde Loohuis, Michael E Talkowski, Benjamin M Neale, Daniel P Howrigan, Alicia R Martin

Toni A Boltz et al. bioRxiv.

Abstract

We deployed the Blended Genome Exome (BGE), a DNA library blending approach that generates low-pass whole genome (1-4× mean depth) and deep whole exome (30-40× mean depth) data in a single sequencing run. This technology is cost-effective, empowers most genomic discoveries possible with deep whole genome sequencing, and provides an unbiased method to capture the diversity of common SNP variation across the globe. To evaluate this new technology at scale, we applied BGE to sequence >53,000 samples from the Populations Underrepresented in Mental Illness Associations Studies (PUMAS) Project, which included participants across African, African American, and Latin American populations. We evaluated the accuracy of BGE imputed genotypes against raw genotype calls from the Illumina Global Screening Array. In all PUMAS cohorts, R² concordance was ≥95% among SNPs with MAF≥1% and never fell below 90% for SNPs with MAF<1%. Furthermore, concordance rates among local ancestries within two recently admixed cohorts were consistent among SNPs with MAF≥1%, with only minor deviations in SNPs with MAF<1%. We also benchmarked the discovery capacity of BGE to access protein-coding copy number variants (CNVs) against deep whole genome data, finding that deletions and duplications spanning at least 3 exons had a positive predictive value of ~90%. Our results demonstrate BGE scalability and efficacy in capturing SNPs, indels, and CNVs in the human genome at 28% of the cost of deep whole-genome sequencing. BGE is poised to enhance access to genomic testing and empower genomic discoveries, particularly in underrepresented populations.
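The concordance figures in the abstract are reported per MAF bin. As a rough illustration only, the sketch below pools array genotypes and imputed dosages across all SNPs in a bin and computes one squared Pearson correlation per bin. The function names, the two-bin split at 1% MAF, and the input layout (one tuple of MAF, array genotypes, and imputed dosages per SNP) are assumptions made for this example, not the authors' pipeline.

```python
from math import nan


def pearson_r2(truth, imputed):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(truth)
    if n < 2 or n != len(imputed):
        return nan
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return nan  # correlation is undefined for a constant vector
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    return cov * cov / (var_t * var_i)


def aggregate_r2_by_maf(snps, maf_threshold=0.01):
    """Pool genotypes across SNPs within each MAF bin, then compute R².

    `snps` is an iterable of (maf, array_genotypes, imputed_dosages),
    a hypothetical layout chosen for this sketch.
    """
    pooled = {"rare": ([], []), "common": ([], [])}
    for maf, array_genotypes, imputed_dosages in snps:
        label = "common" if maf >= maf_threshold else "rare"
        pooled[label][0].extend(array_genotypes)
        pooled[label][1].extend(imputed_dosages)
    return {label: pearson_r2(t, i) for label, (t, i) in pooled.items()}


if __name__ == "__main__":
    toy = [
        (0.20, [0, 1, 2, 1], [0.1, 0.9, 1.9, 1.2]),  # common SNP
        (0.005, [0, 0, 1, 0], [0.0, 0.1, 0.7, 0.0]),  # rare SNP
    ]
    print(aggregate_r2_by_maf(toy))
```

A real comparison would read genotypes from the array and imputation outputs and apply quality filters first; those steps are omitted here.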

Figures

Figure 1. Expected ancestral diversity, coverage, and quality from BGE data at scale.
A) Principal component (PC) 1 vs PC2. B) PC3 vs PC4. C) Fraction of exome target covered with at least 10× depth, stratified by cohort and collection method. Solid lines indicate saliva collection (NeuroGAP) and dashed lines indicate blood collection (GPC and Paisa). D) Estimated mean WGS coverage, mean coding depth, mean coding call rate, and mean coding genotype quality.
Figure 2. Protein-coding copy number variants have expected qualities with BGE compared to WGS.
A) Recall and positive predictive value (PPV) of CNVs called from the BGE relative to matched WGS samples (N=400). B) Distribution of deletions and duplications across all cohorts. C) Distribution of CNV sizes across cohorts by number of exons. D) Proportion of unique deletion and duplication carriers across cohorts. E) Comparison of CNV size across cohorts by number of exons for saliva and blood. F) Comparison of unique deletion and duplication carriers between blood and saliva.
Figure 3. Imputation of BGE data is highly concordant with GWAS array data across MAF bins.
The sizes of points correspond to the number of SNPs in each MAF bin. Variants are filtered to those with an INFO score ≥ 0.8. SNP MAFs are defined within cohorts using the GSA array for the Paisa and GPC cohorts. Due to limited GSA samples in the NeuroGAP cohorts, MAFs are defined using the HGDP+1kGP AFR subset.
Figure 4. Imputation accuracy differentiates by local ancestry background at low allele frequencies.
A) Aggregate R² in the Paisa cohort. B) Aggregate R² in the GPC cohort. Note: the "# genotypes" legend denotes the number of genotypes, with one value for each sample and each SNP. Heterozygous ancestry results with non-reference concordance measurements are in Supplementary Figure 5.
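Recall and PPV of the kind shown in Figure 2A can be illustrated with a small sketch. The matching rule used here (same sample, same CNV type, at least one shared exon) and the `CNV` record layout are assumptions made for the example; the source does not state how BGE calls were matched to WGS calls. Both call sets are restricted to events spanning at least `min_exons` exons, following the 3-exon threshold in the abstract.

```python
from math import nan
from typing import FrozenSet, NamedTuple


class CNV(NamedTuple):
    """Hypothetical record for one exon-level CNV call."""
    sample: str
    kind: str  # "DEL" or "DUP"
    exons: FrozenSet[str]


def matches(a, b):
    """Assumed rule: same sample, same type, at least one shared exon."""
    return a.sample == b.sample and a.kind == b.kind and bool(a.exons & b.exons)


def recall_and_ppv(calls, truth, min_exons=3):
    """Return (recall, PPV) of `calls` against `truth`.

    Both sets are first restricted to CNVs spanning >= min_exons exons.
    """
    calls = [c for c in calls if len(c.exons) >= min_exons]
    truth = [t for t in truth if len(t.exons) >= min_exons]
    # PPV: fraction of calls supported by at least one truth event.
    supported = sum(any(matches(c, t) for t in truth) for c in calls)
    # Recall: fraction of truth events recovered by at least one call.
    recovered = sum(any(matches(t, c) for c in calls) for t in truth)
    ppv = supported / len(calls) if calls else nan
    recall = recovered / len(truth) if truth else nan
    return recall, ppv


if __name__ == "__main__":
    wgs = [
        CNV("s1", "DEL", frozenset({"e1", "e2", "e3"})),
        CNV("s2", "DUP", frozenset({"e4", "e5", "e6"})),
    ]
    bge = [
        CNV("s1", "DEL", frozenset({"e2", "e3", "e4"})),  # overlaps a WGS deletion
        CNV("s1", "DUP", frozenset({"e7", "e8", "e9"})),  # no WGS support
        CNV("s2", "DUP", frozenset({"e4"})),              # dropped: under 3 exons
    ]
    print(recall_and_ppv(bge, wgs))
```

Stricter criteria, such as reciprocal overlap thresholds, would lower both metrics; the single-shared-exon rule is only the simplest choice for a sketch.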
