[Preprint]. 2023 Jan 25:2023.01.25.525428.
doi: 10.1101/2023.01.25.525428.

Structural variation across 138,134 samples in the TOPMed consortium

Goo Jun 1; Adam C English 2; Ginger A Metcalf 2; Jianzhi Yang 3; Mark JP Chaisson 3; Nathan Pankratz 4; Vipin K Menon 2; William J Salerno 5; Olga Krasheninina 5; Albert V Smith 6; John A Lane 6; Tom Blackwell 6; Hyun Min Kang 6; Sejal Salvi 2; Qingchang Meng 2; Hua Shen 2; Divya Pasham 2; Sravya Bhamidipati 2; Kavya Kottapalli 2; Donna K Arnett 7; Allison Ashley-Koch 8,9; Paul L Auer 10; Kathleen M Beutel 4; Joshua C Bis 11; John Blangero 12; Donald W Bowden 13; Jennifer A Brody 11; Brian E Cade 14; Yii-Der Ida Chen 15,16; Michael H Cho 17; Joanne E Curran 13; Myriam Fornage 18; Barry I Freedman 19; Tasha Fingerlin 20; Bruce D Gelb 21; Lifang Hou 22; Yi-Jen Hung 23; John P Kane 24; Robert Kaplan 25; Wonji Kim 26; Ruth J F Loos 27; Gregory M Marcus 28; Rasika A Mathias 29; Stephen T McGarvey 30; Courtney Montgomery 31; Take Naseri 32; S Mehdi Nouraie 33; Michael H Preuss 27; Nicholette D Palmer 13; Patricia A Peyser 34; Laura M Raffield 35; Aakrosh Ratan 36; Susan Redline 14; Sefuiva Reupena 37; Jerome I Rotter 16,38; Stephen S Rich 34; Michiel Rienstra 38; Ingo Ruczinski 39; Vijay G Sankaran 40,41; David A Schwartz 42; Christine E Seidman 43,44,45; Jonathan G Seidman 43; Edwin K Silverman 46; Jennifer A Smith 33; Adrienne Stilp 47; Kent D Taylor 16,36; Marilyn J Telen 8; Scott T Weiss 26; L Keoki Williams 48; Baojun Wu 48; Lisa R Yanek 27; Yingze Zhang 33; Jessica Lasky-Su 26; Marie Claude Gingras 2; Susan K Dutcher 49; Evan E Eichler 50,51; Stacey Gabriel 41; Soren Germer 52; Ryan Kim 53; Karine A Viaud-Martinez 54; Deborah A Nickerson 55; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; James Luo 56; Alex Reiner 57; Richard A Gibbs 2; Eric Boerwinkle 1,2; Goncalo Abecasis 5,6; Fritz J Sedlazeck 2,58

Goo Jun et al. bioRxiv.

Abstract

Ever-larger structural variant (SV) catalogs that highlight the diversity within and between populations help researchers better understand the links between SVs and disease. Identifying SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs of 50 bp or larger (59.34% novel) across the autosomes and the X chromosome from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference, which result in high variant quality and >90% allele concordance compared with long-read de novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We identified 690 SV hotspots and deserts, along with those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool for understanding the impact of SVs on disease development and progression.

Keywords: NGS; Population; Structural Variants; TOPMed.

Figures

Figure 1:
Overview of SV calls. A) Sample counts by genetically inferred ancestry, showing that the majority of individuals are of European (EUR) ancestry, followed by African (AFR), East Asian and Samoan (EAS), American (AMR), South Asian (SAS) and Mestizo (MES) ancestry. B) Per-sample SV count distributions by ancestry. C) Overview of gene density (red), deletions (blue), duplications (orange) and inversions (green). D) Size distribution of population-genotyped CNVs and inversions. The majority of SVs across the population are large events. E) Randomized PCA principal components 1 and 2 of deletions. See supplementary Figures 2-4 for deletions, duplications and inversions.
Figure 2:
Evaluation of SV call sets using haplotype-resolved assemblies for deletion (DEL) and duplication (DUP) calls in the NA12878 and NA19238 genomes. A) Evaluation using TT-Mars. The fraction of calls that can be assessed using the assemblies (analyzed) and the positive predictive value (PPV) are given for DEL and DUP calls. B) Support for calls from both the TT-Mars and Truvari methods. C) The size-by-count spectrum of all calls (red), the count validated by TT-Mars (green), and the count validated by Truvari (blue) for the combination of both genomes.
Figure 3:
A) Assessment of Mendelian error and novel HET rates per SV type across 11,387 samples from trio/duo families. The evaluation also includes multiallelic variants, which increase the error rate, especially for duplications. B) Overlap of TOPMed SVs with 1KGP and gnomAD SV with respect to allele frequency within the TOPMed SV call set. Allele frequencies differ slightly between the SVs novel to TOPMed and the SVs that overlap the other call sets. C) FST plot of African versus European ancestry for SVs across the entire genome, highlighting a threshold of 0.11.
Figure 4: Overview of the impact of the SVs and their clustering along the genome.
A) SVs identified and clustered based on their haploinsufficiency (HI) and triplosensitivity (TriS) potential across different allele counts. B) Overview of SV hotspots and deserts across the TOPMed cohort. Here, deserts are regions of the genome with no identifiable SVs despite the large collection of individuals in this study.
