Nat Commun. 2024 Sep 13;15(1):8007.
doi: 10.1038/s41467-024-52027-9.

Impact and characterization of serial structural variations across humans and great apes

Wolfram Höps et al. Nat Commun.

Abstract

Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of varying length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.

Conflict of interest statement

J.O.K. has previously disclosed a patent application (no. EP19169090) relevant to Strand-seq. F.J.S. receives research support from Illumina, PacBio and ONT. The other authors declare no competing interests.

Figures

Fig. 1. Overview: the NAHRwhals sSV detection method.
A Schematic representation of a sequence pair illustrating the principle of serial SVs (sSVs). Traditional SV calling is often limited to disentangling only simple SVs (Del vs. Inv) or to reporting the entire allele (Del-Inv-Del; middle). Instead, NAHRwhals infers a series of simple SVs that can explain a given structural haplotype outcome (Inv-Del; right). B Flowchart showing the key steps of the NAHRwhals algorithm. Starting from reference and alternative assemblies and a reference region of interest, the homologous region is first extracted from the assembly. Pairwise alignments between Ref and Alt are generated and transformed into a segmented representation. Using this segmented dotplot, an exhaustive search is invoked to explore possible series of NAHR-mediated rearrangements explaining the structural differences. C An example mutation search tree of depth 3 for a simple segmented dotplot. Successful serial SVs are highlighted in red. D Results of sSV calling on simulated runs. SD length and similarity correlate with prediction accuracy, as longer/more similar sequences are more likely to be retained both in the initial alignment and the dotplot segmentation. E Two examples of segmentation and mutation calling in real loci. Blue: positive alignments. Red: reverse complement alignments. Shade: segment length. Left: dotplot of pairwise alignments between hg38 (x) and assemblies (y). Middle: segmented dotplot representation. Right: segmented dotplot after application of the highest-scoring series of SVs.
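The exhaustive search in panels B and C can be illustrated with a toy model. The sketch below is not the NAHRwhals implementation: it assumes a haplotype is a tuple of signed, nonzero segment IDs (the sign encodes orientation), that an inverted repeat pair permits an inversion and a direct repeat pair permits a deletion or duplication, and that `neighbors` and `find_series` are hypothetical helper names introduced only for this example.

```python
from itertools import combinations


def neighbors(seq):
    """Yield (step, new_seq) for every single repeat-mediated SV on seq.

    Assumption: segment IDs are nonzero ints; equal absolute values mark
    copies of the same repeat, and the sign marks orientation.
    """
    for i, j in combinations(range(len(seq)), 2):
        if seq[i] == -seq[j]:
            # Inverted repeat pair: reverse-complement the span i..j.
            flipped = tuple(-s for s in reversed(seq[i:j + 1]))
            yield ("inv", i, j), seq[:i] + flipped + seq[j + 1:]
        elif seq[i] == seq[j]:
            # Direct repeat pair: delete or duplicate one repeat unit
            # (the interior plus one repeat copy).
            yield ("del", i, j), seq[:i + 1] + seq[j + 1:]
            yield ("dup", i, j), seq[:j + 1] + seq[i + 1:j + 1] + seq[j + 1:]


def find_series(ref, alt, max_depth=3):
    """Breadth-first search for the shortest SV series turning ref into alt.

    Returns all series found at the first depth with a hit, or [] if none
    exists within max_depth steps.
    """
    ref, alt = tuple(ref), tuple(alt)
    if ref == alt:
        return [()]
    frontier = [((), ref)]
    for _ in range(max_depth):
        next_frontier = [
            (series + (step,), new_seq)
            for series, seq in frontier
            for step, new_seq in neighbors(seq)
        ]
        hits = [series for series, seq in next_frontier if seq == alt]
        if hits:
            return hits
        frontier = next_frontier
    return []


# Toy locus: no single SV explains alt, but an inversion followed by a
# deletion does (an "Inv-Del" series in the sense of panel A).
ref = (1, 2, -3, -1, 3)
alt = (1, 3)
for series in find_series(ref, alt):
    print(series)
```

Step indices refer to positions in the intermediate sequence at the time the step is applied. The search here is unpruned, which is acceptable only because the toy sequences are short and the depth is capped.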
Fig. 2. Inversion regions identified as sSVs.
A Broad classification of the 336 loci initially surveyed with NAHRwhals. Loci were considered to contain sSVs if they displayed at least one overlapping pair of SVs in at least one sample. B Overview of the full callset of 37 inversion-containing loci in which sSVs were discovered in at least one sample. The diagram shows the prediction performance in humans and apes (‘SVs resolved’), the presence of recurrent inversions, core duplicon-mapping genes and morbid CNV regions in the genomic region, as well as genotypes for each locus. C A beeswarm plot showing whether predicted intermediate SVs (e.g., ‘inv’ for an ‘inv+del’) have been found, as a function of the frequency of the sSV. D Three distinct sequence configurations observed in the 1p11.2-1p12 sSV. 15/38 samples harbor a deletion preceded by an inversion compared to CHM13-T2T. E Dotplots and SD schematics illustrating examples of all three configurations. F A 185 kbp region on 11p15.4 showing complex patterns of nested SVs leading to extreme diversity in the region explicable by NAHR. G Dotplot and SD schematics of a highly rearranged (left) and a reference-like (right) alternative configuration. H sSVs found in the disease-relevant TPSAB1-containing region. Observed CNVs can be explained as simple SVs on the CHM13-T2T-like configuration, but appear as sSVs with respect to hg38 (Fig. S25).
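The locus criterion in panel A (at least one overlapping pair of SVs in at least one sample) can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes each SV is a half-open `(start, end)` interval and that "overlapping" means the intervals share at least one position; the function names and the input layout are hypothetical.

```python
def has_overlapping_sv_pair(svs):
    """Return True if any two (start, end) half-open intervals overlap."""
    max_end = None
    for start, end in sorted(svs):
        # After sorting by start, an overlap exists as soon as an interval
        # begins before the furthest end seen so far.
        if max_end is not None and start < max_end:
            return True
        max_end = end if max_end is None else max(max_end, end)
    return False


def locus_contains_ssv(calls_by_sample):
    """calls_by_sample maps a sample name to that sample's SV intervals."""
    return any(has_overlapping_sv_pair(svs) for svs in calls_by_sample.values())
```

For instance, a locus where one sample carries an inversion spanning a later deletion would be flagged, while a locus whose samples carry only disjoint SVs would not.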
Fig. 3. Human sSV loci in great ape genome assemblies.
A Tabular view of NAHRwhals-based SV genotypes across 37 human sSV sites in four ape species. In 16 loci, the observed variation could be explained for at least one ape haplotype. B Dotplot views of a ca. 12 Mbp region on 16p, comparing the assemblies of CHM13-T2T, Bonobo and Orangutan (all single-contig; Chimpanzee and Gorilla: no contiguous assembly). Two human sSV loci are highlighted in green. C Variation observed in the TCAF1/2-containing region on 7q35. Gorilla and Orangutan likely display a mixture of NAHR and non-NAHR SVs. D View of the CTAG1A/CTAG1B-containing locus. Compared to Orangutan, the CHM13-T2T, Bonobo and Chimpanzee sequences harbor a duplication, which cannot be explained by NAHR.
Fig. 4. sSVs in disease-relevant regions.
A Dotplot view of the 22q11 Dup/Del region, with two SD-rich blocks of sSV activity highlighted above. B Nested SVs lead to duplication or deletion of segmental duplications bordering the 22q11 Dup/Del region, potentially affecting the risk of subsequent CNV formation. C Inversion of one breakpoint of the SOTOS deletion creates a long pair of directly oriented SVs likely predisposing to subsequent CNV formation. D Two sSVs map to the two ends of the 15q25.2 deletion region, making both breakpoints susceptible to individual rearrangements. An inversion in one sample leads to transfer of SD sequence to a third sSV block. E Variation observed in the highly recurrent 8p23.1 Inv/Del/Dup region. Among the three resolved instances of this locus, NAHR-based variation was observed in both flanking SD blocks.
