Nat Biotechnol. 2024 Oct;42(10):1571-1580.
doi: 10.1038/s41587-023-02024-y. Epub 2024 Jan 2.

Detection of mosaic and population-level structural variants with Sniffles2

Moritz Smolka et al. Nat Biotechnol. 2024 Oct.

Abstract

Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing repeat-aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SVs showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.

Conflict of interest statement

F.J.S. receives research support from PacBio, Genentech and Oxford Nanopore Technologies. S.W.S. is a member of the Scientific Advisory Council of the Lewy Body Dementia Association and the Multiple System Atrophy Coalition. S.W.S. is an editorial board member of JAMA Neurology and the Journal of Parkinson’s Disease. L.F.P. is sponsored by Genentech, Inc. K.H. is an employee of Bionano Genomics. D.P. provides consulting service for Ionis Pharmaceuticals. The remaining authors declare no competing interests.

Figures

Fig. 1. Overview of Sniffles2.
a, For Sniffles2, we implemented repeat-aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering to improve the accuracy of the germline SV calls. b, One key limitation of current SV calling is the generation of a fully genotyped population VCF. Sniffles2 implements a concept similar to a gVCF file, where single-sample calling is done only once, which reduces runtime multiple-fold. c, Mosaic SV detection is enabled by improved detection and filtering of low-VAF SVs (by default, 5–20%) across a bulk sample. This is made possible by an additional noise detection methodology as well as refinement and filtering approaches that we developed.
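The default mosaic window in panel c can be illustrated with a minimal sketch. It assumes the variant allele fraction (VAF) is the share of reads supporting the SV among all reads spanning the locus, and that both window bounds are inclusive; the function names are hypothetical and this is not the Sniffles2 implementation.

```python
def variant_allele_fraction(support_reads: int, reference_reads: int) -> float:
    """Fraction of spanning reads that support the SV (assumed definition)."""
    total = support_reads + reference_reads
    if total == 0:
        return 0.0
    return support_reads / total


def is_mosaic_candidate(
    support_reads: int,
    reference_reads: int,
    min_vaf: float = 0.05,  # lower bound of the default window in the caption
    max_vaf: float = 0.20,  # upper bound of the default window in the caption
) -> bool:
    """True if the VAF falls inside the mosaic window (inclusive bounds assumed)."""
    vaf = variant_allele_fraction(support_reads, reference_reads)
    return min_vaf <= vaf <= max_vaf
```

For example, an SV supported by 5 of 50 spanning reads has a VAF of 0.10 and falls inside the window, whereas a heterozygous germline SV near 0.50 does not.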
Fig. 2. Performance assessment of Sniffles2 based on GIAB.
Performance metrics for correctly identifying and genotyping SVs across ONT (left) and PacBio HiFi (right). All details are presented in Supplementary Table 1. For a,b,e,f, the shaded symbols mean that the Genotype F1 score was lower than 0.5. a,b, Comparison across Tier1 GIAB genome-wide SV (Genotype F1 score on the y axis; higher is better) across different coverages (symbols) and SV callers (x axis) for default and maximum sensitivity parameters (blue and red (Tuned), respectively). c,d, Runtime comparison across Tier1 GIAB genome-wide SV (CPU minutes on the y axis; lower is better) across different coverages (symbols) and SV callers (x axis) for default and maximum sensitivity parameters (blue and red (Tuned), respectively). e,f, Comparison across the GIAB challenging medically relevant genes (CMRG) benchmark for SV (Genotype F1 score on the y axis; higher is better) across different coverages (symbols) and SV callers (x axis) for default and maximum sensitivity parameters. g,h, Runtime comparison across the GIAB CMRG benchmark for SV (CPU minutes on the y axis; lower is better) across different coverages (symbols) and SV callers (x axis) for default and maximum sensitivity parameters.
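As a reminder of what the y axis in panels a, b, e and f measures, the F1 score is the harmonic mean of precision and recall. The sketch below assumes that, for a genotype F1 score, a call counts as a true positive only when both the SV and its genotype match the benchmark; the function name is hypothetical and the counts would come from a benchmarking tool.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With 90 matching calls, 10 false positives and 10 missed benchmark SVs, precision and recall are both 0.9, and so is the F1 score.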
Fig. 3. Sniffles2 population approach and application to Mendelian disease.
a, Comparison of the proportion of consistent, inconsistent and uninformative (NA) genotypes across HG002/3/4 for Sniffles2 population merge and cuteSV. cuteSV with genotyping takes more than 6.24× the time. b–d, Three examples of SVs detected by Sniffles2 in Mendelian disorders in probands. Chromosomal position is shown in the top part (Xq28), followed by the arrows that represent specific loci. Next, aCGH data are shown as dots that represent the genomic positions being assayed. Black dots represent a log2 ratio between −0.35 and 0.35; red dots represent a log2 ratio above 0.35; and green dots represent a log2 ratio below −0.35. Consistent (at least three consecutive probes) log2 ratios above 0.35 represent a region of copy number gain and below −0.35 represent copy number loss. In orange, we show SegDups, and, in teal, we show the SV called by Sniffles2. IGV screenshot and fully resolved events are shown in the lower part of each example. b, Tandem duplication that was fully resolved by Sniffles2 in one of the patients (BH14233_1). Sniffles2 was able to identify and map the junction of the duplication within a segmental duplication region where array data does not provide information. c, Detailed aCGH view of a complex duplication-normal-duplication (DUP-NML-DUP) structure in sample BH13947_1 with breakpoints within a SegDup or LCR region (orange bar) where Sniffles2 indicates two overlapping inversions in IGV (teal bars) forming Jct1 and Jct2. Bottom arrows indicate the possible DUP-NML-INV/DUP haplotype structure containing Jct1 and Jct2. d, Sample BH15700_1 shows a complex duplication-triplication-duplication structure as highlighted in aCGH data with SegDups and LCRs highlighted (orange bars). Sniffles2 identifies the inversion breakpoint at Jct2 (teal bar) but cannot fully resolve the entire allele including Jct1, which also cannot be represented in the VCF standard. Red arrows indicate duplicated regions, and blue arrows show triplicated portions. One possible haplotype structure for a DUP-TRP/INV-DUP is shown with the triplication and initial duplication being inverted, forming Jct1 and Jct2 (ref. ).
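The aCGH rule stated in this caption (at least three consecutive probes with log2 ratio above 0.35 for a gain, or below −0.35 for a loss) can be sketched as a simple run detector. This is an illustrative reading of the caption, not the authors' array analysis pipeline; the function name and the index-based output are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def call_copy_number_runs(
    log2_ratios: Sequence[float],
    threshold: float = 0.35,  # log2 ratio cutoff from the caption
    min_probes: int = 3,      # minimum consecutive probes from the caption
) -> List[Tuple[int, int, str]]:
    """Return (first_probe, last_probe, 'gain' | 'loss') for each qualifying run."""

    def state(ratio: float) -> Optional[str]:
        if ratio > threshold:
            return "gain"
        if ratio < -threshold:
            return "loss"
        return None  # neutral probe

    runs: List[Tuple[int, int, str]] = []
    run_start = 0
    current: Optional[str] = None
    for i, ratio in enumerate(log2_ratios):
        s = state(ratio)
        if s != current:
            # Close the previous run if it was a gain/loss and long enough.
            if current is not None and i - run_start >= min_probes:
                runs.append((run_start, i - 1, current))
            run_start, current = i, s
    # Close a run that extends to the last probe.
    n = len(log2_ratios)
    if current is not None and n - run_start >= min_probes:
        runs.append((run_start, n - 1, current))
    return runs
```

Runs shorter than three probes, such as two isolated probes below −0.35, are ignored under this reading, which is why breakpoints inside probe-poor regions such as SegDups are hard to place from array data alone.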
Fig. 4. Recovery of somatic SVs using the Sniffles2 mosaic mode.
a,b, Benchmark of mixtures of HG002 with HG00733. We spiked HG002 in various concentrations and measured the precision (a) and recall (b) of Sniffles2 default (blue) and mosaic (yellow) modes, alongside cuteSV (in red). For the recall, we added an adjusted recall (in green) because Sniffles2 mosaic mode calls SVs only in the range of 0.05 to 0.20 VAF, and, thus, everything outside that range will not be analyzed. c, Overview of the number of SV types identified as germline (blue) and mosaic (red) in the cingulate cortex brain region of an MSA patient brain sample sequenced with 55× ONT long reads. A zoom is shown for duplication and inversion SVs. d,e, Validated mosaic SVs detected by Sniffles2. Each PCR was done once. d, Mosaic deletion close to a germline Alu insertion. The IGV screenshot shows bulk WGS: top panel 55× ONT, bottom panel 85× Illumina. PCR validation shows both products from the MSA brain (column b, insertion in top and deletion in bottom) compared to a control (column c) and the ladder (column a). The PCR products highlighted in squares were Sanger sequenced, and the alignment is shown below the gel (colors matching), with the INS position marked with a purple triangle. e, Mosaic deletion within RBFOX3. The IGV screenshot shows bulk WGS: top panel 55× ONT, bottom panel 85× Illumina. PCR demonstrates the mosaic deletion (column b, wild-type in top and deletion in bottom) compared to two controls (column c, brain control) and the ladder (column a). The PCR products highlighted in squares were Sanger sequenced, and the alignment is shown below the gel (colors matching). Supplementary Fig. 6 shows the complete unannotated gels, and Supplementary Fig. 7 shows a different view of the same Illumina results for e. Supplementary Table 14 shows the complete list of candidate SVs, and Supplementary Fig. 8a–h shows all IGV screenshots for the same candidates.
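The difference between recall and adjusted recall in panel b can be made concrete with a small sketch. It assumes the adjusted recall simply restricts the truth set to SVs whose expected VAF lies within 0.05 to 0.20 before computing recall; the data layout and function name are hypothetical.

```python
from typing import List, Optional, Tuple


def recall(
    truth: List[Tuple[float, bool]],  # (expected VAF, was the SV detected?)
    vaf_range: Optional[Tuple[float, float]] = None,
) -> float:
    """Recall over all truth SVs, or only over those inside vaf_range."""
    if vaf_range is not None:
        low, high = vaf_range
        truth = [(vaf, found) for vaf, found in truth if low <= vaf <= high]
    if not truth:
        return 0.0
    return sum(1 for _, found in truth if found) / len(truth)


# Hypothetical truth set: two SVs lie outside the mosaic window.
truth_set = [(0.02, False), (0.10, True), (0.15, True), (0.18, False), (0.50, False)]
plain = recall(truth_set)                    # counts all five truth SVs
adjusted = recall(truth_set, (0.05, 0.20))   # counts only the three in the window
```

In this toy example the plain recall is 2 of 5 and the adjusted recall is 2 of 3, showing how SVs outside the mosaic window depress the unadjusted figure even though the mosaic mode never attempts to call them.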
Fig. 5. Insights into somatic SVs in the MSA patient brain sample.
a, Overall comparison of SVs detected in ONT (Sniffles2), Illumina (Manta) and OGM datasets. b, Distribution of allele frequencies for SVs identified by Sniffles2 and Manta. c, Association of Sniffles2 germline and mosaic SVs with repeat elements. d, Tumor/normal comparison of the COLO829 cell line using two different sequencing technologies: ONT MinION and PacBio Revio. Highlighted are the tumor-specific SVs (in red), the normal/control-specific SVs (in green) and the technology-specific SVs (dashed lines). Among the cancer-specific SVs, we found variants overlapping with cancer-related genes, such as PTEN, PMS2, ARHGEF5, PAK2 and WWOX. Differences between ONT and Revio calls for the same cell line can be attributed to either technology differences or the evolution of the cell line through time. e, Example of a cancer-specific somatic SV that affects the PTEN gene. Both the PacBio and ONT datasets showed the same coordinates for the variant, and no read support was found in the control.
Extended Data Fig. 1. Performance of Sniffles2 population merge.
Here we show the total time used by each approach, which includes the SV calling for each member of a family trio, merging the SVs into a single VCF file and, in the case of cuteSV force calling, re-genotyping each sample followed by a second merge. See Supplementary Table 10 for details.
Extended Data Fig. 2. Detailed view of inversion spanning nearly the entire X chromosome called by Sniffles2.
Detailed view of inversion spanning nearly the entire X chromosome (∼155 Mb) called by Sniffles2. This event is in fact not an inversion but a recombinant chromosome. This chromosomal aberration is generated de novo as the result of meiotic recombination in a mother carrying a heterozygous pericentric inversion. aCGH data shows a short-arm deletion (A, green arrow) and a long-arm duplication (C, red arrow). Sniffles2 is able to positionally connect the beginning of the duplication to the end of the deletion forming Jct1.
Extended Data Fig. 3
IGV alignments for a structural variant that was called by Sniffles2 but not represented in either the Bionano or Illumina call sets. The top shows the read alignments for the cingulate cortex ONT data, followed below by Illumina read alignments for cingulate cortex and cingulate white matter. The variant is a 687-bp mosaic duplication on chromosome 1 that overlaps simple repeats. Manual curation revealed a DEL called in the Illumina data in both cingulate cortex and white matter and no call in the Bionano dataset.
Extended Data Fig. 4
IGV alignments for a structural variant that was called by Sniffles2 but not represented in either the Bionano or Illumina call sets. The top shows the read alignments for the cingulate cortex ONT data, followed below by Illumina read alignments for cingulate cortex and cingulate white matter. The variant is a 645-bp non-mosaic duplication on chromosome 10 that is flanked by a SINE and a LINE element. Manual curation revealed no overlapping call in either the Illumina or the Bionano data.
