Bioinformatics. 2025 Mar 29;41(4):btaf136.
doi: 10.1093/bioinformatics/btaf136.

Sawfish: improving long-read structural variant discovery and genotyping with local haplotype modeling

Christopher T Saunders et al. Bioinformatics.

Abstract

Motivation: Structural variants (SVs) play an important role in evolutionary and functional genomics but are challenging to characterize. High-accuracy, long-read sequencing can substantially improve SV characterization when coupled with effective calling methods. While state-of-the-art long-read SV callers are highly accurate, further improvements are achievable by systematically modeling local haplotypes during SV discovery and genotyping.

Results: We describe sawfish, an SV caller for mapped high-quality long reads that incorporates systematic SV haplotype modeling to improve accuracy and resolution. Assessment against the draft Genome in a Bottle (GIAB) SV benchmark from the T2T-HG002-Q100 diploid assembly shows that sawfish has the highest accuracy among state-of-the-art long-read SV callers across every tested SV size group. Additionally, sawfish maintains the highest accuracy at every tested depth level from 10- to 32-fold coverage, such that other callers required at least 30-fold coverage to match sawfish accuracy at 15-fold coverage. Sawfish also shows the highest accuracy in the GIAB challenging medically relevant genes benchmark, demonstrating improvements in both comprehensive and medically relevant contexts. When joint-genotyping seven samples from CEPH-1463, sawfish has over 9000 more pedigree-concordant calls than other state-of-the-art SV callers, with the highest proportion of concordant SVs (81%). Sawfish's quality model enables selection for an even higher proportion of concordant SVs (88%), while still calling nearly 5000 more pedigree-concordant SVs than other callers. These results demonstrate that sawfish improves on the state of the art for long-read SV calling accuracy across both individual and joint-sample analyses.
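The pedigree-concordance figures above count SVs whose genotypes fit the expected inheritance pattern across all samples. The following is a minimal sketch of that idea for a single two-parent family, assuming diploid autosomal genotypes with no missing calls. It is not sawfish's code or the paper's evaluation pipeline; the function names and the toy genotypes are hypothetical and chosen only for illustration.

```python
from itertools import product


def child_is_consistent(father, mother, child):
    """Return True if the child genotype can be formed by taking one
    allele from each parent. Genotypes are 2-tuples of allele indices,
    e.g. (0, 1) for a heterozygous call."""
    return any(sorted((a, b)) == sorted(child) for a, b in product(father, mother))


def site_is_concordant(father, mother, children):
    """A site is concordant only if every child is consistent with both parents."""
    return all(child_is_consistent(father, mother, c) for c in children)


def concordance_summary(sites):
    """Return (concordant count, concordant fraction) over a list of
    (father, mother, children) genotype records."""
    if not sites:
        return 0, 0.0
    concordant = sum(site_is_concordant(f, m, kids) for f, m, kids in sites)
    return concordant, concordant / len(sites)


# Toy genotypes (hypothetical, not data from the paper).
example_sites = [
    ((0, 1), (0, 0), [(0, 0), (0, 1)]),  # concordant
    ((0, 0), (0, 0), [(0, 1)]),          # discordant: alt allele absent in parents
    ((1, 1), (0, 1), [(0, 1), (1, 1)]),  # concordant
    ((1, 1), (1, 1), [(0, 1)]),          # discordant: ref allele absent in parents
]

if __name__ == "__main__":
    count, fraction = concordance_summary(example_sites)
    print(count, fraction)
```

A real multi-generation analysis would also need to handle missing genotypes, sex chromosomes, and inheritance across more than one generation, all of which this sketch ignores.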

Availability and implementation: Sawfish source code, pre-compiled Linux binaries, and documentation are released on GitHub: https://github.com/PacificBiosciences/sawfish.

Figures

Figure 1.
SV caller accuracy assessment. (a) Assessment of SV caller performance on the HG002 GIAB draft T2T assembly-based benchmark for HiFi WGS data from HG002 at 32-fold coverage. Results are stratified by SV size, showing consistently improved F1-score for sawfish across a range of SV sizes. (b) HG002 SV caller performance assessed as in part (a) but for all SV sizes, with depth levels subsampled down to 10-fold coverage, showing improved F1-score for sawfish at all coverage levels. (c) Assessment of SV caller joint-genotyping on samples from CEPH Pedigree 1463 generations 2 and 3. For each SV caller, the number of SVs whose genotypes are concordant with the pedigree inheritance pattern across all samples is shown against the percentage of concordant SVs. “sawfish HQ” shows sawfish results filtered for genotype quality (GQ) ≥40 in all samples. The results show that sawfish calls thousands more concordant SVs than other SV callers while also calling proportionally more concordant SVs, and that the proportion of concordant SVs can be increased to over 87% with modest quality filtering that maintains a very high concordant SV count.
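For readers less familiar with the two quantities in this caption, here is a small sketch of how an F1-score follows from true-positive, false-positive, and false-negative counts, and of a filter that keeps an SV only when GQ is at least 40 in every sample. It assumes the counts come from some benchmarking comparison and that GQ values are available per sample; the function names and all numbers in the usage lines are hypothetical and are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from benchmark counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def passes_gq_in_all_samples(sample_gqs, min_gq=40):
    """Keep a call only if every sample has a GQ value at or above min_gq.
    A missing GQ (None) is treated as failing, which is an assumption here."""
    return all(gq is not None and gq >= min_gq for gq in sample_gqs)


if __name__ == "__main__":
    # Hypothetical counts: precision 0.90, recall 0.75.
    print(round(f1_score(tp=90, fp=10, fn=30), 4))
    # Hypothetical per-sample GQ values for two SV calls.
    print(passes_gq_in_all_samples([45, 60, 40]))
    print(passes_gq_in_all_samples([45, 39, 60]))
```

Requiring the threshold in all samples is stricter than filtering on an average or on any single sample, which is consistent with the caption's description of the "sawfish HQ" subset.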

Cited by

  • Human de novo mutation rates from a four-generation pedigree reference.
    Porubsky D, Dashnow H, Sasani TA, Logsdon GA, Hallast P, Noyes MD, Kronenberg ZN, Mokveld T, Koundinya N, Nolan C, Steely CJ, Guarracino A, Dolzhenko E, Harvey WT, Rowell WJ, Grigorev K, Nicholas TJ, Goldberg ME, Oshima KK, Lin J, Ebert P, Watkins WS, Leung TY, Hanlon VCT, McGee S, Pedersen BS, Happ HC, Jeong H, Munson KM, Hoekzema K, Chan DD, Wang Y, Knuth J, Garcia GH, Fanslow C, Lambert C, Lee C, Smith JD, Levy S, Mason CE, Garrison E, Lansdorp PM, Neklason DW, Jorde LB, Quinlan AR, Eberle MA, Eichler EE. Porubsky D, et al. Nature. 2025 Jul;643(8071):427-436. doi: 10.1038/s41586-025-08922-2. Epub 2025 Apr 23. Nature. 2025. PMID: 40269156 Free PMC article.
  • The Platinum Pedigree: a long-read benchmark for genetic variants.
    Kronenberg Z, Nolan C, Porubsky D, Mokveld T, Rowell WJ, Lee S, Dolzhenko E, Chang PC, Holt JM, Saunders CT, Olson ND, Steely CJ, McGee S, Guarracino A, Koundinya N, Harvey WT, Watkins WS, Munson KM, Hoekzema K, Chua KP, Chen X, Fanslow C, Lambert C, Dashnow H, Garrison E, Smith JD, Lansdorp PM, Zook JM, Carroll A, Jorde LB, Neklason DW, Quinlan AR, Eichler EE, Eberle MA. Kronenberg Z, et al. Nat Methods. 2025 Aug;22(8):1669-1676. doi: 10.1038/s41592-025-02750-y. Epub 2025 Aug 4. Nat Methods. 2025. PMID: 40759746
  • A Murine Database of Structural Variants Enables the Genetic Architecture of a Spontaneous Murine Lymphoma to be Characterized.
    Ren W, Fang Z, Dolzhenko E, Saunders CT, Cheng Z, Popic V, Peltz G. Ren W, et al. bioRxiv [Preprint]. 2025 May 2:2025.01.09.632219. doi: 10.1101/2025.01.09.632219. bioRxiv. 2025. PMID: 39868308 Free PMC article. Preprint.
  • A Hitchhiker's Guide to long-read genomic analysis.
    Mahmoud M, Agustinho DP, Sedlazeck FJ. Mahmoud M, et al. Genome Res. 2025 Apr 14;35(4):545-558. doi: 10.1101/gr.279975.124. Genome Res. 2025. PMID: 40228901 Review.
