Am J Hum Genet. 2025 Feb 6;112(2):428-449.
doi: 10.1016/j.ajhg.2025.01.002. Epub 2025 Jan 24.

Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection


Shloka Negi et al. Am J Hum Genet.

Abstract

More than 50% of families with suspected rare monogenic diseases remain unsolved after whole-genome analysis by short-read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing, and providing haplotype-resolved methylation profiling. To evaluate the additional diagnostic yield of LRS, we used nanopore sequencing on a rare-disease cohort of 98 samples from 41 families, achieving an average of ∼36× coverage and a 32-kb read N50 per sample from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. On average, LRS covered coding exons in ∼280 genes and ∼5 known Mendelian disease-associated genes that were not covered by SRS. Compared with SRS, LRS detected additional rare, functionally annotated variants, including structural variants (SVs) and tandem repeats, and completely phased 87% of protein-coding genes. LRS also detected additional de novo variants and could distinguish postzygotic mosaic variants from prezygotic de novo variants. LRS established diagnostic variants in 11 probands, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates the potential of LRS to enhance diagnostic yield for rare monogenic diseases, suggesting its utility in future clinical genomics workflows.

Keywords: clinical testing; gene conversion; haplotype phasing; long-read sequencing; methylation; rare-disease diagnosis; structural variants; variant annotation.


Conflict of interest statement

A.O.-L. was a paid consultant for Tome Biosciences, Ono Pharma USA, and Addition Therapeutics. The Rare Genomes Project received support in the form of reagents from Illumina Inc. and Pacific Biosciences.

Figures

Figure 1
Single-flow-cell scalable sequencing protocol. (A) Cost-efficient, scalable, one-flow-cell nanopore sequencing protocol. (B) From top to bottom: read length N50, that is, the read length (y axis) such that reads of this length or longer represent 50% of the total sequence; total sequenced bases and the corresponding haploid human genome coverage (assuming a 3.1-Gbp genome) for each sample; and distribution of read identities (percentage of matching bases in reads aligned to T2T-CHM13 v.2.0). Supporting data are available in Table S1.
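The first two statistics in panel B follow directly from their definitions. A minimal Python sketch, assuming read lengths have already been extracted (for example, from a FASTQ index); the function names are illustrative and are not part of the Napu pipeline:

```python
"""Sketch: read-length N50 and haploid coverage from a list of read lengths."""
from typing import Sequence

# Haploid human genome size assumed in the Figure 1 caption.
GENOME_SIZE_BP = 3_100_000_000


def read_n50(lengths: Sequence[int]) -> int:
    """Return the length L such that reads of length >= L hold >= 50% of all bases."""
    if not lengths:
        raise ValueError("no reads supplied")
    total = sum(lengths)
    running = 0
    # Walk reads from longest to shortest, accumulating sequenced bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def haploid_coverage(lengths: Sequence[int], genome_size: int = GENOME_SIZE_BP) -> float:
    """Total sequenced bases divided by the haploid genome size."""
    return sum(lengths) / genome_size
```

For example, `read_n50([2, 3, 4, 5, 6, 10])` returns 6, because reads of length 6 or longer hold 16 of the 30 bases. Conversely, the ∼36× average coverage reported in the abstract corresponds to roughly 36 × 3.1 Gbp ≈ 112 Gbp of sequence per sample.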
Figure 2
Genome completeness analysis reveals complex Mendelian disease-associated genes callable by long-read sequencing only. (A) Genome-wide coverage distribution across all probands for GRCh38 and T2T-CHM13, calculated using reads with a mapping quality (MAPQ) greater than 10. Assembly gaps and simulated centromeric regions were excluded for GRCh38-aligned reads. Bases are categorized into four coverage levels: CALLABLE (10–80×), HIGH_COVERAGE (>80×), LOW_COVERAGE (0–10×), and NO_COVERAGE (0×). For proband RGP_696_3. (B) Ideogram for GRCh38 (left) and T2T-CHM13 (right), showing genomic regions >1 kb callable by LRS with no coverage in SRS (blue) and regions >1 kb callable by SRS with no coverage in LRS (magenta). Red cytoband represents the centromere. (C) Cumulative counts of genomic features (coding exons, Mendelian disease-associated genes, and all protein-coding genes) by overlap fraction. The x axis shows the fraction of each feature’s length; the y axis shows the number of genomic features with LRS-only or SRS-only callable coverage over at least that fraction (x) of their length (y axis limit set to 700 for clarity). (D) Ideogram highlighting seven Mendelian disease-associated genes that have most of their length callable by LRS only. Red cytoband represents the centromere. (B), (C), and (D) are shown for one proband, and supporting data for other probands are available in Table S2.
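The four coverage levels in panel A amount to binning per-base depth computed from MAPQ > 10 reads. A minimal sketch, assuming per-base depths are already available and assuming boundary handling the caption leaves implicit (LOW_COVERAGE as 1–9×, CALLABLE as 10–80× inclusive). The LRS-only test mirrors the definition used for panel B; all function names are hypothetical:

```python
"""Sketch: bin per-base read depth into the four coverage levels of Figure 2A."""
from collections import Counter
from typing import Dict, Iterable, List, Sequence

CATEGORIES = ("NO_COVERAGE", "LOW_COVERAGE", "CALLABLE", "HIGH_COVERAGE")


def coverage_category(depth: int) -> str:
    """Map one base's depth to a coverage level.

    Assumed boundaries: 0 -> NO_COVERAGE, 1-9 -> LOW_COVERAGE,
    10-80 -> CALLABLE, >80 -> HIGH_COVERAGE.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if depth == 0:
        return "NO_COVERAGE"
    if depth < 10:
        return "LOW_COVERAGE"
    if depth <= 80:
        return "CALLABLE"
    return "HIGH_COVERAGE"


def category_fractions(depths: Iterable[int]) -> Dict[str, float]:
    """Fraction of bases falling into each coverage level."""
    counts = Counter(coverage_category(d) for d in depths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no bases supplied")
    return {cat: counts[cat] / total for cat in CATEGORIES}


def lrs_only_callable(lrs_depths: Sequence[int], srs_depths: Sequence[int]) -> List[int]:
    """0-based positions that are CALLABLE with long reads but have no short-read coverage."""
    if len(lrs_depths) != len(srs_depths):
        raise ValueError("depth tracks must cover the same bases")
    return [
        i
        for i, (lrs, srs) in enumerate(zip(lrs_depths, srs_depths))
        if coverage_category(lrs) == "CALLABLE" and srs == 0
    ]
```

Merging consecutive positions returned by `lrs_only_callable` into intervals and keeping those longer than 1 kb would give regions of the kind plotted in panel B.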
Figure 3
LRS detects additional rare functionally annotated small variants. (A) Comparison of HIGH- and MODERATE-impact functionally annotated small variants (FAVs; top, SNVs; bottom, indels) between SRS and LRS. (B) Linear breakdown of LRS-only FAVs reveals additional rare variants in Mendelian disease-associated genes. (C) Example from sample RGP_1081_3 showing a rare, heterozygous, MODERATE-impact missense variant in KRT86, a gene associated with autosomal dominant monilethrix, located in a region unmappable with short reads. This variant was not found to be clinically relevant in the proband.
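Panel A compares the two callsets after restricting to HIGH- and MODERATE-impact annotations. A minimal sketch of such a comparison, assuming variants are already normalized (left-aligned and split into biallelic records) so that exact (chrom, pos, ref, alt) keys can be matched. The `pop_af` field and the 0.01 rarity cutoff are hypothetical; the caption does not state the threshold used for small variants:

```python
"""Sketch: compare rare HIGH/MODERATE-impact small variants between two callsets."""
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, int, str, str]
KEEP_IMPACTS = frozenset({"HIGH", "MODERATE"})


@dataclass(frozen=True)
class AnnotatedVariant:
    chrom: str
    pos: int
    ref: str
    alt: str
    impact: str    # SnpEff-style impact class: HIGH, MODERATE, LOW or MODIFIER
    pop_af: float  # hypothetical population allele frequency field

    @property
    def key(self) -> Key:
        return (self.chrom, self.pos, self.ref, self.alt)


def rare_favs(variants: Iterable[AnnotatedVariant], max_af: float = 0.01) -> Set[Key]:
    """Keys of variants with HIGH/MODERATE impact and population AF <= max_af."""
    return {v.key for v in variants if v.impact in KEEP_IMPACTS and v.pop_af <= max_af}


def compare_callsets(
    lrs: Iterable[AnnotatedVariant],
    srs: Iterable[AnnotatedVariant],
    max_af: float = 0.01,
) -> Dict[str, Set[Key]]:
    """Split rare FAVs into shared, LRS-only and SRS-only sets by exact key match."""
    lrs_keys = rare_favs(lrs, max_af)
    srs_keys = rare_favs(srs, max_af)
    return {
        "shared": lrs_keys & srs_keys,
        "lrs_only": lrs_keys - srs_keys,
        "srs_only": srs_keys - lrs_keys,
    }
```

Exact key matching is the simplest design choice; it undercounts concordance if the two pipelines represent the same indel differently, which is why normalization is assumed up front.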
Figure 4
De novo SNV comparison between SRS and LRS. (A) Counts of rare de novo SNVs (DN-SNVs; genome wide and annotated) called exclusively by each technology (LRS or SRS). (B) Comparison between LRS and SRS DN-SNV callsets. Bar charts in the center represent concordance. Pie charts on each side (left for LRS-only and right for SRS-only) show the proportion of exclusive DN-SNVs that, upon IGV inspection, were found to be likely false positives, called as non-DN-SNVs by the other technology, likely true positives, or likely postzygotic mosaics. (C) Allele balance of likely postzygotic mosaic DN-SNVs in SRS (left) and LRS (right) reads, mapped to both GRCh38 and T2T-CHM13. (D) Correlation of allele balance between SRS and LRS for potential mosaic DN-SNVs compared to the concordant set (DN-SNVs called by both technologies). Allele balance is consistent across GRCh38- and T2T-CHM13-mapped reads. (E) (Top) Likely true-positive LRS-only DN-SNVs in GIAB regions of low short-read mappability. (Bottom) Likely true-positive LRS-only and SRS-only DN-SNVs stratified by overlap with GIAB low-complexity regions. Supporting data are available in Tables S7–S14.
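Allele balance in panels C and D is the fraction of reads supporting the alternate allele: a heterozygous germline variant is expected near 0.5, whereas a postzygotic mosaic variant, present in only a subset of cells, typically sits lower. A minimal sketch of the allele-balance calculation and a Pearson correlation between matched SRS and LRS values, assuming per-site read counts are already available. The caption does not state which correlation coefficient was used, so Pearson is an assumption here:

```python
"""Sketch: allele balance per site and SRS-vs-LRS correlation."""
import math
from typing import Sequence


def allele_balance(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie in [0, total_reads]")
    return alt_reads / total_reads


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)
```

For example, 15 alternate reads out of 60 give `allele_balance(15, 60) == 0.25`, a value well below the 0.5 expected for a germline heterozygous call. Computing this per site for both technologies and passing the two lists to `pearson_r` yields the kind of comparison shown in panel D.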
Figure 5
Characterization of structural variants and tandem-repeat expansions with LRS. (A) Counts of LRS and SRS SVs (deletions, insertions, and inversions) per proband. (B) Comparison of LRS and SRS SVs using a fuzzy-matching approach implemented by sveval (for supporting data see Table S18). (C) Number of rare structural variants (allele frequency of 0.01 or less) in each individual, grouped by annotation profile. The violin plots represent the distribution across probands, and the dots highlight the median values. The “high” and “modifier” impact predictions came from SnpEff (HIGH/MODIFIER impact classes). Regulatory regions are candidate cis-regulatory elements from ENCODE with an enhancer-like signature. (D) Expansion scores at annotated simple repeat sites across all probands. The vertical dotted line highlights repeats that are significantly expanded (adjusted p value <0.01 and fold change >2) compared to the controls. Regions within 10 kbp of coding exons are highlighted in green (for known disease-associated genes) and orange (for other protein-coding genes).
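Panel B relies on fuzzy matching because SV breakpoints and sizes rarely agree exactly between technologies. The sketch below is a generic illustration of that idea, not sveval's implementation: deletions and inversions match on reciprocal overlap, insertions on breakpoint distance and size similarity, and every threshold (0.5 reciprocal overlap, 100 bp, 0.5 size ratio) is an illustrative assumption:

```python
"""Sketch: greedy one-to-one fuzzy matching of SVs between two callsets."""
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive; for insertions end == start
    svtype: str  # "DEL", "INS" or "INV"
    size: int    # event length in bp


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap length divided by the longer of the two intervals."""
    ov = min(a.end, b.end) - max(a.start, b.start)
    if ov <= 0:
        return 0.0
    return min(ov / (a.end - a.start), ov / (b.end - b.start))


def svs_match(a: SV, b: SV, min_ro: float = 0.5, max_ins_dist: int = 100,
              min_size_ratio: float = 0.5) -> bool:
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if a.svtype == "INS":
        # Insertions have no reference span, so compare position and size instead.
        if abs(a.start - b.start) > max_ins_dist:
            return False
        if a.size <= 0 or b.size <= 0:
            raise ValueError("insertion sizes must be positive")
        return min(a.size, b.size) / max(a.size, b.size) >= min_size_ratio
    return reciprocal_overlap(a, b) >= min_ro


def match_callsets(lrs: Sequence[SV], srs: Sequence[SV]
                   ) -> Tuple[List[Tuple[SV, SV]], List[SV], List[SV]]:
    """Return (matched pairs, LRS-only SVs, SRS-only SVs)."""
    matched: List[Tuple[SV, SV]] = []
    used = set()
    lrs_only: List[SV] = []
    for a in lrs:
        for j, b in enumerate(srs):
            if j not in used and svs_match(a, b):
                matched.append((a, b))
                used.add(j)
                break
        else:
            lrs_only.append(a)
    srs_only = [b for j, b in enumerate(srs) if j not in used]
    return matched, lrs_only, srs_only
```

Greedy first-hit matching keeps the sketch short; a production comparison would typically pick the best-scoring partner rather than the first acceptable one.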
Figure 6
Phasing with long reads reveals compound heterozygous variants in protein-coding genes. (A) (Left) Cumulative counts of protein-coding genes overlapping a single phase block at varying overlap fractions. The x axis shows the fraction of each gene’s length; the y axis represents the number of genes that are at least that fraction (x) phased by a single phase block. Each line corresponds to a sample and is colored by its phase block NG50. (Right) Number of genes phased by a single phase block across phasing percentage categories (0%, 0%–25%, 25%–50%, 50%–75%, 75%–100%, and 100%) on the x axis, with the y axis showing the count of genes per individual in each category. (B) In proband DSDTRN17, LRS resolved pathogenic compound heterozygous variants in LHCGR, which encodes the luteinizing hormone/choriogonadotropin receptor, causing Leydig cell hypoplasia.
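Resolving a compound heterozygote, as in panel B, means showing that two heterozygous variants in the same gene lie on different haplotypes. A minimal sketch, assuming VCF-style phased genotypes (GT values such as "0|1") and the VCF PS phase-set field; two variants can only be compared when they fall in the same phase block. The class and function names are hypothetical:

```python
"""Sketch: decide whether two heterozygous variants are in cis or in trans."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhasedHet:
    gt: str            # VCF-style genotype, e.g. "0|1", "1|0", or unphased "0/1"
    ps: Optional[int]  # VCF PS (phase set) value, or None if the call is unphased


def alt_haplotype(gt: str) -> Optional[int]:
    """Return 0 or 1 for the haplotype carrying ALT, or None if unphased."""
    if "|" not in gt:
        return None
    alleles = gt.split("|")
    if alleles == ["0", "1"]:
        return 1
    if alleles == ["1", "0"]:
        return 0
    raise ValueError(f"expected a biallelic heterozygous genotype, got {gt!r}")


def phase_relation(a: PhasedHet, b: PhasedHet) -> str:
    """'trans' (compound heterozygous), 'cis', or 'unknown' if phase is not comparable."""
    hap_a, hap_b = alt_haplotype(a.gt), alt_haplotype(b.gt)
    # Haplotype indices are only meaningful within one phase block (same PS).
    if hap_a is None or hap_b is None or a.ps is None or a.ps != b.ps:
        return "unknown"
    return "trans" if hap_a != hap_b else "cis"
```

This also shows why the gene-level phasing completeness in panel A matters: when two variants land in different phase blocks, their relation is "unknown" from the proband's reads alone.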
Figure 7
CSS1 diagnosis with concurrent detection of a known episignature. (A) De novo deep intronic VUS in ARID1B identified in short-read whole-genome sequencing data. (B) Sashimi plot showing that the variant leads to inclusion of a cryptic exon in the proband. (C) Based on ONT whole-genome CpG methylation status at 106 sites known to be differentially methylated in CSS1, PMGRC-146-146-0 shows a CpG methylation pattern more similar to the CSS1 episignature than any other sample in the cohort (p < 0.001, permutation test).
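The caption does not detail the similarity measure or the permutation scheme behind the p value in panel C. As one hypothetical illustration, the sketch scores a sample by the Pearson correlation between its CpG methylation levels and a reference episignature profile at the same sites, then builds a null distribution by shuffling the site labels of the profile. The scoring function, the permutation scheme, and all names are assumptions:

```python
"""Sketch: correlation-based episignature score with a label-permutation p value."""
import math
import random
from typing import List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)


def episignature_permutation_test(
    sample: Sequence[float],     # methylation level per signature CpG (hypothetical input)
    signature: Sequence[float],  # reference episignature profile at the same CpGs
    n_perm: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed correlation, one-sided empirical p value)."""
    observed = pearson_r(sample, signature)
    rng = random.Random(seed)
    shuffled: List[float] = list(signature)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the site-to-value correspondence
        if pearson_r(sample, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable p value is 1/1000, so reporting p < 0.001 requires more permutations than that; the default of 10,000 leaves room for it.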
