Front Genet. 2025 May 2;16:1499456.
doi: 10.3389/fgene.2025.1499456. eCollection 2025.

Validation of a comprehensive long-read sequencing platform for broad clinical genetic diagnosis

Siddhartha Sen et al. Front Genet.

Abstract

Though short-read high-throughput sequencing, commonly known as Next-Generation Sequencing (NGS), has revolutionized genomics and genetic testing, no single genetic test can accurately detect single nucleotide variants (SNVs), small insertions/deletions (indels), complex structural variants (SVs), repetitive genomic alterations, and variants in genes with highly homologous pseudogenes. A unified, comprehensive technique that simultaneously detects a broad spectrum of genetic variation would substantially increase the efficiency of the diagnostic process. The current study evaluated the clinical utility of long-read sequencing as a comprehensive genetic test for the diagnosis of inherited conditions. Using Oxford Nanopore Technologies (ONT) long-read nanopore sequencing, we developed and validated a clinically deployable, integrated bioinformatics pipeline that combines eight publicly available variant callers. A concordance assessment comparing the known variant calls for NA12878, a well-characterized benchmark sample from the National Institute of Standards and Technology (NIST), with the variants our pipeline detected in this sample showed an analytical sensitivity of 98.87% and an analytical specificity exceeding 99.99%. We then evaluated the pipeline's ability to detect 167 clinically relevant variants in 72 clinical samples. This set consisted of 80 SNVs, 26 indels, 32 SVs, and 29 repeat expansions, including 14 variants in genes with highly homologous pseudogenes. The overall detection concordance for these clinically relevant variants was 99.4% (95% CI: 96.7%-99.9%). Importantly, in addition to detecting the known clinically relevant variants, in four cases our pipeline yielded additional information supporting clinical diagnoses that could not have been established using short-read NGS alone.
Our findings suggest that long-read sequencing can identify diverse genomic alterations and that our pipeline can serve as the basis of a single diagnostic test for patients with suspected genetic disease.

Keywords: Oxford Nanopore Technologies; Tandem repeat expansions; clinical genomics; complex structural variants; long-read sequencing; whole genome sequencing.
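The headline numbers in the abstract rest on simple ratios: sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and concordance is detected / total. The abstract does not state how its 95% confidence interval was computed, so the Wilson score interval below is an assumption; the sketch only shows how such an interval can be obtained for 166 of 167 variants detected (the count implied by 99.4% of 167).

```python
import math

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of true-negative sites correctly left uncalled: TN / (TN + FP)."""
    return tn / (tn + fp)

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (95% when z = 1.96).
    ASSUMPTION: the study's actual CI method is not stated in the abstract."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Clinically relevant variant set from the abstract.
counts = {"SNV": 80, "indel": 26, "SV": 32, "repeat": 29}
total = sum(counts.values())      # 167
detected = total - 1              # 99.4% of 167 corresponds to one missed variant
lo, hi = wilson_interval(detected, total)
print(f"concordance = {detected / total:.1%}, 95% CI {lo:.1%}-{hi:.1%}")
```

For these counts the Wilson interval comes out at roughly 96.7% to 99.9%; an exact (Clopper-Pearson) interval would be slightly wider.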


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Filtering strategy employed in the detection of clinically relevant structural variants. Flow chart outlining the ONT pipeline's SV call filtering strategy. A combination of breakpoint-based callers (NanoVar and DeBreak) and read-depth callers (QDNAseq and CNVpytor) was used to identify structural variants (SVs). Before any filters were applied, NanoVar reported an average of approximately 40,000 small SV hits and DeBreak approximately 14,000. For large SVs and copy number variants (CNVs), QDNAseq identified an average of 192 hits and CNVpytor 1,803. After filtering by chromosome, the average numbers of large SVs/CNVs were 56 on autosomes and 162 on sex chromosomes. The next filtering step restricted SV calls to genes associated with a human phenotype, further reducing the total number of SVs. The last filtering step applied a set of logic rules, reducing the number of SVs again. The SVs that remained after all filtering steps underwent final review and classification.
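The filtering cascade in Figure 1 is a sequence of keep/drop predicates applied in order, with the surviving count recorded after each step. The caption does not list the phenotype-gene set or the logic rules, so the gene list, the chromosome set, and the minimum-length rule below are hypothetical placeholders; this is a sketch of the cascade shape, not the pipeline's actual filters.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SVCall:
    chrom: str
    start: int
    end: int
    gene: Optional[str]  # gene overlapped by the call; annotation assumed to happen upstream
    caller: str

# HYPOTHETICAL inputs: the real phenotype-gene list and logic rules are not given in the caption.
PHENOTYPE_GENES = {"FANCA", "PKD1", "TCIRG1"}
PRIMARY_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}

def on_primary_chrom(sv: SVCall) -> bool:
    return sv.chrom in PRIMARY_CHROMS          # autosomes plus sex chromosomes

def in_phenotype_gene(sv: SVCall) -> bool:
    return sv.gene in PHENOTYPE_GENES

def passes_logic_rules(sv: SVCall, min_len: int = 50) -> bool:
    return sv.end - sv.start >= min_len        # placeholder rule, not the pipeline's

FILTERS = [("chromosome", on_primary_chrom),
           ("phenotype_gene", in_phenotype_gene),
           ("logic_rules", passes_logic_rules)]

def filter_cascade(calls):
    """Apply filters in order and record how many calls survive each step."""
    counts = {"raw": len(calls)}
    for name, keep in FILTERS:
        calls = [c for c in calls if keep(c)]
        counts[name] = len(calls)
    return calls, counts

# Toy calls with hypothetical coordinates.
calls = [
    SVCall("chr16", 2_100_000, 2_100_300, "PKD1", "NanoVar"),
    SVCall("chrUn_KI270742v1", 1_000, 5_000, None, "DeBreak"),
    SVCall("chr1", 10_000, 10_500, None, "CNVpytor"),
    SVCall("chr16", 89_800_000, 89_800_020, "FANCA", "DeBreak"),
]
kept, counts = filter_cascade(calls)
print(counts)  # {'raw': 4, 'chromosome': 3, 'phenotype_gene': 2, 'logic_rules': 1}
```

Keeping the filters as an ordered list of named predicates makes the per-step attrition, which Figure 1 reports, fall out directly.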
FIGURE 2
Custom nanopore variant calling pipeline. The boxes depict file names and file types. The lines connecting the boxes are labeled with the pipeline components that consume and produce the respective input and output file types. The pipeline outputs multiple variant call format (VCF) files that are combined for clinical analysis.
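Combining the per-caller VCFs can be pictured as building a table from each variant to the set of callers that reported it. The caption does not describe how the pipeline merges its VCFs, so the exact-key merge below is an assumption and a simplification: SV callers often report slightly different breakpoints for the same event, which real merging would need to tolerate. The VCF records are toy data.

```python
from collections import defaultdict

def parse_vcf_records(vcf_text):
    """Yield (chrom, pos, ref, alt) for each data line; one tuple per ALT allele."""
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue                           # skip meta-information and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):
            yield chrom, int(pos), ref, alt

def merge_callsets(callsets):
    """Map each exact (chrom, pos, ref, alt) key to the set of callers that reported it."""
    merged = defaultdict(set)
    for caller, vcf_text in callsets.items():
        for key in parse_vcf_records(vcf_text):
            merged[key].add(caller)
    return dict(merged)

HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
callsets = {  # toy records, not output from the study
    "Clair3": HEADER + "chr1\t1000\t.\tA\tG\t30\tPASS\t.\n",
    "NanoVar": HEADER + "chr1\t5000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-300\n",
    "DeBreak": HEADER + "chr1\t5000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-300\n",
}
merged = merge_callsets(callsets)
for key, callers in sorted(merged.items()):
    print(key, sorted(callers))
```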
FIGURE 3
Nanopore sequencing metrics. Violin plots depicting the variability in (A) coverage, (B) average read quality, (C) average read length, and (D) N50 for the 72 clinical samples sequenced. In each plot, each dot represents one clinical sample. The dashed black horizontal lines represent the median value for each metric. The dashed gray line on the coverage plot shows the a priori target minimum coverage (30×) for samples in this study.
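Two of the metrics in Figure 3 have standard definitions that are easy to compute from read lengths: N50 is the largest length L such that reads of length L or longer contain at least half of all sequenced bases, and mean coverage is roughly total sequenced bases divided by genome size. The sketch below ignores unmapped reads, and the 3.1 Gb genome size is an assumed approximation.

```python
def n50(read_lengths):
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    lengths = sorted(read_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 requires at least one read")

def mean_coverage(total_bases, genome_size):
    """Approximate depth as sequenced bases divided by genome size."""
    return total_bases / genome_size

lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(lengths))                     # 8: reads 10 + 9 + 8 hold 27 of 54 bases
GENOME_SIZE = 3_100_000_000             # assumed approximate human genome size
print(mean_coverage(30 * GENOME_SIZE, GENOME_SIZE))  # 30.0, the study's target minimum
```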
FIGURE 4
Variants detected by the ONT pipeline in 72 clinical samples. Variants assessed using ONT were grouped into four categories: single nucleotide variants (SNVs), indels, structural variants (SVs), and repeats. For each category, variants accurately detected by our pipeline are shown in blue, with the count listed at the bottom of each bar; variants not accurately detected are shown in red, with the count listed above each bar.
FIGURE 5
IGV images of indels used to assess how the variant callers handle indels of different sizes. (A) shows the ABCD1 c.1635-16_1645delinsCACAGACATGTAGGGC variant, which results in a loss of 26 bases and a gain of 16 bases. This variant was accurately detected by the Clair3 component of the pipeline. The blue bars above the coverage data indicate the variants present in the Clair3 VCF. (B) shows the TRHR c.1137_1152delinsTTTTGTGGCAGGTGCTTGGCTGCCTGCCACAGGCAA variant, which results in a loss of 16 bases and a gain of 36 bases. This was the largest independent indel assessed in this study. It was detected by neither Clair3 nor the SV callers. (C) shows a complex recombinant allele at the 3′ end of GBA. The recombinant allele includes several SNVs as well as a 55-base deletion, representing gene conversion to pseudogene (GBAP1) sequence. The black arrow below the GBA sequence at the bottom of the image indicates the approximate junction between reference GBA sequence and pseudogene sequence. The genomic region to the right (upstream) of the black arrow is GBA sequence; the region to the left (downstream) is the region of gene conversion. The two rows of blue boxes above the coverage data indicate the SNVs called by Clair3 (upper row) and the 55-base deletion called by NanoVar (lower row).
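The base counts quoted for panel (B) follow from HGVS range arithmetic: a range `c.X_Y` is inclusive, so it deletes Y - X + 1 bases, and the gain is the length of the inserted sequence. The sketch below only handles plain coding positions; intronic offsets such as `c.1635-16` in panel (A) are deliberately rejected, because counting across an exon boundary needs transcript coordinates this toy parser does not have.

```python
import re

# Simple coding-DNA delins with plain integer positions only (no intronic offsets).
DELINS = re.compile(r"^c\.(\d+)_(\d+)delins([ACGT]+)$")

def delins_sizes(hgvs):
    """Return (bases deleted, bases inserted) for a simple c. delins expression."""
    m = DELINS.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS expression: {hgvs}")
    start, end, inserted = int(m.group(1)), int(m.group(2)), m.group(3)
    deleted = end - start + 1      # HGVS ranges include both endpoints
    return deleted, len(inserted)

print(delins_sizes("c.1137_1152delinsTTTTGTGGCAGGTGCTTGGCTGCCTGCCACAGGCAA"))  # (16, 36)
```

The net effect of the TRHR variant is therefore a 20-base gain, which is why neither a small-variant caller tuned for short indels nor an SV caller with a size floor may pick it up.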
FIGURE 6
Tandem Genotypes waterfall plot depicting a CAG expansion in ATXN1 (SCA1) in one proband. The x-axis represents genomic position, beginning at the start of the CAG repeat tract in ATXN1. Reads are stacked, and the y-axis depicts read number. Orange regions represent CAG repeat sequence, blue regions represent CAT interruptions, and grey regions represent sequence that is neither CAT nor CAG. Half of the reads (upper region of the graph) show a wild-type ATXN1 allele with two CAT interruptions. The remaining reads (lower region of the graph) show an expanded ATXN1 allele that lacks protective CAT interruptions and has expanded into the pathogenic range causative of spinocerebellar ataxia type 1 (SCA1).
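The coloring in Figure 6 amounts to classifying each triplet of the repeat tract as CAG, CAT, or other. The sketch below walks codons from the start of the tract on an already-extracted, error-free sequence; this is an illustrative simplification and not how tandem-genotypes works (it estimates repeat length changes from read alignments), and raw nanopore reads would need error-tolerant matching. The sequences are synthetic.

```python
def scan_repeat_tract(seq, start=0):
    """Walk codons from `start`, counting CAG and CAT units until the first other triplet."""
    cag = cat = 0
    interruptions = []            # codon indices (0-based, relative to start) of CAT units
    i = start
    while i + 3 <= len(seq):
        codon = seq[i:i + 3]
        if codon == "CAG":
            cag += 1
        elif codon == "CAT":
            cat += 1
            interruptions.append((i - start) // 3)
        else:
            break                 # tract ends at the first non-CAG, non-CAT triplet
        i += 3
    return {"CAG": cag, "CAT": cat,
            "interruption_codons": interruptions,
            "tract_length": (i - start) // 3}

# Synthetic alleles: one with two CAT interruptions, one uninterrupted and longer.
interrupted = "CAG" * 12 + "CAT" + "CAG" + "CAT" + "CAG" * 15 + "TTT"
uninterrupted = "CAG" * 45 + "TGA"
print(scan_repeat_tract(interrupted))
print(scan_repeat_tract(uninterrupted))
```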
FIGURE 7
Variants in genes with highly homologous pseudogenes. Fourteen of the 167 variants from clinical samples assessed in this study occurred in genes with highly homologous pseudogenes. All 14 of these variants were detected by our pipeline. The variant types are coded by color and the specific genes in which the variants occurred are listed to the right of each bar segment.
FIGURE 8
Fanconi anemia case resolved using long-read sequencing. (A) IGV image showing exons 1–26 of FANCA in a sample from a patient with a clinical diagnosis of Fanconi anemia. By short-read NGS, a single deletion spanning exons 1–23 was called. Nanopore sequencing identified two distinct FANCA deletions in trans (exons 1–11 and exons 12–23) that overlap by 140 bp (blue bars above coverage data). Both deletions were called accurately by DeBreak. The red box depicts the genomic region magnified in panel B. (B) IGV image showing the region surrounding the 140 bp overlap of the two FANCA deletions.
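Two distinct deletions cannot both remove the same bases from one chromosome copy, so in the simple case an overlap between them already implies they lie on different haplotypes, and the overlapping segment is expected to have copy number 0 while the rest of each deletion has copy number 1. The sketch below splits the region into constant-copy-number segments; the coordinates are hypothetical and chosen only to reproduce a 140 bp overlap.

```python
def copy_number_segments(deletions, ploidy=2):
    """Split the span of the deletions into segments of constant copy number.
    Intervals are half-open [start, end); each deletion removes one copy."""
    breakpoints = sorted({p for s, e in deletions for p in (s, e)})
    segments = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        removed = sum(1 for s, e in deletions if s <= left and right <= e)
        segments.append((left, right, ploidy - removed))
    return segments

# HYPOTHETICAL coordinates: two deletions overlapping by 140 bp.
del_a = (1_000, 11_140)
del_b = (11_000, 25_000)
for seg in copy_number_segments([del_a, del_b]):
    print(seg)
```

A read-depth view of this locus would therefore show a short stretch with no coverage flanked by single-copy regions, a pattern that a single large deletion call does not explain.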
FIGURE 9
Caroli syndrome case resolved using long-read sequencing. IGV image showing two heterozygous pathogenic PKD1 variants in cis. Reads are grouped by the nucleotide at chr16:2164490, demonstrating that all reads carrying one pathogenic variant also carry the other.
FIGURE 10
Osteopetrosis case resolved using long-read sequencing. IGV image showing a single heterozygous pathogenic nonsense TCIRG1 variant (red star above reads on the left side of the image) that had previously been identified by an external laboratory. Long-read sequencing identified a TCIRG1-disrupting Alu insertion (purple triangle above reads on the right side of the image) in trans, leading to a definitive molecular diagnosis. No spanning reads carry both the nonsense variant and the Alu insertion, confirming that the two variants are in trans.
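The cis/trans reasoning in Figures 9 and 10 relies on reads long enough to span both variant sites. For two heterozygous variants, reads showing both alternate alleles or neither support cis, while reads showing exactly one support trans. The sketch below tallies that evidence; the minimum read count and fraction thresholds are hypothetical, and the read observations are synthetic.

```python
def phase_two_variants(reads, min_reads=5, min_fraction=0.9):
    """reads: iterable of (alt_at_site_a, alt_at_site_b), with None where a read
    does not cover a site. Thresholds are illustrative assumptions."""
    cis_support = trans_support = 0
    for a, b in reads:
        if a is None or b is None:
            continue                  # only reads spanning both sites are informative
        if a == b:
            cis_support += 1          # both alt or both ref: consistent with cis
        else:
            trans_support += 1        # exactly one alt: consistent with trans
    informative = cis_support + trans_support
    if informative < min_reads:
        return "undetermined", cis_support, trans_support
    if cis_support / informative >= min_fraction:
        return "cis", cis_support, trans_support
    if trans_support / informative >= min_fraction:
        return "trans", cis_support, trans_support
    return "ambiguous", cis_support, trans_support

# Synthetic reads resembling Figure 10 (trans) and Figure 9 (cis).
trans_reads = [(True, False)] * 5 + [(False, True)] * 5 + [(True, None)] * 3
cis_reads = [(True, True)] * 6 + [(False, False)] * 6
print(phase_two_variants(trans_reads))  # ('trans', 0, 10)
print(phase_two_variants(cis_reads))    # ('cis', 12, 0)
```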
