NPJ Genom Med. 2024 Feb 17;9(1):11.
doi: 10.1038/s41525-024-00394-z.

Assessing the efficacy of target adaptive sampling long-read sequencing through hereditary cancer patient genomes

Wataru Nakamura et al. NPJ Genom Med. 2024.

Abstract

Innovations in sequencing technology have led to the discovery of novel mutations that cause inherited diseases. However, many patients with suspected genetic diseases remain undiagnosed. Long-read sequencing technologies are expected to significantly improve the diagnostic rate by overcoming the limitations of short-read sequencing. In addition, Oxford Nanopore Technologies (ONT) offers adaptive sampling, a computationally driven target enrichment technology. This enables more affordable intensive analysis of target gene regions compared to standard non-selective long-read sequencing. In this study, we developed an efficient computational workflow for target adaptive sampling long-read sequencing (TAS-LRS) and evaluated it through application to 33 genomes collected from suspected hereditary cancer patients. Our workflow can identify single nucleotide variants with nearly the same accuracy as the short-read platform and elucidate complex forms of structural variations. We also newly identified several SINE-R/VNTR/Alu (SVA) elements affecting the APC gene in two patients with familial adenomatous polyposis, as well as their sites of origin. In addition, we demonstrated that off-target reads from adaptive sampling, which are typically discarded, can be effectively used to accurately genotype common single-nucleotide polymorphisms (SNPs) across the entire genome, enabling the calculation of a polygenic risk score. Furthermore, we identified allele-specific MLH1 promoter hypermethylation in a Lynch syndrome patient. In summary, our workflow with TAS-LRS can simultaneously capture monogenic risk variants including complex structural variations, polygenic background, and epigenetic alterations, and will be an efficient platform for genetic disease research and diagnosis.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the workflow for adaptive sampling using nanopore sequencing.
The FAST5 files were base-called using the high-accuracy model in Guppy, followed by alignment of all FASTQ files using minimap2. De novo detection of SNVs/Indels was performed with PEPPER-Margin-DeepVariant, while SVs were identified using nanomonsv. Common SNP genotypes were called using GLIMPSE. For downstream analysis, the polygenic risk score was calculated with PLINK using genotyping results obtained from GLIMPSE. In allele-specific methylation analysis, each read in the BAM file was assigned to its respective haplotype using WhatsHap, based on the genotyping results obtained from GLIMPSE. The methylation calling for each haplotype was performed using f5c, and the identification of aberrantly methylated genes was performed using an in-house script.
Fig. 2. Quality check summary of TAS-LRS sequencing data.
a Sequence coverage of on-target and off-target regions and the concentration ratio (ratio of on-target to off-target coverage) for each sample. Samples were ordered by on-target coverage. b N50 statistics calculated for on-target and off-target regions for each sample. Samples were ordered by on-target N50 values.
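The two summary statistics in this caption can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical and are not taken from the study. N50 is computed with the usual definition (the read length at which reads of that length or longer contain at least half of all sequenced bases), and the concentration ratio follows the caption's definition of on-target coverage divided by off-target coverage.

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    ordered = sorted(read_lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length
    raise ValueError("read_lengths must not be empty")


def concentration_ratio(on_target_coverage, off_target_coverage):
    """Ratio of on-target to off-target coverage, as defined in the caption."""
    return on_target_coverage / off_target_coverage


# Hypothetical example values, for illustration only.
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
example_ratio = concentration_ratio(30.0, 3.0)
```

In practice the read lengths would be split into on-target and off-target sets before calling `n50` on each, since the caption reports the two separately.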
Fig. 3. Summary of SNVs/Indels detected in the target region.
a, b The number of SNVs (a) and Indels (b) for each sample stratified by whether they were detected by TAS-LRS, WG-SRS, or both. c Venn diagram showing the categories of putative pathogenic variants identified (known pathogenic, loss-of-function, and splicing variants). See also Supplementary Fig. 3.
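The per-sample stratification shown in panels a and b amounts to set operations on variant calls. The sketch below assumes each variant is keyed by a (chromosome, position, reference allele, alternate allele) tuple, which is a common convention but an assumption here; the function name and the example variants are hypothetical.

```python
def stratify_calls(tas_lrs_calls, wg_srs_calls):
    """Count variants detected by both platforms or by only one of them.

    Each call is assumed to be a hashable key such as
    (chrom, pos, ref, alt); this keying is an illustrative assumption.
    """
    tas = set(tas_lrs_calls)
    srs = set(wg_srs_calls)
    return {
        "both": len(tas & srs),
        "TAS-LRS only": len(tas - srs),
        "WG-SRS only": len(srs - tas),
    }


# Hypothetical variants, for illustration only.
tas_example = [("chr5", 100, "A", "G"), ("chr5", 200, "C", "T"), ("chr13", 50, "G", "A")]
srs_example = [("chr5", 100, "A", "G"), ("chr13", 50, "G", "A"), ("chr17", 75, "T", "C")]
counts = stratify_calls(tas_example, srs_example)
```

Exact-match keying treats differently normalized representations of the same indel as distinct variants, so a real comparison would normalize calls first.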
Fig. 4. Schematic representation of structural variations of the RB1 gene in two patients with retinoblastoma.
a A balanced translocation involving RB1 detected in S3 consists of two interchromosomal junctions. One junction connects breakpoint 1 (in the 2nd intron of the RB1 gene) and breakpoint 4 (in the 6th intron of the LRMDA gene), and the other junction juxtaposes breakpoint 3 (in the 17th intron of the RB1 gene) and breakpoint 4 (in the 6th intron of the LRMDA gene). An approximately 54 kbp region between breakpoint 1 and breakpoint 2 in the RB1 gene was deleted. b A deletion of a 44 kbp region spanning from the 20th intron of the RB1 gene to the 10th intron of the RCBTB2 gene.
Fig. 5. Details of SVA-derived insertion into the intronic region of the APC gene in two patients with familial adenomatous polyposis.
a IGV view of long-read sequencing data and transcript sequencing data showing an SVA-derived insertion of 2731 bp in the 9th intron of the APC gene. b Whole-transcriptome sequencing showed specific intron retention near the exon–intron boundary. c An SVA inserted into the 9th intron of the APC gene in patient S5, derived from two concatenated human-specific subfamily SVA_F elements located at 6q22.31, which underwent 5′ truncation and poly(A) tail addition prior to insertion. d An SVA inserted into the 8th intron of the APC gene in patient S36, derived from concatenated SVA_D and SVA_E elements located at 12p13.31, which underwent poly(A) tail addition prior to insertion.
Fig. 6. Comparison of genome-wide common SNP genotyping by TAS-LRS (imputation of low-coverage off-target sequencing data using GLIMPSE) and WG-SRS (direct variant calling on high-coverage whole-genome sequencing data by GATK).
a Imputation accuracy of TAS-LRS was measured on chromosome 1 for each minor allele frequency range. Genotyping by WG-SRS was used as the gold standard. See also Supplementary Fig. 9. Box plots show medians (lines), interquartile ranges (IQRs; boxes), ±1.5 × IQRs (whiskers), and outliers (dots). b PCA of genotype results from both TAS-LRS and WG-SRS for each individual (distinguished by color). The two results for each individual cluster together, indicating that the batch effect between the TAS-LRS and WG-SRS platforms is effectively absent. One outlier sample that could have originated from a different ancestry was excluded. See also Supplementary Fig. 10. c Comparison of PRSs for three cancers calculated from genotypes obtained by TAS-LRS (X-axis) and WG-SRS (Y-axis). Each point represents a sample and each color a syndrome (red: familial adenomatous polyposis, blue: familial pancreatic cancer, green: hepatic angiomyolipoma, purple: hereditary breast and ovarian cancer, orange: Li–Fraumeni syndrome, yellow: Lynch syndrome, brown: multiple endocrine neoplasia type 1, pink: multiple endocrine neoplasia type 2, gray: PTEN hamartoma tumor syndrome, black: retinoblastoma).
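As a rough illustration of the PRS comparison in panel c, a polygenic risk score is commonly computed as a weighted sum of effect-allele dosages. The sketch below is a simplified stand-in for that idea, not the PLINK procedure used in the study; the function name, SNP identifiers, weights, and dosages are all hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: dict mapping SNP id -> effect-allele dosage (0 to 2).
    weights: dict mapping SNP id -> per-allele effect size.
    SNPs without a genotype are skipped, which is a simplifying
    assumption; other tools may impute or rescale instead.
    """
    score = 0.0
    for snp_id, weight in weights.items():
        dosage = dosages.get(snp_id)
        if dosage is None:
            continue
        score += weight * dosage
    return score


# Hypothetical weights and genotypes, for illustration only.
example_weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 1.0}
example_dosages = {"rs_a": 2, "rs_b": 1}
example_score = polygenic_risk_score(example_dosages, example_weights)
```

Computing the same score from TAS-LRS and WG-SRS genotypes for each sample gives the paired values that a plot like panel c compares.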
Fig. 7. A case of an MLH1 epimutation in a patient with LS.
a Alignment view around the promoter region of the MLH1 gene. Each read was classified as haplotype 1 or 2 using WhatsHap. The CpG sites of each read are colored red if methylated and blue if not. Methylation is clearly increased specifically for haplotype 2. b Immunohistochemical staining for DNA mismatch repair proteins performed on cancer tissue from patient S33. Loss of immunohistochemical expression of MLH1/PMS2 was observed. Scale bar = 100 μm.
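The haplotype contrast in panel a can be summarized as a methylated fraction per haplotype. The following sketch assumes per-read CpG calls have already been tagged with a haplotype and a methylated/unmethylated state; it is not the study's in-house script, and the function name and example calls are hypothetical.

```python
from collections import defaultdict


def methylation_by_haplotype(calls):
    """Return the fraction of methylated CpG calls for each haplotype.

    calls: iterable of (haplotype, is_methylated) pairs, one per CpG
    observation on a haplotype-assigned read (an assumed input format).
    """
    counts = defaultdict(lambda: [0, 0])  # haplotype -> [methylated, total]
    for haplotype, is_methylated in calls:
        counts[haplotype][1] += 1
        if is_methylated:
            counts[haplotype][0] += 1
    return {hap: meth / total for hap, (meth, total) in counts.items()}


# Hypothetical CpG calls in a promoter region, for illustration only.
example_calls = [
    (1, False), (1, False), (1, True), (1, False),
    (2, True), (2, True), (2, True), (2, False),
]
fractions = methylation_by_haplotype(example_calls)
```

A large difference between the two fractions within a promoter would flag a candidate allele-specific methylation event for closer inspection.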
