Deciphering the largest disease-associated transcript isoforms in the human neural retina with advanced long-read sequencing approaches

Merel Stemerdink^{1,2}, Tabea Riepe^{3,4}, Nick Zomer⁴, Renee Salz³, Michael Kwint⁴, Jaap Oostrik¹, Raoul Timmermans⁴, Barbara Ferrari⁵, Stefano Ferrari⁵, Alfredo Dueñas Rey^{6,7}, Emma Delanote^{6,7}, Suzanne E de Bruijn^{1,4}, Hannie Kremer^{1,2,4}, Susanne Roosing⁴, Frauke Coppieters^{6,7,8}, Alexander Hoischen^{4,9}, Frans P M Cremers⁴, Peter A C 't Hoen³, Erwin van Wijk^#¹, Erik de Vrieze^#¹⁰

Affiliations

¹ Department of Otorhinolaryngology, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands.
² Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands.
³ Department of Medical BioSciences, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands.
⁴ Department of Human Genetics, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands.
⁵ Fondazione Banca degli Occhi del Veneto, Zelarino, Venice 30174, Italy.
⁶ Center for Medical Genetics, Ghent University Hospital, Ghent 9000, Belgium.
⁷ Department of Biomolecular Medicine, Ghent University, Ghent 9000, Belgium.
⁸ Department of Pharmaceutics, Ghent University, Ghent 9000, Belgium.
⁹ Department of Internal Medicine and Radboud Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands.
¹⁰ Department of Otorhinolaryngology, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands; erik.devrieze@radboudumc.nl.

^# Contributed equally.

PMID: 40037841
PMCID: PMC12047242
DOI: 10.1101/gr.280060.124

Merel Stemerdink et al. Genome Res. 2025 Apr 14;35(4):725-739. doi: 10.1101/gr.280060.124.

Abstract

Sequencing technologies have long limited the comprehensive investigation of large transcripts associated with inherited retinal diseases (IRDs) like Usher syndrome, which involves 11 associated genes with transcripts up to 19.6 kb. To address this, we used PacBio long-read mRNA isoform sequencing (Iso-Seq) following standard library preparation and an optimized workflow to enrich for long transcripts in the human neural retina. While our workflow achieved sequencing of transcripts up to 15 kb, this was insufficient for Usher syndrome-associated genes USH2A and ADGRV1, with transcripts of 18.9 kb and 19.6 kb, respectively. To overcome this, we employed the Samplix Xdrop System for indirect target enrichment of cDNA, a technique typically used for genomic DNA capture. This method facilitated the successful capture and sequencing of ADGRV1 transcripts as well as full-length 18.9 kb USH2A transcripts. By combining algorithmic analysis with detailed manual curation of sequenced reads, we identified novel isoforms characterized by an alternative 5' transcription start site, the inclusion of previously unannotated exons, or alternative splicing events across the 11 Usher syndrome-associated genes. These findings have significant implications for genetic diagnostics and therapeutic development. The analysis applied here on Usher syndrome-associated transcripts exemplifies a valuable approach that can be extended to explore the transcriptomic complexity of other IRD-associated genes in the complete transcriptome data set generated within this study. Additionally, we demonstrate the adaptability of the Samplix Xdrop System for capturing cDNA, and the optimized methodologies described can be expanded to facilitate the enrichment of large transcripts from various tissues of interest.

Figures

**Figure 1.**
Overview of the sequencing workflows and subsequent analyses. The figure illustrates the sequencing workflows and subsequent analysis performed on RNA extracted from three human neural retina samples. The workflows include PacBio long-read mRNA Iso-Seq using both the standard and an optimized long transcript workflow. The analysis was carried out in three distinct data sets: data set 1 comprised the standard workflow samples analyzed with IsoQuant, data set 2 involved a combined analysis of the reads obtained with standard and optimized long transcript workflows, and data set 3 focused solely on reads obtained from the long transcript workflow. Additionally, an “indirect targeted enrichment” of transcripts for the *USH2A* and *ADGRV1* genes was achieved using the Samplix Xdrop System, followed by PacBio long-read sequencing and cDNA analysis. All reads mapping to Usher syndrome–associated transcript isoforms were manually curated using BAM files in the IGV. An independent ONT long-read sequencing data set of three independent retina samples was used to validate findings.

**Figure 2.**
Exploring the Usher syndrome–associated transcript isoform landscape in the human neural retina using PacBio long-read mRNA Iso-Seq. (A) The size distribution of sequenced transcripts derived from the standard workflow (blue) and optimized long workflow (red) data sets. For the standard workflow data set, the mean size distribution across the three sequenced samples is depicted ± standard deviation (SD). (B) Comparison of Usher syndrome–associated transcript coverage between the standard workflow and optimized long workflow data set. The Usher genes are arranged in order from smallest to largest coding sequence, with the coding sequence length of the largest known transcript for each gene provided in brackets. For the standard workflow data set, the mean ± SD transcript length across the three sequenced samples is presented. (C) Quantification of the percentage of reads displaying intron retention in standard workflow samples 1–3 (mean of 3 samples ± SD) versus long workflow sample 4.

**Figure 3.**
*MYO7A* transcripts identified by IsoQuant analysis compared to known isoforms from the literature. (A) The GENCODE reference transcript is depicted at the *top* in green, followed by the known human *MYO7A* transcript isoforms in blue (Gilmore et al. 2023) and the murine isoforms in gray (Li et al. 2020). The *MYO7A* IsoQuant transcripts are depicted in red. The light green, blue, gray, and red colors indicate the UTR and the dark green, blue, gray, and red colors indicate the open reading frame (ORF) of each transcript. Differences between the IsoQuant transcript isoforms and the GENCODE reference transcript are highlighted in gray boxes. (B) Relative expression of *MYO7A* isoforms based on literature in either the retina or the cochlea. (C) The TPM (based on data set 1) for each IsoQuant isoform are presented for the three individual samples. (D) The predicted 2D protein domain architecture of the MYO7A protein isoforms with the canonical 5′ start and the alternative 5′ start from transcript20052.Chr11.nic. The bar *below* the 2D protein structures displays the amino acid positions. (IQ) isoleucine–glutamine motif, (CC1) coiled-coil domain, (LowC) low complexity region, (MyTH4) myosin tail homology 4, (SH3) SRC homology 3 domain. (E) AlphaFold2 3D protein predictions of the MYO7A protein isoforms, modeled from the 5′ start to the end of the Myosin motor head domain. (F) RT-qPCR analysis of the relative expression of the *MYO7A* canonical 5′ start site, the alternative 5′ start, and the 3′ end is shown. The locations of the primers for this RT-qPCR are indicated with the arrows on *top* of the IsoQuant isoforms in Figure 3A.

**Figure 4.**
*WHRN* transcript isoforms identified by IsoQuant analysis compared to known isoforms from the literature. (A) The GENCODE reference transcript is depicted at the *top* in green, followed by human *WHRN* transcript isoforms from literature in blue (van Wijk et al. 2006) and the murine transcript isoforms in gray (Mburu et al. 2003; Belyantseva et al. 2005; Ebrahim et al. 2016). The *WHRN* IsoQuant transcripts are depicted in red. The light green, blue, gray, and red colors indicate the UTR and the dark green, blue, gray, and red colors indicate the ORF of each transcript. Differences between the IsoQuant transcripts and the GENCODE reference transcript are highlighted in gray boxes. (B) Relative expression of *WHRN* isoforms based on literature in either the retina or the cochlea. (C) The TPM (based on data set 1) for each IsoQuant transcript isoform are presented for the three individual samples. (D) The predicted 2D protein domain architecture of the encoded WHRN protein isoforms. Light blue and green boxes highlight the difference between the WHRN reference isoform and the protein isoform encoded by exon 7B-containing transcript13724.Chr9.nic. (E) AlphaFold2 3D protein predictions of two WHRN isoforms; reference isoform ENST00000362057.4 in green and transcript13724.Chr9.nic in red, with the alpha helix encoded by the novel exon 7B highlighted in blue. (F) RT-qPCR analysis of the expression of the *WHRN* transcripts containing exon 7B, and *WHRN* transcripts with intron 4 retention, relative to all *WHRN* transcripts containing exons 8–9.

**Figure 5.**
*USH2A* transcript isoforms were identified by IsoQuant analysis, manual curation, and Samplix Xdrop targeted enrichment. (A) The GENCODE reference transcript is depicted at the *top* in green, followed by the known human *USH2A* transcript isoforms in blue (van Wijk et al. 2004). The *USH2A* IsoQuant transcripts are depicted in red. The light green, blue, and red colors indicate the UTR and the dark green, blue, and red colors indicate the ORF of each transcript. Differences between the IsoQuant transcript isoforms and the GENCODE reference transcript are highlighted in gray boxes. (B) Relative expression of *USH2A* isoforms based on literature in either the retina or the cochlea. (C) The TPM (based on data set 1) for each IsoQuant transcript are presented for the three individual samples. (D) Proposed *USH2A* transcript isoforms based on manual curation and Samplix Xdrop targeted enrichment. The GENCODE reference transcript is depicted in green, followed by the proposed *USH2A* transcript isoforms and events based on manual curation of BAM files using the IGV in red, and the proposed transcript isoform following the Samplix Xdrop targeted enrichment in orange. The light green, red, and orange colors indicate the UTR and the dark green, red, and orange colors indicate the ORF of each transcript. Differences between the proposed transcript isoforms and the GENCODE reference transcript are highlighted in gray boxes. The overview of sporadic incorporation of cryptic exons indicates the presence of PE8 and PE20 as previously described by Reurink et al. (2023). Additionally, black arrows indicate locations where cryptic exons are occasionally incorporated at sites that are not yet associated with deep-intronic pathogenic variants.

**Figure 6.**
*ADGRV1* proposed transcript isoforms from manual curation and Samplix Xdrop targeted enrichment. The GENCODE reference transcript is depicted at the *top* in green, followed by the *ADGRV1* proposed retinal transcript isoforms and events based on manual curation of BAM files using the IGV in red, and proposed transcript isoforms following the Samplix Xdrop targeted enrichment in orange. The light green, red, and orange colors indicate the UTR and the dark green, red, and orange colors indicate the ORF of each transcript. Differences between the proposed transcript isoforms and the GENCODE reference transcript are highlighted in a gray box.
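
The figure legends report per-isoform TPM for three individual samples (Figures 3C, 4C, and 5C) and summarize standard workflow values as mean ± SD across three samples (Figure 2). As a rough illustration of that arithmetic only, the Python sketch below scales read counts to transcripts per million and computes a mean and sample standard deviation. It assumes each long read represents one transcript molecule, so no length normalization is applied; this is an assumption and may not match how IsoQuant derives its TPM values. The function names, isoform labels, and read counts are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def tpm(read_counts):
    """Scale per-isoform read counts to transcripts per million (TPM).

    Assumption: one long read corresponds to one transcript molecule,
    so counts are only scaled by the sample total (no length correction).
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {isoform: count * 1_000_000 / total
            for isoform, count in read_counts.items()}


def mean_sd(values):
    """Return (mean, sample standard deviation) of two or more values."""
    return mean(values), stdev(values)


# Hypothetical read counts for three samples (illustrative numbers only).
samples = [
    {"isoform_A": 120, "isoform_B": 30, "all_other_reads": 849_850},
    {"isoform_A": 95, "isoform_B": 41, "all_other_reads": 799_864},
    {"isoform_A": 140, "isoform_B": 22, "all_other_reads": 899_838},
]

per_sample_tpm = [tpm(sample) for sample in samples]

# Summarize each isoform across the three samples as mean and SD.
for isoform in ("isoform_A", "isoform_B"):
    avg, sd = mean_sd([sample_tpm[isoform] for sample_tpm in per_sample_tpm])
    print(f"{isoform}: {avg:.1f} TPM (SD {sd:.1f})")
```

Because TPM is a within-sample proportion, values from samples with different sequencing depths can be compared directly, which is why per-sample TPM is computed before averaging.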

References

1. Abad-Morales V, Navarro R, Burés-Jelstrup A, Pomares E. 2020. Identification of a novel homozygous ARSG mutation as the second cause of Usher syndrome type 4. Am J Ophthalmol Case Rep 19: 100736. doi: 10.1016/j.ajoc.2020.100736
2. Adato A, Vreugde S, Joensuu T, Avidan N, Hamalainen R, Belenkiy O, Olender T, Bonne-Tamir B, Ben-Asher E, Espinos C, et al. 2002. USH3A transcripts encode clarin-1, a four-transmembrane-domain protein with a possible role in sensory synapses. Eur J Hum Genet 10: 339–350. doi: 10.1038/sj.ejhg.5200831
3. Ahmed ZM, Riazuddin S, Bernstein SL, Ahmed Z, Khan S, Griffith AJ, Morell RJ, Friedman TB, Riazuddin S, Wilcox ER. 2001. Mutations of the protocadherin gene PCDH15 cause Usher syndrome type 1F. Am J Hum Genet 69: 25–34. doi: 10.1086/321277
4. Belyantseva IA, Boger ET, Naz S, Frolenkov GI, Sellers JR, Ahmed ZM, Griffith AJ, Friedman TB. 2005. Myosin-XVa is required for tip localization of whirlin and differential elongation of hair-cell stereocilia. Nat Cell Biol 7: 148–156. doi: 10.1038/ncb1219
5. Bolz H, von Brederlow B, Ramírez A, Bryda EC, Kutsche K, Nothwang HG, Seeliger M, del CSCM, Vila MC, Molina OP, et al. 2001. Mutation of CDH23, encoding a new member of the cadherin gene family, causes Usher syndrome type 1D. Nat Genet 27: 108–112. doi: 10.1038/83667
