Review
Front Mol Biosci. 2021 Aug 2;8:711733.
doi: 10.3389/fmolb.2021.711733. eCollection 2021.

Isoform Age - Splice Isoform Profiling Using Long-Read Technologies

Ricardo De Paoli-Iseppi et al. Front Mol Biosci.

Erratum in

Abstract

Alternative splicing (AS) of RNA is a key mechanism that results in the expression of multiple transcript isoforms from single genes and leads to an increase in the complexity of both the transcriptome and proteome. Regulation of AS is critical for the correct functioning of many biological pathways, while disruption of AS can be directly pathogenic in diseases such as cancer or cause risk for complex disorders. Current short-read sequencing technologies achieve high read depth but are limited in their ability to resolve complex isoforms. In this review we examine how long-read sequencing (LRS) technologies can address this challenge by covering the entire RNA sequence in a single read and thereby distinguish isoform changes that could impact RNA regulation or protein function. Coupling LRS with technologies such as single cell sequencing, targeted sequencing and spatial transcriptomics is producing a rapidly expanding suite of technological approaches to profile alternative splicing at the isoform level with unprecedented detail. In addition, integrating LRS with genotype now allows the impact of genetic variation on isoform expression to be determined. Recent results demonstrate the potential of these techniques to elucidate the landscape of splicing, including in tissues such as the brain where AS is particularly prevalent. Finally, we also discuss how AS can impact protein function, potentially leading to novel therapeutic targets for a range of diseases.

Keywords: Oxford Nanopore Technologies nanopore sequencing; PacBio; alternative splicing; isoform; long-read sequencing; single cell sequencing; spatial transcriptomics; targeted RNA sequencing.

Conflict of interest statement

RP, JG and MC have received support from ONT to present their findings at scientific conferences. ONT played no role in study design, execution, or publication.

Figures

FIGURE 1
Long-read methods for profiling isoforms. (A) An overview of available long-read methodologies for isoform characterisation. A variety of sample inputs (orange boxes) can be used with different sequencing methods (dashed edges) to answer a wide range of experimental questions. Intermediate experiment steps are shown with a bold outline. RNA from bulk samples can be either sequenced directly (ONT direct RNA) or reverse transcribed to cDNA, while RNA from single cells currently needs to be processed into cDNA before sequencing. Once present as cDNA, samples can be sequenced directly without PCR (unamplified cDNA) or after whole-transcriptome PCR amplification (PCR-cDNA) depending on starting amounts. cDNA can also serve as input for target enrichment techniques including PCR for specific gene isoforms of interest (amplicon sequencing) or capture of target cDNAs for sequencing (CaptureSeq). Long-read spatial transcriptomics starts with tissue sections placed on slides covered with spatially barcoded oligonucleotides. Cellular RNA is captured by nearby barcodes allowing generation of spatially barcoded cDNAs. Sequencing identifies spatial expression patterns of RNA isoforms within the tissue such as the hypothetical isoform A. (B) Selected examples where long-read sequencing methods have been used to examine isoforms and alternative splicing in a range of human diseases. AR, androgen receptor; CLL, chronic lymphocytic leukemia; PacBio, Pacific Biosciences; ONT, Oxford Nanopore Technologies.
FIGURE 2
Long-read sequencing methods and data generated. (A) Simplified pathways describing three long-read sequencing methods: amplicon sequencing, CaptureSeq and direct RNA sequencing. Amplicon sequencing involves cDNA synthesis followed by PCR of known and novel expressed isoforms from target genes (arrows indicate locations of forward and reverse primers). Sample barcodes can additionally be used for multiplexing before sequencing. CaptureSeq is performed on cDNA and utilises pools of oligonucleotide probes (capture probes). Probes (red) hybridise to isoforms from target genes (blue), which can then be purified, creating a sequencing library highly enriched for target isoforms. Direct RNA sequencing requires purified RNA and commonly includes two optional preparation steps: purification of polyA RNA (to remove ribosomal RNA) and cDNA synthesis (to break up RNA secondary structures). Only the RNA strand is subsequently sequenced. (B) Schematic of isoform information generated by whole-transcriptome direct RNA sequencing vs gene-targeted amplicon sequencing of cDNA. Image from the UCSC Genome Browser of the schizophrenia risk gene GATAD2A (Ripke et al., 2014), showing GENCODE annotations compared to nanopore direct RNA (middle) and amplicon (bottom) reads collapsed into high-confidence isoforms by FLAIR (Gleeson et al., 2020; Tang et al., 2020). Forward and reverse primers for amplicon sequencing are indicated by blue arrowheads. Transcriptional start site (TSS) shown by black arrow. An alternate start site (pink box) and 3′UTR variants are captured by direct RNA sequencing of the SH-SY5Y cell line. At least five supporting reads were required to identify high-confidence isoforms from direct RNA sequencing. A large number of additional isoforms are supported by amplicon sequencing using a more stringent threshold of 500 supporting reads. CDS, coding sequence; BC, barcode.
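The read-support thresholds mentioned in the Figure 2 caption (at least five reads for direct RNA, 500 for amplicon data) can be illustrated with a minimal sketch. This is not FLAIR's implementation; the function name, isoform identifiers and read counts below are all hypothetical and serve only to show how a minimum-support filter separates high-confidence isoforms from weakly supported ones.

```python
from typing import Dict


def filter_by_read_support(read_counts: Dict[str, int], min_reads: int) -> Dict[str, int]:
    """Keep only isoforms supported by at least `min_reads` reads."""
    return {isoform: n for isoform, n in read_counts.items() if n >= min_reads}


# Hypothetical supporting-read counts per collapsed isoform (illustrative only).
direct_rna_counts = {"isoform_A": 12, "isoform_B": 4, "isoform_C": 5}
amplicon_counts = {"isoform_A": 8200, "isoform_D": 499, "isoform_E": 650}

# Thresholds taken from the caption: 5 reads (direct RNA), 500 reads (amplicon).
high_conf_direct = filter_by_read_support(direct_rna_counts, min_reads=5)
high_conf_amplicon = filter_by_read_support(amplicon_counts, min_reads=500)

print(sorted(high_conf_direct))
print(sorted(high_conf_amplicon))
```

A higher threshold is plausible for amplicon data because targeted PCR yields far deeper coverage per gene, so a fixed low cut-off would admit more artefactual isoforms.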
FIGURE 3
Isoform characterisation in health and disease. (A) Nanopore amplicon sequencing of CACNA1C in post-mortem human brain found an isoform switch in cerebellum (purple) for exon 30 compared to non-cerebellar tissues. An example of novel splicing impacts on the N-terminus and an extracellular region (IS5-6 linker) is also shown for the first CACNA1C protein domain (Clark et al., 2020). NA-CBM, non-cerebellar tissue; CBM, cerebellum; aa, amino acid. (B) Combined short and long-read amplicon sequencing of NRXN1α in PFC and NRXN1 mutant and wildtype hiPSCs characterised expressed isoforms. Overexpression of mutant isoforms in wildtype hiPSCs or wildtype isoforms in mutant hiPSCs identified genotype-dependent impacts on neuronal activity. Red box indicates location of patient-specific NRXN1α +/- deletions. Dashed arrow indicates exon skipping from exons 20 to 24 (Flaherty et al., 2019). PFC, prefrontal cortex; hiPSC, human induced pluripotent stem cell; TSS, transcriptional start site. (C) Single cell isoform RNA sequencing (ScISOrSeq) from mouse prefrontal cortex and hippocampus identified cell type signatures, e.g., exon exclusion (blue box), and tissue-specific signatures, e.g., exon inclusion in hippocampus (orange box) (Joglekar et al., 2021). PFC, prefrontal cortex; HIPP, hippocampus. (D) Repertoire and Gene Expression by sequencing (RAGE-Seq) characterised full-length antigen receptor sequences for T and B cells from a primary tumour (breast) and its sentinel lymph node. Clonal lineages and clonal expansions were identified, as well as differential expression between clonally expanded T or B cells from paired tumour and lymph node samples (Singh et al., 2019). TNBC, triple negative breast cancer; SLN, sentinel lymph node; TCR, T cell receptor; BCR, B cell receptor; SC-T, shared clonotype tumour; SC-L, shared clonotype lymph node; ONT, Oxford Nanopore Technologies; PacBio, Pacific Biosciences.

References

    Afik S., Yates K. B., Bi K., Darko S., Godec J., Gerdemann U., et al. (2017). Targeted Reconstruction of T Cell Receptor Sequence from Single Cell RNA-Seq Links CDR3 Length to T Cell Differentiation State. Nucleic Acids Res. 45, e148. doi: 10.1093/nar/gkx615
    Amarasinghe S. L., Su S., Dong X., Zappia L., Ritchie M. E., Gouil Q. (2020). Opportunities and Challenges in Long-Read Sequencing Data Analysis. Genome Biol. 21, 30–16. doi: 10.1186/s13059-020-1935-5
    Ambardar S., Gupta R., Trakroo D., Lal R., Vakhlu J. (2016). High Throughput Sequencing: An Overview of Sequencing Chemistry. Indian J. Microbiol. 56, 394–404. doi: 10.1007/s12088-016-0606-4
    Anvar S. Y., Allard G., Tseng E., Sheynkman G. M., de Klerk E., Vermaat M., et al. (2018). Full-length mRNA Sequencing Uncovers a Widespread Coupling between Transcription Initiation and mRNA Processing. Genome Biol. 19, 46. doi: 10.1186/s13059-018-1418-0
    Asnani M., Hayer K. E., Naqvi A. S., Zheng S., Yang S. Y., Oldridge D., et al. (2020). Retention of CD19 Intron 2 Contributes to CART-19 Resistance in Leukemias with Subclonal Frameshift Mutations in CD19. Leukemia 34, 1202–1207. doi: 10.1038/s41375-019-0580-z