Nat Commun. 2024 Aug 12;15(1):6916.

doi: 10.1038/s41467-024-51252-6.

Single-cell long-read targeted sequencing reveals transcriptional variation in ovarian cancer

Affiliations

¹ Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA.
² Department of Discovery Oncology, Genentech, South San Francisco, CA, USA.
³ Department of Oncology Bioinformatics, Genentech, South San Francisco, CA, USA.
⁴ Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA. modrusan.zora@gene.com.
⁵ Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA. stephenson.william@gene.com.

^# Contributed equally.

PMID: 39134520
PMCID: PMC11319652
DOI: 10.1038/s41467-024-51252-6

Ashley Byrne et al. Nat Commun. 2024.

Abstract

Single-cell RNA sequencing predominantly employs short-read sequencing to characterize cell types, states and dynamics; however, it is inadequate for comprehensive characterization of RNA isoforms. Long-read sequencing technologies enable single-cell RNA isoform detection but are hampered by lower throughput and unintended sequencing of artifacts. Here we develop Single-cell Targeted Isoform Long-Read Sequencing (scTaILoR-seq), a hybridization capture method which targets over a thousand genes of interest, improving the median number of on-target transcripts per cell by 29-fold. We use scTaILoR-seq to identify and quantify RNA isoforms from ovarian cancer cell lines and primary tumors, yielding 10,796 single-cell transcriptomes. Using long-read variant calling we reveal associations of expressed single nucleotide variants (SNVs) with alternative transcript structures. Phasing of SNVs across transcripts enables the measurement of allelic imbalance within distinct cell populations. Overall, scTaILoR-seq is a long-read targeted RNA sequencing method and analytical framework for exploring transcriptional variation at single-cell resolution.

Conflict of interest statement

The authors declare the following competing interests: All the authors are current or previous employees and shareholders of Roche/Genentech.

Figures

**Fig. 1. Overview of single-cell long-read targeted sequencing.**
Ovarian cell lines or dissociated tumor cells are processed using a droplet-based single-cell RNA-seq 3’ gene expression assay to obtain cDNA. Targeted enrichment is performed, followed by nanopore sequencing, cell barcode (CB) and unique molecular identifier (UMI) assignment, and read alignment. Downstream analysis enables the measurement of isoforms, SNVs, allelic expression, TCR sequences and gene fusions at single-cell resolution. Figure created with BioRender.com, released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).

**Fig. 2. Targeted long-read single-cell cDNA sequencing optimization and scTaILoR-seq performance.**
a Schematic detailing library preparation methods (targeted, targeted+AM, and targeted+R2C2) tested for enrichment using long-read (LR) sequencing. b Complete reads (left), TSO-TSO artifacts (middle), and number of passed reads (QScore ≥7) (right) across library preparation methods. Targeted (n = 3 replicates), Targeted+AM (n = 2 replicates), Targeted+R2C2 (n = 2 replicates). Bars represent mean values across replicates. c Pseudobulk gene-level expression between short-read (SR) untargeted and scTaILoR-seq approach. CPM counts per million. d Metagene max-normalized coverage profiles for untargeted and targeted SR and LR sequencing approaches. e Pseudobulk gene expression correlation for scTaILoR-seq replicates. f Frequency distributions of read counts per gene for untargeted LR and scTaILoR-seq. g Number of genes and transcripts uniquely detected (single dot) or shared (‘joined’ dots) across untargeted LR sequencing and scTaILoR-seq. h Pseudobulk transcript-level expression between untargeted LR and scTaILoR-seq methods. i Number of observed on-target novel transcript models for untargeted LR sequencing and scTaILoR-seq. j Number of on-target fusions identified for untargeted LR sequencing and scTaILoR-seq. Source data are provided as a Source Data file.
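The pseudobulk CPM comparison in panels c and h can be illustrated with a minimal sketch. It assumes per-cell UMI counts are already available as a nested dictionary; the function name `pseudobulk_cpm`, the barcodes, and the counts are hypothetical and are not taken from the paper or its pipeline.

```python
from collections import defaultdict


def pseudobulk_cpm(cell_gene_umis):
    """Sum UMI counts across all cells per gene, then scale to counts per million."""
    totals = defaultdict(int)
    for cell_counts in cell_gene_umis.values():
        for gene, n in cell_counts.items():
            totals[gene] += n
    library_size = sum(totals.values())
    if library_size == 0:
        return {}
    return {gene: n * 1_000_000 / library_size for gene, n in totals.items()}


# Toy input (assumption): two hypothetical cell barcodes with made-up UMI counts.
toy_counts = {
    "AAAC": {"PARP2": 3, "RTKN": 1},
    "AAAG": {"PARP2": 1, "STAT1": 5},
}
cpm = pseudobulk_cpm(toy_counts)
```

Computing CPM separately for each library preparation method puts the per-gene values on a common scale, so they can be plotted against each other despite different sequencing depths.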

**Fig. 3. Single-cell enrichment metrics and cell line-specific alternative splicing.**
a Comparative principal component analysis using ovarian cell line mixture (COV504, IGROV-1, and SK-OV-3). SR short-read, LR long-read. b Cell embeddings using the first three principal components for untargeted LR sequencing and scTaILoR-seq. c UMAP visualization of ovarian cell line mixture. d Comparison of on-target genes, UMIs, and transcripts per cell across untargeted and scTaILoR-seq library preparation methods. Violin plots represent kernel density estimates. Box and whisker plots represent the first and third quartiles (box), median (dot) and the minimum and maximum values (whiskers). LR untargeted (n = 2100), scTaILoR-seq (n = 2101). e Cluster level PARP2 isoform proportions and single-cell transcript UMAP visualization. Alternative 5’ splice site within exon 2 of PARP2 is indicated by the shaded pink rectangle in the transcript model. UMAP visualization displays scaled expression. f Cluster level RTKN isoform proportions and single-cell transcript UMAP visualization. Alternative 5’-UTR and first exon usage of RTKN is indicated by the shaded pink rectangle in the transcript model. UMAP visualization displays scaled expression. Source data are provided as a Source Data file.

**Fig. 4. Profiling ovarian tumor cells with scTaILoR-seq.**
a UMAP visualization of dissociated tumor cells (n = 2482) from a HGSOC patient (P1). Scaled expression of cell-type-specific canonical marker genes is shown as additional UMAPs (right). b Scaled expression of the top-10 differentially expressed transcripts across coarse cell-type groupings. The colored horizontal bar at top corresponds to cell-type annotations in (a). c IL-32 transcript models for theta and beta isotypes and differential IL-32 isoform usage identified across CD8+ T cells and PDGFRa-/b+ Fibroblasts (two-sided Mann–Whitney U test). d Zoomed in UMAP of T cells showing successfully reconstructed TCRs. e Proportion of T cells with TCR chain assignments: no chain identified (N/A), one chain identified (α or β) or both chains identified (α & β). f Higher-order (>2 cells) clonotypes identified within T cells. The inner ring denotes the number of cells while the outer ring denotes individual clonotype frequency. g Projection of cells expressing *TP53* chr17:7674241_G>C (HGVS 17:g.7674241G>C). h Correlation between single-cell MSK HGSOC geneset expression and PROGENy pathway activity scores. i MSK HGSOC Cancer.3 (JAK-STAT/NF-kB/TNF-active) and Cancer.6 (Hypoxia-active) geneset scores mapped to epithelial cell embedding. Source data are provided as a Source Data file.

**Fig. 5. Identification of SNV-associated differential transcript structures.**
a Workflow for detecting SNV-associated differential transcript structures: (i) Obtain reads from ovarian tumor epithelial cells (scTaILoR-seq), (ii) Determine SNVs using Clair3, (iii) Predict cryptic splice events using SpliceAI, (iv) Compute coverage divergence between reference (REF) and alternative (ALT) variant reads, (v) Identify differential transcript structures among SpliceAI hits using coverage divergence. b SNVs of genes exhibiting a SpliceAI score above the threshold value of 0.1. Each SNV is colored by SpliceAI score and whether a hit also displays coverage divergence between REF and ALT. c Hierarchical clustering based on extent of divergence at transcript structural elements (CDS and UTR/Intron) of 44 “Hit and divergent” SNVs. d, e Plots for ELF3 and STAT1, respectively: normalized coverage tracks for REF and ALT and corresponding transcript model. Source data are provided as a Source Data file.
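Step (iv) of the workflow can be sketched as follows. The caption does not define the divergence metric, so this example substitutes an assumed one: the mean absolute difference between max-normalized per-base coverage of REF-carrying and ALT-carrying reads. The function names, the read representation (lists of aligned blocks), and the toy coordinates are all hypothetical.

```python
def coverage(reads, start, end):
    """Per-base depth over [start, end); each read is a list of (block_start, block_end) aligned blocks."""
    depth = [0] * (end - start)
    for blocks in reads:
        for b_start, b_end in blocks:
            for pos in range(max(b_start, start), min(b_end, end)):
                depth[pos - start] += 1
    return depth


def max_normalize(depth):
    """Scale a coverage track so that its peak equals 1 (all zeros if there is no coverage)."""
    peak = max(depth, default=0)
    return [d / peak for d in depth] if peak else [0.0] * len(depth)


def coverage_divergence(ref_reads, alt_reads, start, end):
    """Assumed metric: mean absolute difference of the two max-normalized coverage tracks."""
    if end <= start:
        raise ValueError("region must have positive length")
    ref = max_normalize(coverage(ref_reads, start, end))
    alt = max_normalize(coverage(alt_reads, start, end))
    return sum(abs(r - a) for r, a in zip(ref, alt)) / (end - start)


# Toy reads (assumption): REF reads splice out positions 10-19, ALT reads retain them.
ref_reads = [[(0, 10), (20, 30)], [(0, 10), (20, 30)]]
alt_reads = [[(0, 30)], [(0, 30)]]
divergence = coverage_divergence(ref_reads, alt_reads, 0, 30)
```

Under this assumed metric, identical REF and ALT coverage gives a divergence of 0, and larger values indicate that reads carrying the two alleles cover different parts of the gene.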

**Fig. 6. Variant phasing enables measurement of allelic imbalance at the single-cell level.**
a Phased (H1—blue, H2—green) single-molecule read tracks for *HLA-DRA*. b Allelic expression (H1/H2) and proportions of cells detected for epithelial and non-epithelial groups ranked by magnitude of imbalance. c, d Plots for *HLA-DRA* and *VEGFA*, respectively: UMAP visualization of haplotype expression with epithelial cells highlighted in red and non-epithelial cells highlighted in blue. Violin plots show significant cell-type-specific allelic imbalance (two-sided Mann–Whitney U test). Source data are provided as a Source Data file.
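The cell-type comparison of allelic expression can be illustrated with a small sketch. It assumes each cell has UMI counts already assigned to haplotypes H1 and H2, summarizes each cell as the H1 fraction H1/(H1+H2) (a choice made here for illustration; the caption reports H1/H2), and computes only the Mann–Whitney U statistic, not a p-value. All names and counts are hypothetical.

```python
def h1_fraction(h1_umis, h2_umis):
    """Fraction of phased UMIs on haplotype H1; None if the cell has no phased UMIs."""
    total = h1_umis + h2_umis
    return h1_umis / total if total else None


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys: count of pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


# Toy cells (assumption): (cell group, H1 UMIs, H2 UMIs) for one gene.
cells = [
    ("epithelial", 9, 1), ("epithelial", 8, 2), ("epithelial", 7, 3),
    ("non-epithelial", 5, 5), ("non-epithelial", 4, 6), ("non-epithelial", 0, 0),
]
groups = {"epithelial": [], "non-epithelial": []}
for label, h1, h2 in cells:
    frac = h1_fraction(h1, h2)
    if frac is not None:  # skip cells without any phased UMIs
        groups[label].append(frac)

u_stat = mann_whitney_u(groups["epithelial"], groups["non-epithelial"])
```

For a two-sided p-value, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` could be used instead of the hand-written statistic.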
