Nat Commun. 2023 Nov 27;14(1):7780.
doi: 10.1038/s41467-023-43387-9.

Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer


Arthur Dondi et al. Nat Commun. 2023.

Abstract

Understanding the complex background of cancer requires genotype-phenotype information in single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and poly-adenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq to become increasingly relevant in oncology and personalized medicine.
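The overestimation of protein-coding gene expression mentioned above can be illustrated with a small worked example. This is only a sketch: the gene names, isoform names, and UMI counts are invented, and the metric (the share of a gene's UMIs carried by non-coding isoforms) is an assumed reading of "overestimation" that may differ from the exact definition used in the paper.

```python
from collections import defaultdict

# Hypothetical isoform-level UMI counts: (gene, isoform, biotype, umis).
# All names and numbers are illustrative, not taken from the study.
isoform_counts = [
    ("GENE_A", "A-201", "protein_coding", 80),
    ("GENE_A", "A-202", "retained_intron", 15),
    ("GENE_A", "A-203", "nonsense_mediated_decay", 5),
    ("GENE_B", "B-201", "protein_coding", 50),
]


def noncoding_fraction(records):
    """Return, per gene, the share of UMIs carried by non-coding isoforms."""
    total = defaultdict(int)
    coding = defaultdict(int)
    for gene, _isoform, biotype, umis in records:
        total[gene] += umis
        if biotype == "protein_coding":
            coding[gene] += umis
    # A gene-level count would report total[gene]; only coding[gene] of it
    # comes from protein-coding isoforms.
    return {g: (total[g] - coding[g]) / total[g] for g in total if total[g] > 0}


fractions = noncoding_fraction(isoform_counts)
for gene, frac in sorted(fractions.items()):
    print(f"{gene}: {frac:.0%} of gene-level UMIs are non-coding")
```

In this toy input, a gene-level count for GENE_A would attribute all 100 UMIs to a protein-coding gene, although only 80 come from protein-coding isoforms.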


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Study design and long-read data overview.
a Schematic of freshly processed HGSOC omentum metastases and patient-matched tumor-free distal omentum tissue biopsies, scRNA-seq. b Definition of SQANTI-defined isoform structural categories. c Proportions of isoform structural categories detected in merged metastasis and distal omentum samples. Percentage and total number of isoforms per category are indicated. d Proportions of unique reads attributed to isoforms detected in (c). Percentage and total number of UMIs per category are indicated. e Percentage of isoforms for which the transcription start site is supported by CAGE (FANTOM5) data and the transcription termination site is supported by polyadenylation (PolyASite) data, per isoform structural category. “GENCODE.all” indicates all protein-coding isoforms in the GENCODE database, “GENCODE.FL” is a subset of “GENCODE.all” containing only isoforms tagged as full-length, and “GENCODE.MANE” is a hand-curated subset of canonical transcripts, one per human protein-coding locus. f GENCODE-defined biotype composition of novel isoforms. g Biotype composition of the GENCODE database.
Fig. 2. Clustering and cell-type-specific isoform distribution.
a Cohort UMAP embeddings by data types and automatic cell-type annotation. Top and bottom rows: cell-type labels based on short- and long-read data, respectively. Left column: embedding on short-read data (gene level); middle column: embedding on long-read data (gene level); right column: embedding on long-read data (isoform level). b Jaccard distance of cell populations in different UMAP embeddings. Left: short reads (gene level) versus long reads (gene level); middle: short reads (gene level) versus long reads (isoform level); right: long reads (gene level) versus long reads (isoform level). c, d Long-read gene-level UMAP cohort visualizations of cells with at least one germline (c) or somatic (d) mutation also found in targeted NGS panel data of matched patient samples. Germline variants are variants detected in healthy distal omentum samples. e SQANTI-defined structural category normalized distribution of isoforms detected per cell type (number of isoforms displayed in white).
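The Jaccard distance in panel b can be sketched as follows. The caption does not state how cell populations were delimited in each embedding, so the example simply assumes each population is available as a set of cell barcodes; the barcodes and the function name `jaccard_distance` are hypothetical.

```python
def jaccard_distance(cells_a, cells_b):
    """Jaccard distance between two cell populations given as barcode sets.

    0.0 means the populations contain exactly the same cells,
    1.0 means they share no cells.
    """
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        # Two empty populations are treated as identical (a convention
        # chosen for this sketch).
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical barcodes of the same annotated population in two embeddings.
short_read_gene_level = {"cell1", "cell2", "cell3", "cell4"}
long_read_gene_level = {"cell2", "cell3", "cell4", "cell5"}

distance = jaccard_distance(short_read_gene_level, long_read_gene_level)
print(f"Jaccard distance: {distance:.2f}")
```

Here three of the five distinct cells are shared, so the distance is 1 - 3/5.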
Fig. 3. Epithelial-to-mesenchymal transition in the tumor microenvironment.
a Zoom of UMAP embeddings of the cohorts’ long-read gene-level data (Fig. 2a, middle column) highlighting tumor and stromal (mesothelial and fibroblast) cells, colored by biopsy tissue type (left) and EMT gene set signal (right). b Volcano plot of genes with APA in mesothelial cells. Genes have either a lengthened (red) or shortened (blue) 3’UTR in TME compared to distal mesothelial cells. Differentially lengthened or shortened genes targeted by miR-29 are colored in green. Genes with -log10(P-adjusted) >10 and |fraction change| >0.4 are annotated. The APA statistical test is described in “Methods”. c IGV view of 3’UTR raw coverage of COL1A2, COL6A1, COL3A1, and COL5A2 in tissue cell types. The coverage range for each condition is displayed between brackets on the top left. The Ensembl canonical 3’UTR is shown in blue, and for each gene, distal (d) and proximal (p) APA sites are annotated. d Log fold-change expression between TME and distal mesothelial cells of lengthened genes targeted (+, green, n = 9) or not targeted (−, red, n = 12) by miR-29, and shortened genes (blue, n = 19). Boxes display the first to third quartile with the median as a horizontal line, whiskers encompass 1.5 times the interquartile range, and data beyond that threshold are indicated as outliers. P values were calculated using a two-sided Student’s t-test between the fold-change means. e Cohort UMAP embedding of long-read data (gene level), colored by gene set signal of ECM-related genes targeted by miR-29. f ScisorWiz representation of COL1A1 isoforms. Colored areas are exons, whitespace areas are intronic space, not drawn to scale, and each horizontal line represents a single read colored according to cell type. Dashed boxes highlight the use of the canonical 3’UTR in TME fibroblasts and mesothelial cells, while distal mesothelial cells use an earlier 3’ exon termination.
Fig. 4. Differential isoforms and 3’UTR lengths in cancer.
a Number of genes with a change in isoform usage between HGSOC and all distal cells. In orange, genes with differentially expressed isoforms and a change in relative isoform abundance >20% (>50% in green). In blue, genes with no differentially expressed isoforms or a change in relative isoform abundance <20%. b Alluvial plot of biotypes of the most expressed isoforms in HGSOC and distal cells in genes containing an isoform change >20% (n = 960). Each vein represents the conversion of one biotype to another. For example, in seven genes, the most expressed isoform in HGSOC cells is protein-coding, while in distal cells it is non-protein-coding. c Alluvial plot of biotypes of the most expressed isoforms in HGSOC and distal cells in genes containing an isoform switch (>50% change, n = 39). d ScisorWiz representation of isoforms in IGF1; each horizontal line represents a single isoform colored according to cell type. Exons are numbered according to the GENCODE reference; Class I and II isoforms are isoforms with starting exons 1 and 2, respectively. e Top: IGV view of OAS1 expression in patients. Patient 3 has low p46 expression compared to the others. Bottom: zoom on the last exon of isoform p46, where all patients have at least one mutated A allele in the splice acceptor site. f Volcano plot of genes with APA in cancer versus distal cells. Genes have either a lengthened (red) or shortened (blue) 3’UTR in cancer cells compared to all distal cells. Differentially lengthened or shortened genes targeted by miR-29 are colored in green. Genes with -log10(P-adjusted) >10 and |fraction change| >0.5 are annotated. The APA statistical test is described in “Methods”.
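The 20% and 50% thresholds on relative isoform abundance in panels a to c can be illustrated with a minimal sketch. It assumes per-condition isoform UMI counts for one gene, takes the largest absolute difference in isoform usage between conditions as the "change", and leaves out the differential expression test that the caption also requires. The function names, isoform names, and counts are all hypothetical.

```python
def usage(counts):
    """Convert isoform UMI counts for one gene into relative abundances."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {isoform: c / total for isoform, c in counts.items()}


def max_usage_change(counts_a, counts_b):
    """Largest absolute difference in relative isoform abundance."""
    usage_a, usage_b = usage(counts_a), usage(counts_b)
    isoforms = set(usage_a) | set(usage_b)
    return max(
        (abs(usage_a.get(i, 0.0) - usage_b.get(i, 0.0)) for i in isoforms),
        default=0.0,
    )


def classify(change):
    """Apply the caption's thresholds: >50% is a switch, >20% a change."""
    if change > 0.5:
        return "switch"
    if change > 0.2:
        return "change"
    return "stable"


# Hypothetical counts for one gene in HGSOC and distal cells.
hgsoc = {"iso1": 90, "iso2": 10}
distal = {"iso1": 30, "iso2": 70}

change = max_usage_change(hgsoc, distal)
label = classify(change)
print(f"max usage change = {change:.2f} -> {label}")
```

In this toy gene, iso1 usage goes from 0.9 in HGSOC cells to 0.3 in distal cells, a 0.6 difference that exceeds the 50% switch threshold.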
Fig. 5. Tumor and patient-specific detection of an IGF2BP2::TESPA1 gene fusion.
a Overview of wt IGF2BP2, wt TESPA1, and IGF2BP2::TESPA1 gene fusion with exon structure. b Overview of wt IGF2BP2, wt TESPA1, and fusion protein with protein domains. RRM RNA-recognition motif, KH heterogeneous nuclear ribonucleoprotein K-homology domain, KRAP_IP3R_bind Ki-ras-induced actin-interacting protein-IP3R-interacting domain. c Violin plot showing patient- and tumor-specific IGF2BP2::TESPA1 fusion transcript detection in Patient 2. d UMI count in fusion-containing (n = 173) versus fusion-lacking (n = 32) Patient 2 tumor cells. Boxes display the first to third quartile with the median as a horizontal line, whiskers encompass 1.5 times the interquartile range, and data beyond that threshold are indicated as outliers. e UMAP embeddings of the cohorts’ short-read data. Cells are colored if they express IGF2BP2 (red), TESPA1 (green), or both (yellow) in short reads (left panel) or long reads (right panel). f Raw expression of TESPA1 (left) and IGF2BP2 (right) in short reads (top) or long reads (bottom), by sample and cell type. g IGV view of short reads (top), non-fusion long reads (middle), and fusion long reads (bottom) mapping to the 3’UTR of TESPA1. Non-fusion reads are either triple hSNP-mutated or non-mutated, while fusion and short reads are only triple hSNP-mutated.
Fig. 6. IGF2BP2::TESPA1 fusion breakpoint validation in bulk and scDNA.
a Genotyping PCR on genomic DNA isolated from matched patient samples using gene-specific primers for the IGF2BP2::TESPA1 genomic breakpoint (top), wt TESPA1 (middle), and wt IGF2BP2 (bottom). n = 2 patients, 4 samples per patient, depending on biological material available. Source images are provided as a Source Data file. b Copy number values per subclone in Patient 2 scDNA-seq data. Subclone 0 has multiple copy number alterations, indicative of cancer, while Subclone 1 is copy number neutral, presumably non-cancer. c IGV view of scDNA reads aligning unambiguously to the IGF2BP2::TESPA1 genomic breakpoint (top), wt TESPA1 (middle), or wt IGF2BP2 (bottom). In red, reads from Subclone 0 cells (cancer); in blue, reads from Subclone 1 cells (non-cancer).

