Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing
- PMID: 35612954
- DOI: 10.1021/acs.jproteome.1c00968
Abstract
Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
Keywords: Illumina; MS/MS; Oxford Nanopore Technology; RNA-seq; alternative splicing; direct RNA-sequencing; long read RNA sequencing; protein isoform; proteogenomics.
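The abstract reports its comparisons as raw counts. As a reading aid, the short sketch below converts those counts to percentages. It assumes each denominator is the total identified with the long-read data or long-read-based databases, which is one reading of the abstract. The dictionary labels and the helper name `percent_missed` are illustrative and do not come from the paper.

```python
# Counts copied from the abstract as (missed by short-read data, total).
# Assumption: each total is the number identified via long reads or
# long-read-based databases; the labels are illustrative only.
COUNTS = {
    "Ensembl transcripts": (5949, 14326),
    "Genes with alternatively spliced transcripts": (486, 2981),
    "Peptides": (826, 35976),
    "Proteins": (262, 3215),
    "Protein isoforms from distinct transcript variants": (574, 1212),
}


def percent_missed(missed, total):
    """Return the missed share as a percentage, rounded to one decimal place."""
    return round(100 * missed / total, 1)


for label, (missed, total) in COUNTS.items():
    print(f"{label}: {missed}/{total} = {percent_missed(missed, total)}%")
```

Under that reading, the gap is widest for transcripts and for isoforms arising from distinct transcript variants (both above 40%), and smallest at the level of individual peptides (about 2%).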