J Proteome Res. 2022 Jul 1;21(7):1628-1639.

doi: 10.1021/acs.jproteome.1c00968. Epub 2022 May 25.

Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing

Aidan P Tay¹,²,³, Joshua J Hamey¹, Gabriella E Martyn¹, Laurence O W Wilson²,³, Marc R Wilkins¹

Affiliations

¹ School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia.
² Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia.
³ Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia.

PMID: 35612954
DOI: 10.1021/acs.jproteome.1c00968

Aidan P Tay et al. J Proteome Res. 2022.

Abstract

Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
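The fractions quoted above are easier to compare as percentages. The following is a minimal sketch of that arithmetic, assuming each pair reads as (items missed by the short-read data, items identified with the long-read data), which is how the abstract's wording suggests they should be read. The dictionary labels and the helper name `percent_missed` are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract, read as (missed by short-read data,
# total identified with long-read data). This reading is an assumption
# based on the abstract's wording, not a statement from the full text.
counts = {
    "Ensembl transcripts": (5949, 14326),
    "Genes with alternatively spliced transcripts": (486, 2981),
    "Peptides": (826, 35976),
    "Proteins": (262, 3215),
    "Protein isoforms from distinct transcript variants": (574, 1212),
}


def percent_missed(missed: int, total: int) -> float:
    """Return the missed fraction as a percentage of the total."""
    return 100.0 * missed / total


# Percentage for every category, keyed by the same labels as `counts`.
percentages = {
    label: percent_missed(missed, total)
    for label, (missed, total) in counts.items()
}

for label, pct in percentages.items():
    missed, total = counts[label]
    print(f"{label}: {missed}/{total} = {pct:.1f}%")
```

Under this reading, the short-read data missed roughly two fifths of the transcripts and nearly half of the protein isoforms, but only a small share of the peptides.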

Keywords: Illumina; MS/MS; Oxford Nanopore Technology; RNA-seq; alternative splicing; direct RNA-sequencing; long read RNA sequencing; protein isoform; proteogenomics.

Cited by

SUsPECT: a pipeline for variant effect prediction based on custom long-read transcriptomes for improved clinical variant annotation.
Salz R, Saraiva-Agostinho N, Vorsteveld E, van der Made CI, Kersten S, Stemerdink M, Allen J, Volders PJ, Hunt SE, Hoischen A, 't Hoen PAC. BMC Genomics. 2023 Jun 6;24(1):305. doi: 10.1186/s12864-023-09391-5. PMID: 37280537.
