Comparative Study
Proteomics. 2010 Mar;10(6):1127-40.
doi: 10.1002/pmic.200900445.

Investigating protein isoforms via proteomics: a feasibility study

Paul Blakeley et al. Proteomics. 2010 Mar.

Abstract

Alternative splicing (AS) and processing of pre-messenger RNAs explain the discrepancy between the number of genes and proteome complexity in multicellular eukaryotic organisms. However, relatively few alternative protein isoforms have been experimentally identified, particularly at the protein level. In this study, we assess the ability of proteomics to inform on differentially spliced protein isoforms in human and four other model eukaryotes. Proteomic data that inform on AS exist for more than 33% of the Ensembl-annotated alternatively spliced genes in the human and worm genomes. Examining AS in chicken via proteomics for the first time, we find support for over 600 AS genes. Although peptide identifications support only a small fraction of the alternative protein isoforms annotated in Ensembl, many more variants are amenable to proteomic identification: there remains a sizeable gap between existing identifications (10-52% of AS genes) and those that are theoretically feasible (90-99%). We also compare annotations between Swiss-Prot and Ensembl and recommend using both to maximize coverage of AS. We propose that targeted proteomic experiments using selected reactions and standards are essential to uncover further alternative isoforms, and we discuss the issues surrounding these strategies.

Figures

Figure 1. Exon and Peptide mapping nomenclature
Three different transcript models processed from the same pre-mRNA are shown, with introns represented by black lines connecting the exons. Exons are classified by whether they are present in all transcripts (C = constitutive), some transcripts (S = semi-constitutive), or a single transcript only (U = unique). Peptides are initially classified by the number and type of exons that they cover, although this does not always fully determine their status. For example, peptide A spans a constitutive and a unique exon and is therefore “unique”, whilst peptide B is wholly contained within the unique exon 2 yet is “constitutive”, because its sequence is also in-frame and contained within exon 3, which is present in the other two transcripts. Most peptides that cover unique exons are specific to a particular isoform.
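To make the C/S/U exon labels concrete, here is a minimal Python sketch, assuming each transcript is represented simply as a set of exon identifiers. The function name and the transcript and exon IDs are hypothetical (they are not the exons drawn in Figure 1), and this is not the authors' pipeline; it only restates the counting rule in the caption.

```python
from collections import Counter


def classify_exons(transcripts: dict[str, set[str]]) -> dict[str, str]:
    """Label each exon of one gene as C, S or U.

    transcripts maps a transcript ID to the set of exon IDs it contains.
    """
    n_transcripts = len(transcripts)
    # Count how many transcripts include each exon.
    usage = Counter(exon for exons in transcripts.values() for exon in exons)
    labels = {}
    for exon, count in usage.items():
        if count == n_transcripts:
            labels[exon] = "C"  # constitutive: in every transcript
        elif count == 1:
            labels[exon] = "U"  # unique: in a single transcript only
        else:
            labels[exon] = "S"  # semi-constitutive: in some, not all
    return labels


# Hypothetical gene with three transcripts.
gene = {"t1": {"e1", "e2", "e4"}, "t2": {"e1", "e3", "e4"}, "t3": {"e1", "e3"}}
print(sorted(classify_exons(gene).items()))
# [('e1', 'C'), ('e2', 'U'), ('e3', 'S'), ('e4', 'S')]
```

Because the "all transcripts" test is checked first, every exon of a single-transcript gene comes out as C, which is why such genes have to be separated from multi-isoform genes when constitutive exons are counted.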
Figure 2. Ensembl exon classification
The relative numbers of unique, semi-constitutive and constitutive exons are shown across the five genomes under study, distinguishing constitutive exons in genes that express only one isoform from those in genes that express several isoforms. Semi-constitutive exons are expressed in some but not all isoforms. The relative fraction of constitutive exons broadly decreases with organismal complexity. Note that fewer than 47% of all genes exhibit AS (Table 1), which is broadly in line with the percentage of exons that are not constitutive.
Figure 3. Predicted and experimental peptides that span introns
The relative fractions of intron-spanning and wholly exon-internal tryptic peptides in five Ensembl proteomes are shown for four subcategories: “all” refers to a complete digest of the proteome (removing only redundant peptide sequences that cannot be unambiguously placed on the genome); “Length” applies a further filter that removes peptides shorter than 5 or longer than 40 amino acids; “proteotypic” refers to peptides that are additionally predicted by PeptideSieve to be observable; and “experimental” refers only to peptides with high-quality corresponding experimental data (see Methods). The data in this plot include peptides with up to one missed cleavage. The total number of peptides in each set is displayed at the end of each row.
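The “all” and “Length” sets can be approximated with an in-silico digest. The sketch below assumes the common trypsin rule (cleave after K or R unless the next residue is P), which the caption does not spell out, and represents exon boundaries as hypothetical residue offsets in the protein, so a codon split by a splice site is simply assigned to the downstream exon. The function names and the protein sequence are invented for illustration, and the PeptideSieve and “experimental” filters are not reproduced.

```python
import re


def tryptic_digest(protein: str, missed_cleavages: int = 1,
                   min_len: int = 5, max_len: int = 40) -> list[tuple[int, str]]:
    """Return (start, peptide) pairs from an in-silico trypsin digest."""
    # Assumed rule: cut after K or R unless the next residue is P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to missed_cleavages + 1 consecutive cleavage fragments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            start, end = bounds[i], bounds[j]
            if min_len <= end - start <= max_len:  # the "Length" filter
                peptides.append((start, protein[start:end]))
    return peptides


def spans_intron(start: int, peptide: str, junctions: list[int]) -> bool:
    """True if an exon-exon junction falls strictly inside the peptide.

    junctions holds the residue offsets at which each downstream exon begins.
    """
    end = start + len(peptide)
    return any(start < j < end for j in junctions)


protein = "MAGSTKLLEVDRPGQWNARTTSSAAK"  # invented sequence
for start, pep in tryptic_digest(protein):
    # Hypothetical exon-exon junction at residue 10.
    print(pep, spans_intron(start, pep, junctions=[10]))
```

Here the R before P is not cleaved, so "LLEVDRPGQWNAR" stays intact, and every peptide whose residue range straddles offset 10 is reported as intron-spanning.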
Figure 4. Relative classifications of peptide sets from metazoan proteomes in terms of detection of specific protein isoforms
Different peptide subsets from the five Ensembl genomes are mapped to the exon structure of their genes and classified as unique, semi-constitutive or constitutive. The peptide sets are: “all”, a complete digest of the proteome (removing only redundant peptide sequences that cannot be unambiguously placed on the genome); “Length”, which applies a further filter that removes peptides shorter than 5 or longer than 40 amino acids; “proteotypic”, peptides that are additionally predicted by PeptideSieve to be observable; and “experimental”, only those peptides with high-quality corresponding experimental data (see Methods). The data in this plot include peptides with up to one missed cleavage. The total number of peptides in each set is displayed at the end of each row.
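Figure 1 notes that a peptide's status is ultimately decided by which isoforms contain its sequence, not only by which exons it overlaps. A minimal sketch of that sequence-containment rule, and of the per-set fractions plotted here, might look as follows; the function names, isoform sequences and peptides are all hypothetical and chosen for illustration.

```python
from collections import Counter


def classify_peptide(peptide: str, isoforms: dict[str, str]) -> str:
    """Classify a peptide by which protein isoforms of one gene contain it.

    isoforms maps an isoform ID to its protein sequence. Because this uses
    sequence containment, a peptide inside a unique exon is still
    "constitutive" if the same sequence occurs in every isoform.
    """
    hits = sum(peptide in seq for seq in isoforms.values())
    if hits == 0:
        raise ValueError(f"{peptide} does not map to this gene")
    if hits == len(isoforms):
        return "constitutive"
    if hits == 1:
        return "unique"
    return "semi-constitutive"


def tally(peptides: list[str], isoforms: dict[str, str]) -> dict[str, float]:
    """Fraction of peptides in each class for one peptide set."""
    counts = Counter(classify_peptide(p, isoforms) for p in peptides)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


isoforms = {  # invented isoforms of one gene
    "iso1": "MKTAYIAKQRQISFVK",
    "iso2": "MKTAYIAKQR",
    "iso3": "MKTAYIAKWLLEGR",
}
print(tally(["TAYIAK", "QISFVK", "AKQR", "WLLEGR"], isoforms))
# {'constitutive': 0.25, 'unique': 0.5, 'semi-constitutive': 0.25}
```

Only the "unique" class identifies a single isoform, which is why that fraction is the one that matters for detecting specific protein isoforms.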
Figure 5. Peptide evidence for alternative protein isoforms encoded by a hypothetical gene
The gene model FAM136A is predicted to encode 2 different isoforms. Ensembl transcripts (top) and enlarged exons (bottom) are shown in grey. Mapped tryptic peptides are shown as black blocks below the expanded exon structure. In total, 6 unique peptides were unambiguously mapped to ENSGALT00000022503 and ENSGALT00000040525. All peptides provide evidence for the translation of unique exons.
Figure 6. Peptide evidence for exon–intron boundaries in the Chicken TPM3 gene
The Tropomyosin alpha-3 chain gene is predicted to encode 4 alternative protein isoforms. Ensembl transcripts (top) and enlarged exons (bottom) are shown in grey. Mapped tryptic peptides are shown as black blocks below the expanded exon structure. In total, 6 unique peptides were unambiguously mapped to ENSGALT00000022043 and ENSGALT00000040695. All peptides cover unique or semi-constitutive exons, providing evidence for differential pre-mRNA processing.
Figure 7. Evidence for an additional isoform in the human NDUFV3 gene
Swiss-Prot contains only a single NADH dehydrogenase [ubiquinone] flavoprotein 3 entry, whereas Ensembl annotates three different isoforms. The multiple sequence alignment shows a large internal segment in ENSP00000342895 that is missing from ENSP00000346196 and from the equivalent Swiss-Prot entry (P56181). A total of 9 peptides (bold regions) were unambiguously mapped to ENSP00000346196, confirming an exon-skipping event that has not been annotated in Swiss-Prot. The mitochondrial targeting sequence, common to all predicted isoforms, is shown as a grey box at the beginning of the sequence.
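The logic behind a junction-spanning peptide can be stated precisely when the shorter isoform is the longer one with a single internal segment removed: any peptide found in the short isoform but absent from the long one must cross the new exon-exon junction, since every other stretch of the short isoform also occurs in the long one. The sketch below assumes that single-deletion model; the function names are hypothetical and the sequences are invented, not the NDUFV3 isoforms.

```python
def skipped_segment(short: str, long: str) -> tuple[int, int]:
    """Return (start, end) in `long` of the segment absent from `short`.

    Assumes short == long[:start] + long[end:]. With repeated residues the
    placement is ambiguous; this returns the rightmost valid one.
    """
    deleted = len(long) - len(short)
    if deleted <= 0:
        raise ValueError("short isoform must be shorter than long isoform")
    # Extend the shared N-terminal prefix as far as possible.
    i = 0
    while i < len(short) and short[i] == long[i]:
        i += 1
    if long[i + deleted:] != short[i:]:
        raise ValueError("not a single internal deletion")
    return i, i + deleted


def junction_peptides(peptides: list[str], short: str, long: str) -> list[str]:
    """Peptides that can only come from the exon-skipping (short) isoform."""
    skipped_segment(short, long)  # validates the single-deletion assumption
    return [p for p in peptides if p in short and p not in long]


long_iso = "MKLVEARWWWWKPQSDEK"  # invented: prefix + skipped segment + suffix
short_iso = "MKLVEARPQSDEK"      # same protein with "WWWWK" skipped
print(skipped_segment(short_iso, long_iso))  # (7, 12)
print(junction_peptides(["LVEARPQSDEK", "PQSDEK", "WWWWK"], short_iso, long_iso))
# ['LVEARPQSDEK']
```

"PQSDEK" occurs in both isoforms and so says nothing about skipping, while "LVEARPQSDEK" joins the sequences on either side of the removed segment and can only derive from the short isoform.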