Sci Data. 2023 Jan 10;10(1):18.
doi: 10.1038/s41597-022-01890-6.

InvitroSPI and a large database of proteasome-generated spliced and non-spliced peptides

Hanna P Roetschke et al. Sci Data. 2023.

Abstract

Noncanonical epitopes presented by Human Leucocyte Antigen class I (HLA-I) complexes to CD8+ T cells have attracted attention in research on novel immunotherapies against cancer, infection and autoimmunity. Proteasomes, which are the main producers of HLA-I-bound antigenic peptides, can catalyze both peptide hydrolysis and peptide splicing. Predicting proteasome-generated spliced peptides still requires a large and reliable database of non-spliced and spliced peptides produced by these proteases. Here, we present an extended database of proteasome-generated spliced and non-spliced peptides, which was obtained by analyzing in vitro digestions of 80 unique synthetic polypeptide substrates, measured by different mass spectrometers. Peptides were identified through the invitroSPI method, which was validated through in silico and in vitro strategies. The peptide product database contains 16,631 unique peptide products (5,493 non-spliced, 6,453 cis-spliced and 4,685 trans-spliced peptide products), and a substrate sequence variety that makes it a valuable source for predictors of proteasome-catalyzed peptide hydrolysis and splicing. Potential artefacts and skewed results due to different identification and analysis strategies are discussed.

Conflict of interest statement

The authors have no conflicts of interest.

Figures

Fig. 1
Proteasome-generated non-spliced and spliced peptides, and overview of method and dataset application. Proteasomes form: (a) non-spliced peptides via peptide hydrolysis, (b-d) spliced peptides through ligation of two non-contiguous splice-reactants either derived from the same protein molecule (cis-spliced peptides, b, c) or from two distinct molecules of the same protein or two distinct proteins (trans-spliced peptides, d). In b-c, peptide fragment ligation can occur in forward order, i.e., following the orientation from N- to C-terminus of the parental protein (forward cis-peptide splicing; b), or in reverse order (reverse cis-peptide splicing; c). The two ligated fragments are named splice-reactants, and their junction is named splice-site. The C-terminus of the first (N-terminal) splice-reactant is named sP1, whilst the N-terminus of the second (C-terminal) splice-reactant is named sP1’. The sequence segment between two splice-reactants is called the intervening sequence. Arrows represent the substrate cleavage sites used by proteasome catalytic Thr1. (e) Overview of methods and datasets described in this study. (f) Substrate synthesis errors. Various forms of synthesis errors could result in alleged non-spliced and/or spliced peptides. Those synthesis errors are captured using control measurements. Furthermore, alleged spliced synthesis errors can be trimmed by the proteasome. All such spliced peptides of which a precursor is identified in control measurements are removed by invitroSPI but not by invitroPB.
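The terminology in this caption can be illustrated with a minimal sketch. It assumes that each splice-reactant is described by 0-based positions within a single substrate; the `Reactant` type, the function names and the toy substrate string are hypothetical and are not part of invitroSPI. Overlapping splice-reactants are only flagged as such, on the assumption that they cannot both derive from one substrate molecule.

```python
from typing import NamedTuple, Tuple


class Reactant(NamedTuple):
    start: int  # 0-based index of the first residue within the substrate
    end: int    # 0-based index of the last residue (inclusive)


def product_sequence(substrate: str, first: Reactant, second: Reactant) -> str:
    """Sequence of the product: first (N-terminal) then second (C-terminal) reactant."""
    return substrate[first.start:first.end + 1] + substrate[second.start:second.end + 1]


def splice_site(substrate: str, first: Reactant, second: Reactant) -> Tuple[str, str]:
    """Return (sP1, sP1'): C-terminus of the first and N-terminus of the second reactant."""
    return substrate[first.end], substrate[second.start]


def classify(first: Reactant, second: Reactant) -> str:
    """Classify a product from the relative position of its two reactants."""
    if first.end + 1 == second.start:
        return "non-spliced"   # contiguous fragments: plain hydrolysis product
    if first.end < second.start:
        return "forward cis"   # same order as in the substrate
    if second.end < first.start:
        return "reverse cis"   # order inverted relative to the substrate
    return "overlapping"       # assumption: not explainable by a single molecule


# Toy substrate string, used only to make the example concrete.
substrate = "RTKAWNRQLYPEW"
forward = (Reactant(0, 2), Reactant(7, 12))   # [RTK][QLYPEW]
reverse = (Reactant(7, 12), Reactant(0, 2))   # [QLYPEW][RTK]
```

With these inputs, the intervening sequence of the forward product is the segment between the two splice-reactants, `substrate[3:7]`.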
Fig. 2
Difference in the peptide identification strategy and downstream analysis adopted by invitroSPI and invitroPB.
Fig. 3
Comparison of invitroSPI and invitroPB methods applied to the gp100 Fusion dataset. (a-e) Number of PSMs assigned to: (a) non-spliced, cis-spliced, trans-spliced peptides, and related synthesis error peptides, (b) PTM-labelled peptides, (c) forward and reverse cis-spliced peptides, (d) spliced peptides with one amino acid long splice-reactant, (e) spliced peptides containing substrate’s N- or C-termini. Assignment was carried out by applying invitroSPI and invitroPB methods to in vitro digestions of TSN2 and TSN89 substrates with proteasomes. PTM-modified non-spliced peptides identified by PEAKS-PTM are reported, although they are not kept in the final list of identified peptides by invitroPB. In invitroSPI identifications, PTM-modified peptides are included. In (b-e), PSMs assigned to synthesis errors have been removed. In (c), forward/reverse cis-spliced peptides, i.e. multi-mapping cis-spliced peptides, are not shown. (f,g) MS2 spectra of the cis-spliced epitopes (f) [RTK][QLYPEW] and (g) [QLYPEW][RTK] identified in in vitro digestions of (f) TSN89 and (g) TSN2 substrates, and of their cognate synthetic peptides. Detected m/z and charges in the MS2 spectra shared between in vitro digestion samples and synthetic peptides are indicated in red. Other assigned m/z are indicated in blue. In MS2 spectra, charged b-, a- and y-ions are reported. Doubly charged ions are marked as ++. Ions’ neutral loss of ammonia is symbolized by *. Extracted ion chromatograms of target peptides in in vitro digestion and synthetic peptides are plotted in the right panels and indicate matching retention times and absence of a biologically meaningful peak in the 0 h digestion. MS ion chromatograms correspond to the m/z = 610.80–610.84 (+2; f) and 407.53–407.57 (+3; g). (h) Number of unique peptide sequences identified by invitroSPI in the gp100 Fusion dataset shown for 2 h, 4 h and 20 h. (i) Frequency of spliced and non-spliced peptides over time identified by invitroSPI in the gp100 Fusion dataset comprising two substrates. In (a-e,h-i), in vitro digestion samples (0, 2, 4, 20 h) and cognate synthetic peptides were measured by Orbitrap Fusion Lumos (KCL-CEMS) by using the same MS method. For MS2 spectrum references, (f): file 20210422_WB2_2h_TSN89_FusionCEMS, charge +2, scan 5897 (upper panel); file 20210422_GP100_mix_FusionCEMS, charge +2, scan 5208 (lower panel). (g): file 20210422_WA4_20h_TSN2_FusionCEMS, charge +3, scan 6115 (upper panel); file 20210422_GP100_mix_FusionCEMS, charge +3, scan 4936 (lower panel).
Fig. 4
Comparison of invitroSPI and invitroPB methods applied to the PB dataset. (a,b) Number of PSMs assigned to: (a) non-spliced, cis-spliced and trans-spliced peptides, and related synthesis error peptides, or (b) PTM-labelled peptides. (c) Frequency of PTMs among PTM-labelled non-spliced peptides suggested by PEAKS-PTM as part of invitroPB. (d-f) Number of PSMs assigned to: (d) forward and reverse cis-spliced peptides (multi-mapper forward/reverse cis-spliced peptides are not shown), (e) spliced peptides with one amino acid long splice-reactant, and (f) spliced peptides containing substrate’s N- or C-termini. Assignment was carried out by applying invitroSPI and invitroPB methods to the PB dataset. In invitroSPI-identified peptides, PTM-modified peptides are also included. In (b) and (d-f), PSMs assigned to synthesis errors have been removed. (g) Spectral angle distribution computed between measured and predicted MS2 spectra identified by invitroSPI (red) and invitroPB methods (grey). Only PSMs of unmodified non-spliced and spliced peptides that do not contain any cysteine (C) residues, do not exceed a charge of 6 and are 7–12 amino acids long are included here, since Prosit cannot predict the MS2 spectra of PTM-modified peptides and Prosit performance is influenced by peptide length (Fig. S6). In the violin plots, horizontal black lines represent the median. The number of PSMs for each group is reported. In (a-g), in vitro digestion samples (2 h and 20 h digestions with proteasomes and 20 h without proteasomes) were measured by Orbitrap Fusion Lumos (Oxford proteomics centre).
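The PSM selection and the spectral angle in panel (g) can be sketched as follows. The caption does not give the formula; the code assumes the commonly used normalized spectral angle, 1 - 2·arccos(cosine similarity)/π, computed on intensity vectors that are already aligned fragment by fragment. The function names are hypothetical, and alignment of measured and predicted ions is not shown.

```python
import math
from typing import Sequence


def is_eligible(sequence: str, charge: int) -> bool:
    """Selection stated in the caption: no cysteine, charge <= 6, length 7-12."""
    return "C" not in sequence and charge <= 6 and 7 <= len(sequence) <= 12


def spectral_angle(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized spectral angle (assumed definition): 1 = identical, 0 = orthogonal."""
    if len(measured) != len(predicted):
        raise ValueError("intensity vectors must be aligned and of equal length")
    norm_m = math.sqrt(sum(x * x for x in measured))
    norm_p = math.sqrt(sum(x * x for x in predicted))
    if norm_m == 0.0 or norm_p == 0.0:
        return 0.0  # no signal in one spectrum: treat as no similarity
    cosine = sum(m * p for m, p in zip(measured, predicted)) / (norm_m * norm_p)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```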
Fig. 5
FDR estimation for invitroSPI and invitroPB in the PB dataset. (a,b) Spectral angle distribution of non-spliced, cis-spliced and trans-spliced peptides identified by either (a) invitroSPI or (b) invitroPB in the PB dataset. (c) Estimated FDRs based on spectral angle distributions, choosing a spectral angle cut-off of 0.7 (dashed line) reported in (a,b). The bars represent the relative frequency of PSMs below the cut-off in each peptide stratum. Statistically significant p values < 0.05 (two-samples Wilcoxon test) are reported in (c), and they refer to the comparison of the spectral angle distributions shown in (a,b).
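The estimate in panel (c) is the relative frequency of PSMs whose spectral angle falls below the cut-off, computed separately for each peptide stratum. A minimal sketch, in which the function names and the spectral angle values are made up for illustration:

```python
from typing import Dict, Sequence


def fraction_below_cutoff(angles: Sequence[float], cutoff: float = 0.7) -> float:
    """Relative frequency of PSMs with a spectral angle below the cut-off."""
    if not angles:
        raise ValueError("at least one spectral angle is required")
    return sum(1 for angle in angles if angle < cutoff) / len(angles)


def estimate_by_stratum(
    angles_by_stratum: Dict[str, Sequence[float]], cutoff: float = 0.7
) -> Dict[str, float]:
    """Apply the same cut-off to every peptide stratum."""
    return {
        name: fraction_below_cutoff(angles, cutoff)
        for name, angles in angles_by_stratum.items()
    }


# Made-up spectral angles, not taken from the PB dataset.
toy_angles = {
    "non-spliced": [0.92, 0.85, 0.64, 0.95],
    "cis-spliced": [0.88, 0.55, 0.61, 0.79],
}
estimates = estimate_by_stratum(toy_angles)
```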
Fig. 6
Generation efficiency of spliced and non-spliced peptides. Violin plots show the distribution of generation efficiencies for peptide hydrolysis and splicing. Generation efficiencies were calculated, for each substrate, as the number of detected peptides divided by the number of theoretically possible peptides. Calculations were carried out on the peptide products and substrate sequences in the whole dataset digested with 20S standard proteasome (80 substrates). The generation efficiency differs significantly between spliced and non-spliced peptides and, among spliced peptides, between cis- and trans-spliced peptides. Significant p values of a two-samples Wilcoxon test are reported.
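A minimal sketch of this ratio for non-spliced peptides only. How the theoretically possible peptides were enumerated is not stated in the caption; the code assumes all unique contiguous fragments within a chosen length range, excluding the full-length substrate. The function names and the length bounds are hypothetical.

```python
from typing import Iterable, Set


def possible_non_spliced(substrate: str, min_len: int, max_len: int) -> Set[str]:
    """Unique contiguous fragments of the substrate within the length range.

    Assumption: the undigested full-length substrate is not counted as a product.
    """
    n = len(substrate)
    return {
        substrate[i:i + length]
        for length in range(min_len, max_len + 1)
        if length < n
        for i in range(n - length + 1)
    }


def generation_efficiency(
    detected: Iterable[str], substrate: str, min_len: int, max_len: int
) -> float:
    """Detected peptides divided by theoretically possible peptides."""
    possible = possible_non_spliced(substrate, min_len, max_len)
    if not possible:
        raise ValueError("no theoretically possible peptides in this length range")
    # Only detected sequences that are possible under the assumptions are counted.
    return len(set(detected) & possible) / len(possible)
```

The same ratio applies to spliced peptides once the set of theoretically possible cis- or trans-spliced products has been enumerated, which is not attempted here.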
Fig. 7
Features of unique peptides identified in all datasets. (a,b) Frequency (a) and length (b) of unique peptides per substrate. (c) Length of N- and C-terminal splice-reactants of cis-spliced peptides that could unequivocally be assigned to a single position within a substrate. In (a-c), analysis has been carried out in the 2/4 h in vitro digestions with 20S standard proteasomes, derived from the PB dataset (24 substrates) analyzed by invitroSPI and invitroPB, as well as from the Specht dataset (47 substrates) and the whole dataset (71 substrates) analyzed by invitroSPI. Here, PTM-tagged peptides identified by invitroSPI are added to the unmodified peptides. In (a-c), all peptides that could not be unambiguously annotated as either forward or reverse cis-spliced peptides (i.e. the multi-mapper forward/reverse cis-spliced peptides) were removed. Spliced peptides containing a single amino acid residue splice-reactant or the substrate’s N- or C-termini were labelled as such only if that was the only explanation out of all possible peptide origins within the polypeptide substrate. In (c), multi-mapper peptides that could be assigned unambiguously to a spliced peptide type were subsequently checked for the length of their splice-reactants. Among multi-mapper spliced peptides, only those that had a single and unambiguous splice-reactant length are included.
Fig. 8
Potential pitfalls in data analysis related to peptide product database size. (a) Normalization strategies. Heatmaps display the joint frequency of amino acid combinations at the splice-site (formed by sP1 and sP1’) in the simulated background databases normalized by the amino acid frequency of the investigated substrates. Simulated background databases were computed from the PB dataset (n = 25 substrates) and from the whole dataset (n = 80 substrates). Frequencies were then normalized by the frequency of the amino acids within the substrate sequences. White spots indicate combinations that are impossible to derive from the given set of substrate sequences. Low frequencies are depicted in red, whereas high frequencies are shown in blue. (b) Amino acid frequencies at sP1 and sP1’ sites of forward and reverse cis-spliced peptides in the whole database of unique peptide products identified through invitroSPI, as well as those sequences originally published by Paes et al. The frequency in the true dataset was normalized by the frequency of the respective simulated background database as well as by the sum of all values. To verify the robustness of the frequency estimation, 200 bootstrap iterations were performed, each time sampling 80% of the splice-sites. The 90% confidence intervals of the resulting frequency estimations are displayed. Large confidence intervals indicate low robustness of the frequency estimation.
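The robustness check in panel (b) can be sketched as follows, assuming that each of the 200 iterations draws 80% of the splice-sites without replacement and that the 90% interval is taken between the 5th and 95th percentiles of the resulting frequencies. The caption does not state either detail, and normalization by the simulated background database is omitted. The function name and its parameters are hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_frequency_ci(
    residues: Sequence[str],
    target: str,
    iterations: int = 200,
    fraction: float = 0.8,
    seed: int = 0,
) -> Tuple[float, float]:
    """90% interval for the frequency of `target` among splice-site residues.

    `residues` holds one amino acid per observed splice-site (e.g. all sP1 residues).
    """
    if not residues:
        raise ValueError("at least one splice-site is required")
    rng = random.Random(seed)  # fixed seed for reproducibility
    pool = list(residues)
    k = max(1, round(fraction * len(pool)))
    estimates = []
    for _ in range(iterations):
        sample = rng.sample(pool, k)  # assumption: sampling without replacement
        estimates.append(sample.count(target) / k)
    # 19 cut points at 5%, 10%, ..., 95%; the first and last bound the 90% interval.
    cuts = statistics.quantiles(estimates, n=20, method="inclusive")
    return cuts[0], cuts[-1]
```

A wide interval returned for a given amino acid corresponds to the low robustness described in the caption.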

References

    1. Hanada K, Yewdell JW, Yang JC. Immune recognition of a human renal cancer antigen through post-translational protein splicing. Nature. 2004;427:252–256. doi: 10.1038/nature02240.
    2. Vigneron N, et al. An antigenic peptide produced by peptide splicing in the proteasome. Science. 2004;304:587–590. doi: 10.1126/science.1095522.
    3. Mishto M, Liepe J. Post-Translational Peptide Splicing and T Cell Responses. Trends Immunol. 2017;38:904–915. doi: 10.1016/j.it.2017.07.011.
    4. Berkers CR, et al. Definition of Proteasomal Peptide Splicing Rules for High-Efficiency Spliced Peptide Presentation by MHC Class I Molecules. J Immunol. 2015;195:4085–4095. doi: 10.4049/jimmunol.1402455.
    5. Mishto M, et al. Driving Forces of Proteasome-catalyzed Peptide Splicing in Yeast and Humans. Mol Cell Proteomics. 2012;11:1008–1023. doi: 10.1074/mcp.M112.020164.
