Proteomes. 2021 May 25;9(2):26.
doi: 10.3390/proteomes9020026.

A Systematic Evaluation of Semispecific Peptide Search Parameter Enables Identification of Previously Undescribed N-Terminal Peptides and Conserved Proteolytic Processing in Cancer Cell Lines

Matthias Fahrner et al. Proteomes. 2021.

Abstract

Liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the most commonly used technique in explorative proteomic research, and a variety of open-source tools for peptide-spectrum matching have become available. Most analyses of explorative MS data are performed using conventional settings, such as fully specific enzymatic constraints. Here we evaluated the impact of the fragment mass tolerance in combination with the enzymatic constraints on the performance of three open-source search engines (Myrimatch, X! Tandem, and MSGF+). The engines were assessed for their suitability in semi- and non-specific searches and for the importance of accurate fragment mass spectra in such searches. We then performed a semispecific reanalysis of the published NCI-60 deep proteome data using the best-suited parameters. Semi- and non-specific LC-MS/MS data analyses particularly benefit from accurate fragment mass spectra, while this effect is less pronounced for conventional, fully specific peptide-spectrum matching. Search speed differed notably between the three search engines for semi- and non-specific peptide-spectrum matching. Semispecific reanalysis of the NCI-60 proteome data revealed hundreds of previously undescribed N-terminal peptides, including cases of proteolytic processing or likely alternative translation start sites, some of which were present in all cell lines of the reanalyzed panel. Highly accurate MS2 fragment data in combination with modern open-source search algorithms enable the confident identification of semispecific peptides from large proteomic datasets. The identification of previously undescribed N-terminal peptides in published studies highlights the potential of future reanalysis and data mining of proteomic datasets.
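To illustrate why relaxing the enzymatic constraint enlarges the search space, the following minimal sketch enumerates fully tryptic and semitryptic candidate peptides for one sequence. It is not taken from the paper or from any of the three search engines: the function names, the toy sequence, the simplified cleavage rule (cut after K or R, no proline exception, no missed cleavages), and the `min_len` cutoff are all assumptions made for illustration.

```python
def tryptic_peptides(protein):
    """Split a sequence after every K or R (simplified trypsin rule, assumed here)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder without K/R
    return peptides


def semitryptic_peptides(protein, min_len=6):
    """Fully tryptic peptides plus truncations that keep one tryptic end."""
    candidates = set()
    for pep in tryptic_peptides(protein):
        if len(pep) < min_len:
            continue
        candidates.add(pep)
        for n in range(min_len, len(pep)):
            candidates.add(pep[:n])   # specific N-terminal end, free C-terminal end
            candidates.add(pep[-n:])  # free N-terminal end, specific C-terminal end
    return candidates


protein = "MKTAYIAKQRQISFVK"  # hypothetical toy sequence
fully = [p for p in tryptic_peptides(protein) if len(p) >= 4]
semi = semitryptic_peptides(protein, min_len=4)
print(len(fully), len(semi))
```

Even for this short toy sequence the semitryptic candidate set is several times larger than the fully tryptic one, which is consistent with the longer search times and the greater need for accurate fragment masses described in the abstract.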

Keywords: NCI-60 reanalysis; endogenous proteolysis; fragment mass tolerance; mass spectrometry; semispecific peptide search.

Conflict of interest statement

The authors have declared no conflict of interest.

Figures

Figure 1
Effect of less stringent enzymatic constraints and fragment mass tolerances on peptide identification results. Human formalin-fixed, paraffin-embedded (FFPE) samples were digested using LysC and trypsin and were analyzed using Myrimatch (upper panel) and X! Tandem (lower panel). The numbers of unique, non-redundant peptide identifications across the three replicates are shown as violin plots, grouped by the enzymatic constraint and the fragment mass tolerance (10, 100, and 1000 ppm) set in the search engine.
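As a rough guide to what the three tolerance settings mean in absolute terms, the sketch below converts a parts-per-million tolerance into an m/z window and tests whether an observed fragment matches a theoretical one. The function names and the example m/z values are hypothetical, and the search engines may apply tolerances differently in detail.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def fragment_matches(observed_mz, theoretical_mz, tol_ppm):
    """True if the observed fragment lies within tol_ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= theoretical_mz * tol_ppm / 1e6


# Window half-widths at m/z 500 for the three settings shown in Figure 1
for tol in (10, 100, 1000):
    low, high = ppm_window(500.0, tol)
    print(tol, "ppm:", round(high - 500.0, 4), "m/z")
```

At m/z 500 the half-width grows from 0.005 to 0.05 to 0.5 m/z units, so a wide tolerance admits many more chance fragment matches per candidate peptide.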
Figure 2
Peptide search results from three different open-source search engines. Four biological replicates of human embryonic kidney (HEK293T) cell proteome (A) and three adjacent formalin-fixed, paraffin-embedded (FFPE) tissue slides of murine kidney (B) were digested using either LysC (A) or chymotrypsin (B) and were analyzed using MSGF+ (left), Myrimatch (middle), and X! Tandem (right). The number of unique, non-redundant peptide hits (upper panel), the elapsed analysis time in minutes (middle panel), and the number of unique peptides identified per unit of time (lower panel) are shown for each enzymatic constraint setting.
Figure 3
Large-scale semispecific reanalysis of published NCI-60 deep proteome dataset. Workflow used for the analysis of published NCI-60 deep proteome data using OpenMS tools in a workflow within the Galaxy framework. Whole proteome samples of nine representative cancer cell lines were separated into 24 samples using gel-based molecular weight separation. Peptide identification was performed using MSGF+ with semitryptic enzymatic constraint, followed by false discovery rate (FDR) computation and filtering for 1% FDR on the peptide-spectrum matching (PSM) level.
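The 1% FDR filter at the PSM level mentioned in this caption can be sketched with a generic target-decoy estimate. This is a simplified illustration under stated assumptions, not the OpenMS implementation used in the workflow: the function name, the `(score, is_decoy)` tuple layout, the convention that higher scores are better, and the decoys/targets estimator are all assumed here.

```python
def filter_psms_by_fdr(psms, threshold=0.01):
    """Keep target PSMs whose q-value is at or below the threshold.

    psms: list of (score, is_decoy) tuples; higher score = better (assumed).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each rank: decoys / targets seen so far.
    fdrs, targets, decoys = [], 0, 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: lowest FDR reachable at this rank or any lower-scoring rank.
    qvals = fdrs[:]
    for i in range(len(qvals) - 2, -1, -1):
        qvals[i] = min(qvals[i], qvals[i + 1])

    return [psm for psm, q in zip(ranked, qvals) if q <= threshold and not psm[1]]


# Hypothetical scores: three strong targets, one decoy, one weaker target
example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
accepted = filter_psms_by_fdr(example, threshold=0.01)
print(len(accepted))
```

In this toy input only the three PSMs ranked above the first decoy pass the 1% threshold; the weaker target below the decoy is rejected.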
Figure 4
Identification of semispecific N-terminal peptides and proteins with prominent endogenous proteolytic processing. (A) Bar chart showing the number of confidently identified semispecific N-terminal peptides. Primary identification results from semispecific peptide searches were filtered for unique peptides, which were identified in at least two out of nine cell lines. Peptides originating from protein N-terminal methionine clipping or representing the native C-terminus were excluded. Only semispecific N-terminal peptides derived from proteins that were proximal to the expected molecular weight gel slice were considered (Supplementary Figure S2). (B) Heatmap showing proteins for which at least 10 peptides were identified in at least one of the nine cell lines. The color indicates the number of semispecific peptides identified per protein in the respective cell lines.
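The recurrence filter described in panel (A), keeping peptides identified in at least two of the nine cell lines, amounts to counting in how many cell lines each peptide occurs. A minimal sketch follows; the function name, the dictionary layout, and the peptide and cell line labels are hypothetical placeholders rather than data from the study.

```python
from collections import Counter


def peptides_in_min_cell_lines(ids_by_cell_line, min_lines=2):
    """Return peptides observed in at least min_lines distinct cell lines."""
    counts = Counter()
    for peptides in ids_by_cell_line.values():
        counts.update(set(peptides))  # count each peptide once per cell line
    return {pep for pep, n in counts.items() if n >= min_lines}


# Placeholder identifications for three hypothetical cell lines
ids = {
    "line_A": ["PEP1", "PEP2", "PEP1"],
    "line_B": ["PEP2"],
    "line_C": ["PEP3"],
}
print(peptides_in_min_cell_lines(ids))
```

Deduplicating per cell line before counting ensures that a peptide seen many times in a single cell line is not mistaken for a recurrent identification.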
