Genomics Proteomics Bioinformatics. 2025 Jan 15;22(6):qzae083. doi: 10.1093/gpbjnl/qzae083.

ProtPipe: A Multifunctional Data Analysis Pipeline for Proteomics and Peptidomics

Ziyi Li¹,², Cory A Weller¹,², Syed Shah¹,², Nicholas L Johnson¹,², Ying Hao¹, Paige B Jarreau¹, Jessica Roberts¹, Deyaan Guha¹, Colleen Bereda¹, Sydney Klaisner¹, Pedro Machado³, Matteo Zanovello³, Mercedes Prudencio⁴,⁵, Björn Oskarsson⁴,⁵, Nathan P Staff⁶, Dennis W Dickson⁴, Pietro Fratta³, Leonard Petrucelli⁴,⁵, Priyanka Narayan⁷, Mark R Cookson¹,⁸, Michael E Ward¹,⁹, Andrew B Singleton¹,⁸, Mike A Nalls¹,², Yue A Qi¹

Affiliations

¹ Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.
² DataTecnica LLC, Washington, DC 20812, USA.
³ UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, University College London, London, WC1N 3BG, UK.
⁴ Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA.
⁵ Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL 32224, USA.
⁶ Department of Neurology, Mayo Clinic, Rochester, MN 55905, USA.
⁷ Genetics and Biochemistry Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA.
⁸ Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA.
⁹ National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.

PMID: 39576693
PMCID: PMC11842048
DOI: 10.1093/gpbjnl/qzae083

Ziyi Li et al. Genomics Proteomics Bioinformatics. 2025.


Erratum in

Correction to: ProtPipe: A Multifunctional Data Analysis Pipeline for Proteomics and Peptidomics.
[No authors listed]. Genomics Proteomics Bioinformatics. 2025 May 10;23(1):qzaf051. doi: 10.1093/gpbjnl/qzaf051. PMID: 40582372. Free PMC article. No abstract available.

Abstract

Mass spectrometry (MS) is a technique widely employed for the identification and characterization of proteins, with applications in personalized medicine, systems biology, and biomedicine. MS-based proteomics advances our understanding of protein function, cellular signaling, and complex biological systems. MS data analysis is a critical process that includes identifying and quantifying proteins and peptides and then exploring their biological functions in downstream analyses. To address the complexities associated with MS data analysis, we developed ProtPipe, which comes with DIA-NN preinstalled, to streamline and automate the processing and analysis of high-throughput proteomics and peptidomics datasets. The pipeline facilitates data quality control, sample filtering, and normalization, ensuring robust and reliable downstream analyses. ProtPipe provides downstream analyses, including protein and peptide differential abundance identification, pathway enrichment analysis, protein-protein interaction analysis, and major histocompatibility complex (MHC)-peptide binding affinity analysis. ProtPipe generates annotated tables and visualizations by performing statistical post-processing and calculating fold changes between predefined pairwise conditions in an experimental design. It is an open-source, well-documented tool with a user-friendly web interface, available at https://github.com/NIH-CARD/ProtPipe.

Keywords: Data analysis pipeline; Immunopeptidomics; Mass spectrometry; ProtPipe; Proteomics.
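
As an illustration of the normalization and pairwise fold-change steps mentioned in the abstract, the following is a minimal Python sketch. It is not ProtPipe's code: the function names (`median_normalize`, `log2_fold_changes`), the dictionary-based data layout, the choice of median scaling, and the toy intensities are all assumptions made for this example.

```python
import math
from statistics import mean, median


def median_normalize(samples):
    """Scale each sample so that all samples share the same median intensity.

    Median scaling is an assumed normalization choice for this sketch.
    """
    medians = {name: median(values.values()) for name, values in samples.items()}
    target = mean(medians.values())
    return {
        name: {protein: intensity * target / medians[name]
               for protein, intensity in values.items()}
        for name, values in samples.items()
    }


def log2_fold_changes(samples, design, treatment, control):
    """Return {protein: log2(mean treatment / mean control)}.

    Only proteins quantified in every sample of both conditions are kept.
    """
    groups = {treatment: [], control: []}
    for name, condition in design.items():
        if condition in groups:
            groups[condition].append(samples[name])
    used = groups[treatment] + groups[control]
    shared = set.intersection(*(set(sample) for sample in used))
    result = {}
    for protein in sorted(shared):
        treated_mean = mean(sample[protein] for sample in groups[treatment])
        control_mean = mean(sample[protein] for sample in groups[control])
        result[protein] = math.log2(treated_mean / control_mean)
    return result


# Toy intensities, made up for illustration only.
samples = {
    "ctrl_1": {"P1": 100.0, "P2": 200.0, "P3": 400.0},
    "ctrl_2": {"P1": 100.0, "P2": 200.0, "P3": 400.0},
    "trt_1": {"P1": 400.0, "P2": 200.0, "P3": 100.0},
    "trt_2": {"P1": 400.0, "P2": 200.0, "P3": 100.0},
}
# Hypothetical experimental design: sample name -> condition label.
design = {
    "ctrl_1": "control",
    "ctrl_2": "control",
    "trt_1": "treated",
    "trt_2": "treated",
}

normalized = median_normalize(samples)
fold_changes = log2_fold_changes(normalized, design, "treated", "control")
for protein, value in fold_changes.items():
    print(f"{protein}\tlog2FC={value:+.2f}")
```

In a real analysis the fold changes would be paired with a statistical test and multiple-testing correction before calling a protein differentially abundant; that step is left out here to keep the sketch dependency-free.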

Published by Oxford University Press and Science Press on behalf of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China 2024.

Conflict of interest statement

Mike A. Nalls, Cory A. Weller, Nicholas L. Johnson, Syed Shah, and Ziyi Li’s participation in this project was part of a competitive contract awarded to DataTecnica LLC by the National Institutes of Health (NIH) to support open science research. Mike A. Nalls also currently serves on the scientific advisory board for Character Bio Inc., and is a scientific founder at Neuron23 Inc. Björn Oskarsson serves as a consultant for Columbia University/Tsumura Inc., MediciNova, Biogen, uniQure, Amylyx, and Mitsubishi, and has research grants from Columbia University/Tsumura Inc., Biogen, MediciNova, Cytokinetics, Mitsubishi, Calico, Novartis, Sanofi, Ashvattha, and TARGET ALS. Other authors have declared no competing interests.

Figures

**Figure 1**
The comprehensive workflow of ProtPipe, a multifunctional data analysis pipeline for proteomics and peptidomics. The database search is performed using pre-installed DIA-NN for DIA data, offline Spectronaut for DIA data, or offline FragPipe for DDA data and immunopeptidomics data. The resulting .csv files (*e.g.*, protein groups or peptides) can then be analyzed by ProtPipe. ProtPipe generates figures and datasets for quality control, clustering analysis, differential abundance analysis, pathway enrichment analysis, PPI network analysis, and MHC–peptide binding affinity predictions. DIA, Data-Independent Acquisition; DDA, Data-Dependent Acquisition; PPI, protein–protein interaction; FC, fold change; PC, principal component; MHC, major histocompatibility complex; HLA, human leukocyte antigen.

**Figure 2**
Analysis of a large-scale proteomics dataset. A. Distribution of identified protein groups. B. Distribution of protein intensity. C. Correlation among biological replicates. D. PCA plot. E. Volcano plot showing differential abundance analysis results between two groups. Orange dots represent up-regulated proteins, blue dots indicate down-regulated proteins, and gray dots signify proteins with no significant alterations. F. Bar charts presenting GO terms related to Cellular Component. PCA, Principal Component Analysis; SCC, Spearman correlation coefficient; GO, Gene Ontology.

**Figure 3**
Heatmap analysis of protein intensity patterns. A. Heatmap showing protein intensity patterns across all detected proteins in the samples. B. Customized heatmap generated using a selected gene list focusing on marker genes closely associated with neurons. C. Customized heatmap generated using a provided gene list highlighting marker genes unique to iPSCs. iPSC, induced pluripotent stem cell.

**Figure 4**
Comprehensive analysis of UNC13A PPIs. A and B. Bar chart and box plot depicting the number of detected protein groups (A) and the distribution of protein intensity (B) in the control group and the UNC13A pull-down group. C. Reproducibility assessment by replicate correlation. D. Identification of enriched proteins. The volcano plot illustrates proteins significantly enriched in the presence of UNC13A RNA compared to a negative control. E. Venn diagram showing the overlap between the interacting proteins obtained from the STRING database and the potential interacting proteins obtained from experimental data. F. Rank plot of the proteins based on their FCs in abundance. Yellow dots represent proteins previously known to interact with the target protein, as obtained from the STRING database.

**Figure 5**
MHC-bound peptide deconvolution. A. The count of peptides identified in the class I immunopeptidome from various cancer cell lines and tumors. B. The count of peptides with strong binding affinities for HLA alleles (binding affinity cutoff: < 200). C. The peptides’ binding affinities for each HLA allele, with cutoff criteria for MHCflurry affinity (< 200) and percentile (< 2). D. Ranking plot of peptide–HLA binding affinities, showing the top 5 peptides with the strongest binding affinities. E. Peptide binding patterns by HLA allele. The top 3 peptides with the strongest binding affinities for each HLA allele are shown.
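
The cutoffs quoted in the Figure 5 caption (affinity < 200 and percentile < 2) can be illustrated with a short Python sketch. This is not ProtPipe's or MHCflurry's code: the row layout, the field names (`peptide`, `allele`, `affinity`, `percentile`), the helper functions, and the toy peptide strings are all hypothetical. It also assumes that smaller affinity values mean stronger binding, which is how the "< 200" cutoff is read here.

```python
from collections import Counter

AFFINITY_CUTOFF = 200.0    # caption cutoff: affinity < 200
PERCENTILE_CUTOFF = 2.0    # caption cutoff: percentile < 2


def strong_binders(predictions,
                   affinity_cutoff=AFFINITY_CUTOFF,
                   percentile_cutoff=PERCENTILE_CUTOFF):
    """Keep peptide-allele rows that pass both cutoffs."""
    return [
        row for row in predictions
        if row["affinity"] < affinity_cutoff and row["percentile"] < percentile_cutoff
    ]


def top_peptides_per_allele(predictions, n=3):
    """Return the n strongest binders (lowest affinity value) for each allele."""
    by_allele = {}
    for row in strong_binders(predictions):
        by_allele.setdefault(row["allele"], []).append(row)
    return {
        allele: [r["peptide"] for r in sorted(rows, key=lambda r: r["affinity"])[:n]]
        for allele, rows in by_allele.items()
    }


# Toy prediction rows, made up for illustration only.
predictions = [
    {"peptide": "PEPTIDEAA", "allele": "HLA-A*02:01", "affinity": 35.0, "percentile": 0.2},
    {"peptide": "PEPTIDEBB", "allele": "HLA-A*02:01", "affinity": 150.0, "percentile": 1.5},
    {"peptide": "PEPTIDECC", "allele": "HLA-A*02:01", "affinity": 180.0, "percentile": 2.5},
    {"peptide": "PEPTIDEDD", "allele": "HLA-B*07:02", "affinity": 500.0, "percentile": 0.9},
    {"peptide": "PEPTIDEEE", "allele": "HLA-B*07:02", "affinity": 12.0, "percentile": 0.1},
]

# Number of strong binders per allele (the kind of count shown in panel B).
counts = Counter(row["allele"] for row in strong_binders(predictions))
# Strongest binders per allele (the kind of selection shown in panel E).
top = top_peptides_per_allele(predictions, n=3)
print(dict(counts))
print(top)
```

Applying both cutoffs together is a design choice of this sketch; a row that passes only one of them (such as the third and fourth toy rows) is dropped.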

**Figure 6**
A proteomics database related to CNS. A. Overview of the protein group counts across various tissue and cellular sources, including human brain samples, plasma, CSF, muscle cells, and human iPSC-derived neuronal, microglial, and astrocytic cells. B. The UMAP plot. C. Comprehensive heatmap of protein expression patterns across all detected proteins in the dataset. D. GO enrichment analysis of a subset of 356 proteins consistently detectable across all samples. CNS, central nervous system; CSF, cerebrospinal fluid; UMAP, Uniform Manifold Approximation and Projection.

Update of

ProtPipe: A Multifunctional Data Analysis Pipeline for Proteomics and Peptidomics.
Li Z, Weller CA, Shah S, Johnson N, Hao Y, Roberts J, Bereda C, Klaisner S, Machado P, Fratta P, Petrucelli L, Prudencio M, Oskarsson B, Staff NP, Dickson DW, Cookson MR, Ward ME, Singleton AB, Nalls MA, Qi YA. bioRxiv [Preprint]. 2023 Dec 13:2023.12.12.571327. doi: 10.1101/2023.12.12.571327. Update in: Genomics Proteomics Bioinformatics. 2025 Jan 15;22(6):qzae083. doi: 10.1093/gpbjnl/qzae083. PMID: 38168437. Free PMC article. Preprint.
