Genomics Proteomics Bioinformatics. 2025 Jan 15;22(6):qzae083.
doi: 10.1093/gpbjnl/qzae083.

ProtPipe: A Multifunctional Data Analysis Pipeline for Proteomics and Peptidomics

Ziyi Li et al. Genomics Proteomics Bioinformatics. 2025.

Abstract

Mass spectrometry (MS) is a technique widely employed for the identification and characterization of proteins, with applications in personalized medicine, systems biology, and biomedicine. The application of MS-based proteomics advances our understanding of protein function, cellular signaling, and complex biological systems. MS data analysis is a critical process that includes identifying and quantifying proteins and peptides and then exploring their biological functions in downstream analyses. To address the complexities associated with MS data analysis, we developed ProtPipe, which comes with DIA-NN preinstalled, to streamline and automate the processing and analysis of high-throughput proteomics and peptidomics datasets. The pipeline facilitates data quality control, sample filtering, and normalization, ensuring robust and reliable downstream analyses. ProtPipe provides downstream analyses, including protein and peptide differential abundance identification, pathway enrichment analysis, protein-protein interaction analysis, and major histocompatibility complex (MHC)-peptide binding affinity analysis. ProtPipe generates annotated tables and visualizations by performing statistical post-processing and calculating fold changes between predefined pairwise conditions in an experimental design. It is an open-source, well-documented tool with a user-friendly web interface, available at https://github.com/NIH-CARD/ProtPipe.
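
As a rough illustration of the fold-change step mentioned in the abstract, the following Python sketch computes a log2 fold change between two predefined conditions for each protein and labels the result as up-regulated, down-regulated, or not significant. It is not taken from ProtPipe: the function names, the input layout, the example intensities, and the cutoffs (|log2FC| >= 1, p < 0.05) are assumptions chosen for illustration, and the p-values are assumed to come from a separate statistical test.

```python
import math
from statistics import mean


def log2_fold_change(treatment, control):
    """Log2 ratio of mean intensities (treatment over control)."""
    return math.log2(mean(treatment) / mean(control))


def classify(log2fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a protein by direction and significance.

    The cutoffs are illustrative defaults, not values taken from ProtPipe.
    """
    if p_value < p_cutoff and log2fc >= fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2fc <= -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical intensities for one pairwise comparison; the p-values are
# placeholders standing in for the output of a statistical test.
proteins = {
    "P1": {"treatment": [400.0, 420.0, 380.0], "control": [100.0, 110.0, 90.0], "p": 0.001},
    "P2": {"treatment": [50.0, 50.0, 50.0], "control": [200.0, 200.0, 200.0], "p": 0.01},
    "P3": {"treatment": [100.0, 105.0, 95.0], "control": [100.0, 100.0, 100.0], "p": 0.6},
}

results = {}
for name, row in proteins.items():
    fc = log2_fold_change(row["treatment"], row["control"])
    results[name] = (fc, classify(fc, row["p"]))

for name, (fc, label) in results.items():
    print(f"{name}\tlog2FC={fc:+.2f}\t{label}")
```

In a real analysis the intensities would typically be normalized and log-transformed first, and p-values would be adjusted for multiple testing before any cutoff is applied.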

Keywords: Data analysis pipeline; Immunopeptidomics; Mass spectrometry; ProtPipe; Proteomics.

Conflict of interest statement

Mike A. Nalls, Cory A. Weller, Nicholas L. Johnson, Syed Shah, and Ziyi Li’s participation in this project was part of a competitive contract awarded to DataTecnica LLC by the National Institutes of Health (NIH) to support open science research. Mike A. Nalls also currently serves on the scientific advisory board for Character Bio Inc., and is a scientific founder at Neuron23 Inc. Björn Oskarsson serves as a consultant for Columbia University/Tsumura Inc., MediciNova, Biogen, uniQure, Amylyx, and Mitsubishi, and has research grants from Columbia University/Tsumura Inc., Biogen, MediciNova, Cytokinetics, Mitsubishi, Calico, Novartis, Sanofi, Ashvattha, and TARGET ALS. Other authors have declared no competing interests.

Figures

Figure 1
The comprehensive workflow of ProtPipe, a multifunctional data analysis pipeline for proteomics and peptidomics. The database search is performed using pre-installed DIA-NN for DIA data, offline Spectronaut for DIA data, or offline FragPipe for DDA data and immunopeptidomics data. The resulting .csv files (e.g., protein groups or peptides) can then be analyzed by ProtPipe. ProtPipe generates figures and datasets for quality control, clustering analysis, differential abundance analysis, pathway enrichment analysis, PPI network analysis, and MHC–peptide binding affinity predictions. DIA, Data-Independent Acquisition; PPI, protein–protein interaction; FC, fold change; PC, principal component; MHC, major histocompatibility complex; HLA, human leukocyte antigen.
Figure 2
Analysis of a large-scale proteomics dataset. A. Distribution of identified protein groups. B. Distribution of protein intensity. C. Correlation among biological replicates. D. PCA plot. E. Volcano plot showing differential abundance analysis results between two groups. Orange dots represent up-regulated proteins, blue dots indicate down-regulated proteins, and gray dots signify proteins with no significant alterations. F. Bar charts presenting GO terms related to Cellular Component. PCA, Principal Component Analysis; SCC, Spearman correlation coefficient; GO, Gene Ontology.
Figure 3
Heatmap analysis of protein intensity patterns. A. Heatmap showing protein intensity patterns across all detected proteins in the samples. B. Customized heatmap generated using a select gene list focusing on marker genes closely associated with neuron cells. C. Customized heatmap generated using a provided gene list highlighting marker genes unique to iPSCs. iPSC, induced pluripotent stem cell.
Figure 4
Comprehensive analysis of UNC13A PPIs. A. and B. Bar chart and box plot depicting the number of detected protein groups (A) and the distribution of protein intensity (B) in both the control group and the UNC13A pull-down group. C. Reproducibility assessment by replicate correlation. D. Identification of enriched proteins. The volcano plot illustrates proteins significantly enriched in the presence of UNC13A RNA compared to a negative control. E. Venn diagram showing the overlap between the interacting proteins obtained from the STRING database and the potential interacting proteins obtained from experimental data. F. Rank plot of the proteins based on their FCs in abundance. Yellow dots represent proteins previously known to interact with the target protein, as obtained from the STRING database.
Figure 5
MHC-bound peptide deconvolution. A. The count of peptides identified in the class I immunopeptidome from various cancer cell lines and tumors. B. The count of peptides with strong binding affinities for HLA alleles (binding affinity cutoff < 200). C. The peptides’ binding affinities for each HLA allele, with cutoff criteria for MHCflurry affinity (< 200) and percentile (< 2). D. Ranking plot of peptide–HLA binding affinities, visualizing the top 5 peptides with the strongest binding affinities. E. Peptide binding patterns by HLA allele. The top 3 peptides with the strongest binding affinities for each HLA allele are shown.
Figure 6
A proteomics database related to the CNS. A. Overview of the protein group counts from various tissue and cellular sources, including human brain samples, plasma, CSF, muscle cells, and human iPSC-derived neuronal, microglial, and astrocytic cells. B. The UMAP plot. C. Comprehensive heatmap of protein expression patterns across all detected proteins in the dataset. D. GO enrichment analysis of a subset of 356 proteins consistently detectable across all samples. CNS, central nervous system; CSF, cerebrospinal fluid; UMAP, Uniform Manifold Approximation and Projection.
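
The binder selection described in the Figure 5 caption can be sketched as a simple filter. This minimal Python example keeps peptide-allele pairs whose predicted affinity is below 200 and whose percentile is below 2 (the cutoffs stated in the caption), then lists the strongest binders per allele in the spirit of panel E. It is not ProtPipe's implementation: the record layout, field names, peptide sequences, and numeric values are hypothetical, and real predictions would come from MHCflurry, whose output format is not reproduced here.

```python
from collections import defaultdict

AFFINITY_CUTOFF = 200.0    # cutoff stated in the Figure 5 caption
PERCENTILE_CUTOFF = 2.0    # cutoff stated in the Figure 5 caption


def strong_binders(predictions, affinity_cutoff=AFFINITY_CUTOFF,
                   percentile_cutoff=PERCENTILE_CUTOFF):
    """Keep records that pass both the affinity and the percentile cutoff."""
    return [
        rec for rec in predictions
        if rec["affinity"] < affinity_cutoff
        and rec["percentile"] < percentile_cutoff
    ]


def top_per_allele(binders, n=3):
    """Return the n strongest binders per allele.

    Lower predicted affinity values are treated as stronger binding.
    """
    by_allele = defaultdict(list)
    for rec in binders:
        by_allele[rec["allele"]].append(rec)
    return {
        allele: sorted(recs, key=lambda r: r["affinity"])[:n]
        for allele, recs in by_allele.items()
    }


# Hypothetical prediction records; sequences and values are made up.
predictions = [
    {"peptide": "AAAAAAAAL", "allele": "HLA-A*02:01", "affinity": 35.0, "percentile": 0.2},
    {"peptide": "GGGGGGGGV", "allele": "HLA-A*02:01", "affinity": 150.0, "percentile": 1.5},
    {"peptide": "KKKKKKKKL", "allele": "HLA-A*02:01", "affinity": 180.0, "percentile": 2.5},
    {"peptide": "RRRRRRRRL", "allele": "HLA-B*07:02", "affinity": 900.0, "percentile": 0.5},
    {"peptide": "SSSSSSSSL", "allele": "HLA-B*07:02", "affinity": 12.0, "percentile": 0.1},
]

binders = strong_binders(predictions)
top = top_per_allele(binders)

for allele, recs in top.items():
    print(allele, [r["peptide"] for r in recs])
```

Requiring both cutoffs is one reading of the caption; a pipeline could equally apply either criterion alone.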
