J Proteome Res. 2023 Apr 7;22(4):1092-1104.
doi: 10.1021/acs.jproteome.2c00441. Epub 2023 Mar 20.

prolfqua: A Comprehensive R-Package for Proteomics Differential Expression Analysis

Witold E Wolski et al. J Proteome Res. 2023.

Abstract

Mass spectrometry is widely used for quantitative proteomics studies, relative protein quantification, and differential expression analysis of proteins. There is a large variety of quantification software and analysis tools. Nevertheless, there is a need for a modular, easy-to-use application programming interface in R that transparently supports a variety of well-principled statistical procedures, so that applying them to proteomics data, and comparing and understanding their differences, is easy. The prolfqua package integrates essential steps of the mass spectrometry-based differential expression analysis workflow: quality control, data normalization, protein aggregation, statistical modeling, hypothesis testing, and sample size estimation. The package makes integrating new data formats easy. It can be used to model simple experimental designs with a single explanatory variable as well as complex experiments with multiple factors and hypothesis testing. The implemented methods allow sensitive and specific differential expression analysis. Furthermore, the package implements benchmark functionality that can help to compare data acquisition, data preprocessing, or data modeling methods using a gold standard data set. The application programming interface of prolfqua strives to be clear, predictable, discoverable, and consistent to make proteomics data analysis application development easy and exciting. The prolfqua R package is available on GitHub at https://github.com/fgcz/prolfqua and is distributed under the MIT license. It runs on all platforms supported by the R free software environment for statistical computing and graphics.

Keywords: differential expression analysis; proteomics; statistical software.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Class diagram of classes representing the proteomics data. The LFQData class encapsulates the quantitative proteomics data stored in a table of tidy data. An instance of the AnalysisTableConfiguration class specifies a mapping of table columns to sample names, peptide or protein identifiers, explanatory variables, and response variables. The LFQDataPlotter class and other classes decorate the LFQData class with additional functionality. For instance, the LFQDataStats and LFQDataSummary classes reference the LFQData class and group methods for variance and sample size estimation or for summarizing peptide and protein counts. Furthermore, the LFQDataTransformer and LFQDataAggregator classes group functions for data normalization and for estimating protein intensities from peptide intensities.
Figure 2
Unified modeling language (UML) diagram of modeling- and contrast-related classes. Different strategies, e.g., lm, lmer, and glm (Table 2), reference methods to fit models and compute contrasts. The model builder method fits the statistical model given the data and a strategy. The models are used to analyze variance (ANOVA) or to estimate contrasts. All classes estimating contrasts implement the ContrastsInterface. Results of external tools, e.g., SAINTexpress or proDA, are adapted to implement the ContrastsInterface.
Figure 3
(A) Density plot of peptide intensity distributions for 20 samples. Each sample is shown as a line of a different color. (B) Peptide intensities for protein HFQ_ECOLI are shown using lines of different colors, and the protein intensity estimate is shown using a thick black line. (C) Distribution of standard deviations of all proteins in each dilution group (a–e) and overall (all). (D) Distribution of protein intensities of protein HFQ_ECOLI in each dilution group.
Figure 4
(A) Histogram showing the distribution of p-values for 332 proteins for contrasts “e_vs_d” and “d_vs_c”. (B) Volcano plot showing −log10 transformed FDR as a function of the difference between groups for 332 proteins. With black dots, we show effect size and FDR estimates obtained from the linear model, while in green, we plot those obtained using imputation. (C) Difference between groups, as a function of the rank of the abundance of the proteins.
Figure 5
(A) Number of estimated contrasts for each modeling method (higher is better). (B) Partial area under the ROC curve at 10% FPR (pAUC10) for all contrasts and three different statistics: the difference between groups (diff, left), the scaled p-value, sign(diff)·p.value (scaled.p.value, center), and the t-statistic (statistic, right); a higher pAUC10 is better. The red line indicates the average area under the curve across all methods. (C) False discovery proportion (FDP) as a function of the FDR. Ideally, the FDR equals the FDP, so larger distances from the diagonal are worse.
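
The two benchmark quantities named in the Figure 5 caption, the partial area under the ROC curve up to 10% FPR and the false discovery proportion at a given FDR threshold, can be illustrated with a short Python sketch. This is not code from prolfqua (which is an R package) and may differ from how the package computes these values. The function names `roc_points`, `partial_auc`, and `false_discovery_proportion` are hypothetical, the convention that a higher score means stronger evidence for a true difference is an assumption, and the area is returned unnormalized because the caption does not state a normalization.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low scores.

    Assumption: a higher score means stronger evidence for a true positive.
    Tied scores are added together so that they form a single ROC point.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(scores: Sequence[float], labels: Sequence[bool], max_fpr: float = 0.1) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr] (unnormalized)."""
    pts = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate linearly so the last segment stops exactly at max_fpr.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def false_discovery_proportion(fdr_values: Sequence[float], labels: Sequence[bool],
                               threshold: float) -> float:
    """Fraction of false positives among proteins called at FDR <= threshold."""
    called = [y for q, y in zip(fdr_values, labels) if q <= threshold]
    if not called:
        return 0.0  # convention chosen here: no discoveries means FDP = 0
    return sum(1 for y in called if not y) / len(called)
```

With a gold standard data set, `labels` would mark the proteins known to change. Comparing `false_discovery_proportion(fdr, labels, t)` against `t` over a range of thresholds gives the kind of curve described for panel C, where points far from the diagonal indicate poorly calibrated FDR estimates.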
