J Proteome Res. 2019 Mar 1;18(3):1418-1425.
doi: 10.1021/acs.jproteome.8b00760. Epub 2019 Jan 28.

pmartR: Quality Control and Statistics for Mass Spectrometry-Based Biological Data

Kelly G Stratton et al. J Proteome Res. 2019.

Abstract

Prior to statistical analysis of mass spectrometry (MS) data, quality control (QC) of the identified biomolecule peak intensities is imperative for reducing process-based sources of variation and extreme biological outliers. Without this step, statistical results can be biased. Additionally, liquid chromatography-MS proteomics data present inherent challenges due to large amounts of missing data that require special consideration during statistical analysis. While a number of R packages exist to address these challenges individually, there is no single R package that addresses all of them. We present pmartR, an open-source R package, for QC (filtering and normalization), exploratory data analysis (EDA), visualization, and statistical analysis robust to missing data. Example analysis using proteomics data from a mouse study comparing smoke exposure to control demonstrates the core functionality of the package and highlights the capabilities for handling missing data. In particular, using a combined quantitative and qualitative statistical test, 19 proteins whose statistical significance would have been missed by a quantitative test alone were identified. The pmartR package provides a single software tool for QC, EDA, and statistical comparisons of MS data that is robust to missing data and includes numerous visualization capabilities.
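The qualitative half of the combined test described above can be illustrated with a G-test on presence/absence counts, which the Figure 3 caption names as the test paired with the t-test (ANOVA). The following is a minimal sketch in Python, not the pmartR implementation: the function name `g_test_presence` and its arguments are hypothetical, and it assumes two groups, so the observed/missing table is 2x2 and the test has one degree of freedom.

```python
import math


def g_test_presence(observed_a, total_a, observed_b, total_b):
    """G-test of independence on a 2x2 observed/missing table.

    Hypothetical helper for illustration only (not a pmartR function).
    Returns the G statistic and its p-value under a chi-square
    distribution with 1 degree of freedom.
    """
    # Rows are groups; columns are (observed, missing) counts.
    table = [
        [observed_a, total_a - observed_a],
        [observed_b, total_b - observed_b],
    ]
    n = total_a + total_b
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = table[i][j]
            e = row_sums[i] * col_sums[j] / n  # expected count under independence
            if o > 0:  # a zero cell contributes nothing to the sum
                g += 2.0 * o * math.log(o / e)

    # For 1 degree of freedom, P(chi2 > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p


# A protein seen in all 10 samples of one group and none of the other.
g, p = g_test_presence(10, 10, 0, 10)
print(f"G = {g:.2f}, p = {p:.2e}")
```

A protein like the one in the usage example has no intensities in one group, so a quantitative test cannot be computed for it; a presence/absence test is one way such proteins can still be flagged.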

Keywords: R package; mass spectrometry; normalization; quality control; quantification; statistics; visualization.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
(A) Boxplots for each log2 sample prior to QC. (B) The PPCA plot of the log2 peptide data without imputation of missing values shows minimal clustering of the treatment groups. (C) A bar graph of the number of missing values per sample does not reveal anything systematically different between the two groups or the individual samples. (D) The Pearson correlation heatmap among the log2-transformed samples shows some variation in correlation across the samples but nothing to indicate a potential outlying sample. (E) The rMd plot does not identify any potential sample outliers. (F) The values for each of the metrics included in the rMd calculation are indicated on boxplots for sample U54_SMOKEmp_119, which is the control sample having the highest log2 rMd score.
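Two of the QC summaries in this caption, the number of missing values per sample and the Pearson correlation between samples, can be sketched without imputation. This is an illustrative Python sketch under stated assumptions, not pmartR code: missing values are represented as `None`, the function names are hypothetical, and the correlation uses only peptides observed in both samples (pairwise-complete observations), which is an assumption about how missing data are handled.

```python
import math


def missing_per_sample(samples):
    """Count missing (None) values for each sample name."""
    return {name: sum(v is None for v in values)
            for name, values in samples.items()}


def pearson_complete(x, y):
    """Pearson correlation over peptides observed in both samples."""
    pairs = [(a, b) for a, b in zip(x, y)
             if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two jointly observed values")
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Toy log2 abundances; the sample names are made up.
samples = {
    "sample_a": [1.0, 2.0, None, 4.0, 5.0],
    "sample_b": [2.0, 4.0, 7.0, None, 10.0],
}
print(missing_per_sample(samples))
print(pearson_complete(samples["sample_a"], samples["sample_b"]))
```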
Figure 2
(A) Heatmap of the SPANS scores for each combination of data subset and normalization method that shows the use of the MAD of the LOS 0.05 peptides to be the top choice for the normalization approach, followed by the MAD of the LOS 0.2 peptides. (B) Boxplots for each sample that show the distributions of normalized log2 peptide abundance, where normalization was performed using the MAD of the LOS 0.2 peptides.
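As a rough illustration of normalizing with the MAD of a peptide subset, the sketch below centers each sample on the subset median and scales by the subset's median absolute deviation. It is written in Python under several assumptions and is not the pmartR implementation: the function name `mad_normalize` is hypothetical, the subset indices stand in for whatever rule (such as LOS 0.2) selected the peptides, missing values are `None`, and the MAD is left unscaled.

```python
from statistics import median


def mad_normalize(sample, subset_idx):
    """Center and scale one sample using the median and MAD of a subset.

    sample:      log2 abundances for one sample, None where missing
    subset_idx:  indices of the peptides chosen as the normalization subset
    """
    vals = [sample[i] for i in subset_idx if sample[i] is not None]
    if not vals:
        raise ValueError("no observed values in the normalization subset")
    center = median(vals)
    mad = median([abs(v - center) for v in vals])  # unscaled MAD (assumption)
    if mad == 0:
        raise ValueError("MAD is zero; cannot scale")
    # Every peptide is normalized, not only those in the subset.
    return [None if x is None else (x - center) / mad for x in sample]


sample = [1.0, 2.0, None, 4.0, 10.0]
print(mad_normalize(sample, subset_idx=[0, 1, 2, 3]))
```

Estimating the center and scale from a subset and then applying them to every peptide keeps the normalization factors from being driven by peptides that are sparsely observed or that differ between groups.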
Figure 3
Graphical summary of the statistical results that includes (A) a bar graph of the number of significant proteins, both in total and broken out by statistical test, and (B) a volcano plot for the t-test (ANOVA) results (top) and a plot of the number of observations per group for the g-test results (bottom).
