Brief Bioinform. 2018 Jan 1;19(1):1-11.
doi: 10.1093/bib/bbw095.

A systematic evaluation of normalization methods in quantitative label-free proteomics

Tommi Välikangas et al. Brief Bioinform. 2018.

Abstract

To date, mass spectrometry (MS) data remain inherently biased owing to factors ranging from sample handling to differences caused by the instrumentation. Normalization is the process that aims to account for this bias and make samples more comparable. Selecting a proper normalization method is pivotal for the reliability of the downstream analysis and results. Many normalization methods commonly used in proteomics have been adapted from DNA microarray techniques. Previous studies comparing normalization methods in proteomics have focused mainly on intragroup variation. In this study, several widely used normalization methods representing different normalization strategies are evaluated using three spike-in data sets and one experimental mouse label-free proteomic data set. The normalization methods are evaluated in terms of their ability to reduce variation between technical replicates, their effect on differential expression analysis and their effect on the estimation of logarithmic fold changes. Additionally, we examined whether normalizing the whole data set globally or in segments for the differential expression analysis affects the performance of the normalization methods. We found that variance stabilization normalization (Vsn) reduced variation between technical replicates the most in all examined data sets. Vsn also performed consistently well in the differential expression analysis. Linear regression normalization and local regression normalization also performed consistently well. Finally, we discuss the choice of a normalization method and some qualities of a suitable normalization method in the light of the results of our evaluation.
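
As an illustration of the evaluation idea described in the abstract, the following minimal Python sketch applies a simple median normalization to toy log2 intensities and measures the variation between technical replicates before and after. Median normalization is used here only as an easy stand-in and is not claimed to be one of the methods ranked in the study. The function names, the simulated data and the averaged per-protein median absolute deviation used as the variation score are all assumptions made for this example.

```python
import numpy as np


def median_normalize(log_intensities):
    """Shift each sample (column) so that all samples share the same median.

    Illustrative stand-in for a normalization method; rows are proteins,
    columns are replicate samples, values are log2 intensities.
    """
    col_medians = np.nanmedian(log_intensities, axis=0)
    # Re-center on the average of the original medians to keep the scale.
    return log_intensities - col_medians + np.nanmean(col_medians)


def intragroup_mad(log_intensities):
    """Average per-protein median absolute deviation across replicates.

    Assumed summary of intragroup variation for this sketch; lower values
    mean the replicates agree more closely.
    """
    row_medians = np.nanmedian(log_intensities, axis=1, keepdims=True)
    mad = np.nanmedian(np.abs(log_intensities - row_medians), axis=1)
    return float(np.nanmean(mad))


# Toy data (hypothetical): 200 proteins measured in 4 technical replicates,
# each replicate carrying its own additive bias on the log2 scale.
rng = np.random.default_rng(0)
true_abundance = rng.normal(20.0, 2.0, size=(200, 1))
sample_bias = np.array([0.0, 0.8, -0.5, 1.2])
noise = rng.normal(0.0, 0.05, size=(200, 4))
raw = true_abundance + sample_bias + noise

normalized = median_normalize(raw)

print("variation before normalization:", intragroup_mad(raw))
print("variation after normalization: ", intragroup_mad(normalized))
```

In this toy setting the bias is purely additive per sample, so a median shift removes almost all of it. Real MS data can show intensity-dependent bias, which is the kind of situation regression-based or variance-stabilizing approaches are meant to address.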

Keywords: bias; differential expression; intragroup variation; label free; logarithmic fold change; mass spectrometry; normalization; proteomics; quantitation; reproducibility.

Figures

Figure 1
The effect of normalization method on intragroup variation between technical replicates. The PMADs of (A) UPS1 data, (B) CPTAC data and (C) SGSD data. The Pearson correlation coefficients of (D) UPS1 data, (E) CPTAC data and (F) SGSD data.
Figure 2
The effect of normalization method on differential expression results. The AUCs of the ROC curves of differential expression analysis in (A) UPS1 data, (B) CPTAC data and (C) SGSD data globally normalized with the different methods. The x axes denote the two-group comparisons of the sample groups.
Figure 3
The logFC of the background proteins and representative examples of the logFC of the spike-in proteins. (A) The density distributions of the logFC of the background proteins over all two-group comparisons in all data sets. The vertical dashed line corresponds to logFC of zero. The logFC of the spike-in proteins (upper boxes) and the background proteins (lower boxes) in the (B) 10 versus 25 fmol comparison of the UPS1 data and (C) the 0.74 versus 2.2 fmol comparison of the CPTAC data. The horizontal solid black lines correspond to logFC of zero, while the horizontal dashed lines correspond to the theoretical expected logFC of the spike-in proteins.
Figure 4
Representative MA plots of the two-group comparisons after normalization with the most successful normalization method and log2 transformation in each data set. MA plots of the (A) 2 versus 10 fmol comparison of the UPS1 data, (B) 0.25 versus 2.2 fmol comparison of the CPTAC data and (C) sample 1 versus sample 4 comparison of the SGSD data normalized with the Vsn normalization. MA plots of the (D) 2 versus 10 fmol comparison of the UPS1 data, (E) 0.25 versus 2.2 fmol comparison of the CPTAC data and (F) sample 1 versus sample 4 comparison of the SGSD data after the log2 transformation. The lighter nonblack points in the plots correspond to the spike-in proteins and the black points to the background proteins. The curve corresponds to a loess smoothing function.
Figure 5
Intragroup variation between biological replicates in the mouse data normalized with the different methods. (A) The PMADs and (B) the Pearson correlation coefficients.
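
The quantities named in the captions above (logFC, the theoretical expected logFC of spike-in proteins, and the AUC of a ROC curve) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the function names are hypothetical, and proteins are scored by absolute logFC rather than by a proper differential expression test.

```python
import numpy as np


def expected_logfc(amount_a, amount_b):
    """Theoretical log2 fold change of a spike-in going from amount_a to amount_b."""
    return float(np.log2(amount_b / amount_a))


def log_fold_change(group_a, group_b):
    """Per-protein logFC: mean log2 intensity in group_b minus group_a.

    Both inputs are arrays with proteins in rows and replicates in columns.
    """
    return np.nanmean(group_b, axis=1) - np.nanmean(group_a, axis=1)


def roc_auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison (ties count half).

    Equals the probability that a randomly chosen positive (spike-in)
    scores higher than a randomly chosen negative (background protein).
    """
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos = scores[is_positive]
    neg = scores[~is_positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Expected logFC for a 10 versus 25 fmol spike-in comparison.
print(expected_logfc(10, 25))

# Toy example (hypothetical values): 2 spike-in and 3 background proteins.
group_a = np.array([[10.0, 10.1], [12.0, 11.9], [15.0, 15.1], [9.0, 9.0], [20.0, 20.2]])
group_b = np.array([[11.3, 11.4], [13.3, 13.2], [15.0, 15.0], [9.1, 9.0], [20.1, 20.1]])
spike_in = np.array([True, True, False, False, False])

logfc = log_fold_change(group_a, group_b)
print(roc_auc(np.abs(logfc), spike_in))
```

Background proteins are expected to have a logFC near zero, so a normalization that shifts them away from zero or compresses the spike-in logFC toward zero would show up in both the logFC distributions and the AUC.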
