BMC Bioinformatics. 2012;13 Suppl 16(Suppl 16):S5.
doi: 10.1186/1471-2105-13-S16-S5. Epub 2012 Nov 5.

Normalization and missing value imputation for label-free LC-MS analysis

Yuliya V Karpievitch et al. BMC Bioinformatics. 2012.

Abstract

Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.
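
As a minimal illustration of what normalization does, the sketch below applies global median centering. This is one simple approach of the general kind the abstract mentions; it is not presented as the method the paper recommends. It assumes intensities have already been log2-transformed, stores the data as a peptide-by-sample list of lists, and marks missing values with `None`. The function name `median_normalize` and the example values are hypothetical.

```python
import statistics


def median_normalize(matrix):
    """Global median centering of a peptide-by-sample matrix of log2 intensities.

    Each sample (column) is shifted so that its median over observed peptides
    equals the median of all column medians. None marks a missing value and is
    left missing, so normalization and imputation stay separate steps.
    """
    n_samples = len(matrix[0])
    col_medians = []
    for j in range(n_samples):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_medians.append(statistics.median(observed))
    target = statistics.median(col_medians)
    return [
        [None if v is None else v - col_medians[j] + target for j, v in enumerate(row)]
        for row in matrix
    ]


# Three peptides measured in three samples (hypothetical log2 intensities);
# sample 2 has one missing value.
raw = [
    [20.0, 21.0, None],
    [22.0, 23.5, 22.2],
    [18.0, 19.0, 18.1],
]
normalized = median_normalize(raw)
```

After centering, every column has the same median (20.15 in this example) and the missing entry is still `None`. A single additive shift per sample only corrects biases that affect all peptides in a run equally; intensity-dependent biases need other methods.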

Figures

Figure 1
Examples of missing data. Intensities for a peptide with two treatment groups with (A) no missing values, (B) MCAR missing values, (C) censored missing values, and (D) censored missing values imputed as a minimum observed value.
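
The difference between the missingness mechanisms in Figure 1 can be made concrete with a small simulation. The sketch below is an illustration under assumed settings, not the paper's code: it uses a single group of 1,000 hypothetical log2 intensities drawn from N(20, 1), drops 30% of them completely at random (MCAR), separately censors every value below a hypothetical detection limit of 19.5, and then imputes the censored values with the minimum observed value as in panel (D). All function names are hypothetical.

```python
import random
import statistics


def make_mcar(values, p_missing, rng):
    """MCAR: every value has the same chance of being missing, whatever its size."""
    return [None if rng.random() < p_missing else v for v in values]


def make_censored(values, detection_limit):
    """Left censoring: values below the (hypothetical) detection limit are never observed."""
    return [None if v < detection_limit else v for v in values]


def impute_min(values):
    """Replace each missing value with the smallest observed value (as in panel D)."""
    observed_min = min(v for v in values if v is not None)
    return [observed_min if v is None else v for v in values]


def observed_mean(values):
    """Mean of the non-missing values only."""
    return statistics.fmean(v for v in values if v is not None)


rng = random.Random(1)
true_values = [rng.gauss(20.0, 1.0) for _ in range(1000)]
mcar = make_mcar(true_values, 0.3, rng)
censored = make_censored(true_values, 19.5)

true_mean = statistics.fmean(true_values)
mcar_mean = observed_mean(mcar)                         # close to true_mean
censored_mean = observed_mean(censored)                 # biased upward: low values are gone
imputed_mean = statistics.fmean(impute_min(censored))   # between the two
```

Under MCAR the observed values are still a random sample, so their mean stays close to the true mean. Under censoring the observed mean is biased upward, and minimum-value imputation only partly corrects this, because every censored value lies below the detection limit while the imputed minimum lies at or above it.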
Figure 2
Percent coverage for nominal 95% confidence intervals of protein-level differences.
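
Percent coverage is the fraction of repeated experiments whose nominal 95% interval contains the true difference; a well-behaved procedure should come out close to 95%. The sketch below estimates coverage by simulation under settings chosen only for illustration (two groups of 50 hypothetical log2 intensities with SD 1, a true difference of 1, and a normal-approximation interval with a Welch-style standard error), first on complete data and then after discarding values censored below a hypothetical detection limit of 20. It is not the simulation design behind Figure 2, and the function names are hypothetical.

```python
import random
import statistics

Z_975 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def diff_ci(a, b):
    """Approximate 95% CI for mean(b) - mean(a) (normal quantile, Welch-style SE)."""
    d = statistics.fmean(b) - statistics.fmean(a)
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return d - Z_975 * se, d + Z_975 * se


def coverage(true_diff, n, n_sims, detection_limit=None, seed=0):
    """Fraction of simulated experiments whose interval contains true_diff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(20.0, 1.0) for _ in range(n)]
        b = [rng.gauss(20.0 + true_diff, 1.0) for _ in range(n)]
        if detection_limit is not None:
            # Analyse only values at or above the detection limit; censored ones are dropped.
            a = [v for v in a if v >= detection_limit]
            b = [v for v in b if v >= detection_limit]
        lo, hi = diff_ci(a, b)
        hits += lo <= true_diff <= hi
    return hits / n_sims


complete_cov = coverage(1.0, n=50, n_sims=1000)
censored_cov = coverage(1.0, n=50, n_sims=1000, detection_limit=20.0)
```

With complete data the estimate lands near 95%. Dropping censored values removes more measurements from the low-abundance group, shrinks the apparent difference, and pushes coverage far below the nominal level.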
Figure 3
Histograms of the null p-values for normalized (left) and raw (right) peptide abundances.
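
Under the null hypothesis p-values should be spread uniformly between 0 and 1, so a histogram piled up near 0 signals that something other than treatment, such as a systematic per-sample bias, is driving the comparisons. The sketch below reproduces that contrast under simplified, hypothetical assumptions: 2,000 null peptides, five samples per group, fixed per-sample offsets that happen to be larger in the second group, a z-test with the noise SD treated as known, and per-sample median centering as the normalization. None of these settings come from the paper.

```python
import random
import statistics

NORMAL = statistics.NormalDist()


def z_test_p(a, b, sigma=1.0):
    """Two-sided p-value for mean(b) - mean(a) with a known noise SD (a simplification)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (statistics.fmean(b) - statistics.fmean(a)) / se
    return 2 * (1 - NORMAL.cdf(abs(z)))


def median_center(matrix):
    """Subtract each sample's (column's) median across peptides."""
    medians = [statistics.median(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, medians)] for row in matrix]


def histogram(pvalues, n_bins=10):
    """Counts of p-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def null_pvalues(matrix, n_per_group):
    """One test per peptide: first n_per_group columns vs the rest."""
    return [z_test_p(row[:n_per_group], row[n_per_group:]) for row in matrix]


rng = random.Random(7)
n_per_group = 5
# Hypothetical per-sample offsets (e.g. loading differences) shared by all peptides.
offsets = [0.3, -0.2, 0.1, -0.4, 0.0, 0.9, 0.6, 1.1, 0.7, 0.8]
raw = []
for _ in range(2000):
    level = rng.gauss(20.0, 2.0)  # peptide-specific abundance, identical in both groups
    raw.append([level + off + rng.gauss(0.0, 1.0) for off in offsets])

raw_hist = histogram(null_pvalues(raw, n_per_group))
norm_hist = histogram(null_pvalues(median_center(raw), n_per_group))
```

Because the offsets are identical for every peptide, they shift every comparison in the same direction, which is why the raw p-values pile up in the lowest bin while the median-centered ones are roughly flat.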
Figure 4
Top three eigentrends identified in raw (left), imputed (middle), and imputed-then-normalized (right) data. The x-axis is the sample index; the y-axis shows eigentrend values.
Figure 5
Top three eigentrends identified in raw (left), normalized (middle), and normalized-then-imputed (right) data. The x-axis is the sample index; the y-axis shows eigentrend values.
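
Assuming the usual SVD-based definition, the eigentrends in Figures 4 and 5 are the leading right singular vectors of the row-centered peptide-by-sample matrix: patterns across samples that explain the most variation shared by many peptides. A standard SVD requires a complete matrix, so missing values have to be imputed (or incomplete peptides dropped) before such trends can be computed. The pure-Python sketch below finds the top trends by power iteration with deflation on hypothetical data containing a two-batch effect. It omits whatever preprocessing the paper applies (for example, removing treatment-group effects first), the function names are hypothetical, and in practice a library SVD such as `numpy.linalg.svd` would be used instead.

```python
import math
import random


def center_rows(matrix):
    """Subtract each peptide's (row's) mean so trends describe variation across samples."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered


def orthonormalize(v, basis):
    """Remove components along the orthonormal basis vectors, then rescale v to unit length."""
    for t in basis:
        dot = sum(a * b for a, b in zip(v, t))
        v = [a - dot * b for a, b in zip(v, t)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def top_eigentrends(matrix, k=3, n_iter=200, seed=0):
    """Approximate top-k right singular vectors of the row-centered matrix.

    Uses power iteration with deflation (projecting out trends already found).
    The matrix must be complete (no missing values) and k must be smaller than
    the number of samples. Each trend has one value per sample; its sign is arbitrary.
    """
    X = center_rows(matrix)
    n_rows, n_samples = len(X), len(X[0])
    rng = random.Random(seed)
    trends = []
    for _ in range(k):
        v = orthonormalize([rng.gauss(0.0, 1.0) for _ in range(n_samples)], trends)
        for _ in range(n_iter):
            u = [sum(x * y for x, y in zip(row, v)) for row in X]  # X v
            w = [sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_samples)]  # X^T u
            v = orthonormalize(w, trends)
        trends.append(v)
    return trends


# Hypothetical data: 100 peptides in 8 runs, with runs 0-3 and 4-7 in different batches.
rng = random.Random(3)
batch = [1, 1, 1, 1, -1, -1, -1, -1]
data = []
for _ in range(100):
    level = rng.gauss(20.0, 2.0)
    amplitude = rng.gauss(1.0, 0.2)
    data.append([level + amplitude * b + rng.gauss(0.0, 0.3) for b in batch])

trends = top_eigentrends(data, k=3)
```

Here the first trend closely follows the batch layout (up to sign), which is the kind of systematic pattern that plots like Figures 4 and 5 are used to spot.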
