Review
J Proteome Res. 2021 Jan 1;20(1):1-13.
doi: 10.1021/acs.jproteome.0c00123. Epub 2020 Sep 25.

A Review of Imputation Strategies for Isobaric Labeling-Based Shotgun Proteomics

Lisa M Bramer et al. J Proteome Res.

Abstract

The throughput efficiency and increased depth of coverage provided by isobaric-labeled proteomics measurements have led to increased usage of these techniques. However, the structure of missing data differs from that of unlabeled studies. This review therefore compares the efficacy of nine imputation methods on large isobaric-labeled proteomics data sets to guide researchers on the appropriateness of various imputation methods. Imputation methods were evaluated by accuracy, statistical hypothesis test inference, and run time. In general, expectation maximization and random forest imputation methods yielded the best performance, and constant-based methods consistently performed poorly across all data set sizes and percentages of missing values. For data sets with small sample sizes and higher percentages of missing data, results indicate that statistical inference with no imputation may be preferable. On the basis of these findings, there are core imputation methods that perform better for isobaric-labeled proteomics data, but great care should be given to whether imputation is the optimal strategy for data sets composed of a small number of samples.

Keywords: accuracy; hypothesis testing; imputation; isobaric-labeled proteomics; missing data.
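
The accuracy criterion mentioned in the abstract can be illustrated with a mask-and-score loop: hide some observed values, impute them, and compute the root-mean-square error (RMSE) on the hidden cells only. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. The function names, the toy matrix, and the two imputers (a constant fill and a per-peptide row mean) are hypothetical stand-ins; the row-mean imputer is not claimed to be one of the nine methods in the review.

```python
import math
import random


def mask_values(matrix, fraction, rng):
    """Copy `matrix` and set a random `fraction` of observed cells to None.

    Returns the masked copy and the list of (row, col) cells that were hidden.
    """
    cells = [(i, j) for i, row in enumerate(matrix)
             for j, v in enumerate(row) if v is not None]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_constant(matrix, value=0.0):
    """Constant-based imputation: every missing cell gets the same value."""
    return [[value if v is None else v for v in row] for row in matrix]


def impute_row_mean(matrix):
    """Fill each missing cell with the mean of the observed values in its row."""
    out = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        out.append([fill if v is None else v for v in row])
    return out


def rmse(truth, imputed, hidden):
    """RMSE between true and imputed values, restricted to the hidden cells."""
    if not hidden:
        raise ValueError("no hidden cells to score")
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy log2 abundances: rows are peptides, columns are samples (made-up data).
    truth = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0], [2.5, 2.0, 3.5]]
    masked, hidden = mask_values(truth, 0.3, random.Random(0))
    print("constant:", rmse(truth, impute_constant(masked), hidden))
    print("row mean:", rmse(truth, impute_row_mean(masked), hidden))
```

Scoring only the hidden cells keeps the comparison fair, because every imputer reproduces the still-observed values exactly.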

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1.
Evaluation of the missing data for the labeled CPTAC data set shows (A) a marginal negative correlation between the mean log2 abundance (before normalization to the reference pool) and the percentage of missing data and (C) that, within a 4-plex experiment, the majority of peptides are either all present or all absent for the three nonreference samples. For a similar unlabeled CPTAC data set, (B) there is a similar negative correlation between log2 abundance (before normalization) and missing data, but (D) sampling the data in a 4-plex manner yields varying probabilities across the number that will be present or absent.
Figure 2.
Proportion of missing data across peptides belonging to each log2 intensity bin based on mean log2 intensity, and the resulting discrete likelihood distribution of the probability that a peptide will have missing data based on its measured log2 abundance and median proportion of missing data across all peptides in a respective bin.
Figure 3.
Heatmaps of the mean RMSE value, across 100 data sets, for each imputation method and varying levels of missing data. Data sets consisting of 2, 3, 10, and 15 iTRAQ plexes (6, 9, 30, and 45 samples) are given in panels (A), (B), (C), and (D), respectively.
Figure 4.
True positive rate (TPR) at a 5% false discovery rate (FDR) for each imputation method, by percentage of missing data and for select numbers of iTRAQ plexes.
Figure 5.
Cross-validated classification accuracy distributions, over 100 repetitions of 5-fold cross-validation, for each imputation method by data set.
Figure 6.
Radar plots giving the mean rank of each imputation method, across all levels of the number of plexes and percentage of missing data, for five performance metrics. Values on the outer circle and inner circle correspond to the best and worst performing imputation methods, respectively.
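
The mean-rank summary described for Figure 6 can be sketched as follows: rank the methods within each condition (one combination of plex count and missing-data percentage), then average each method's rank over all conditions. This is an illustrative sketch only. The function names and the tie handling (tied methods share the average of their positions) are assumptions, and the convention that rank 1 is the best method is chosen here for convenience.

```python
def rank_methods(scores, lower_is_better=True):
    """Rank methods within one condition; rank 1 is best, ties share the average rank.

    `scores` maps a method name to its metric value for that condition.
    """
    ordered = sorted(scores, key=scores.get, reverse=not lower_is_better)
    ranks = {}
    pos = 0
    while pos < len(ordered):
        end = pos
        # Extend the group while the next method has the same score.
        while end + 1 < len(ordered) and scores[ordered[end + 1]] == scores[ordered[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for name in ordered[pos:end + 1]:
            ranks[name] = shared
        pos = end + 1
    return ranks


def mean_rank(conditions, lower_is_better=True):
    """Average each method's rank over a list of per-condition score dicts."""
    totals = {}
    for scores in conditions:
        for name, rank in rank_methods(scores, lower_is_better).items():
            totals[name] = totals.get(name, 0.0) + rank
    return {name: total / len(conditions) for name, total in totals.items()}
```

For an error metric such as RMSE, `lower_is_better=True` applies; for TPR or classification accuracy, pass `lower_is_better=False` so that larger values receive better ranks.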
