Review

J Proteome Res. 2021 Jan 1;20(1):1-13.

doi: 10.1021/acs.jproteome.0c00123. Epub 2020 Sep 25.

A Review of Imputation Strategies for Isobaric Labeling-Based Shotgun Proteomics

Lisa M Bramer¹, Jan Irvahn², Paul D Piehowski³, Karin D Rodland³, Bobbie-Jo M Webb-Robertson³

Affiliations

¹ Computing & Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
² Boeing, Seattle, Washington 98055, United States.
³ Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, Washington 99354, United States.

PMID: 32929967
PMCID: PMC8996546
DOI: 10.1021/acs.jproteome.0c00123

Lisa M Bramer et al. J Proteome Res. 2021.

Abstract

The throughput efficiency and increased depth of coverage provided by isobaric-labeled proteomics measurements have led to increased usage of these techniques. However, the structure of missing data differs from that of unlabeled studies. This review therefore compares the efficacy of nine imputation methods on large isobaric-labeled proteomics data sets to guide researchers on the appropriateness of each method. Imputation methods were evaluated by accuracy, statistical hypothesis test inference, and run time. In general, expectation maximization and random forest imputation methods yielded the best performance, and constant-based methods consistently performed poorly across all data set sizes and percentages of missing values. For data sets with small sample sizes and higher percentages of missing data, results indicate that statistical inference with no imputation may be preferable. On the basis of the findings in this review, there are core imputation methods that perform better for isobaric-labeled proteomics data, but for data sets composed of a small number of samples, careful consideration should be given to whether imputation is the optimal strategy at all.

Keywords: accuracy; hypothesis testing; imputation; isobaric-labeled proteomics; missing data.
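
The accuracy criterion named in the abstract can be illustrated with a minimal sketch: hide some known values, impute them, and score the imputed entries by RMSE. Everything below is an assumption made for illustration and is not taken from the review: the simulated log₂ abundances, the uniformly random missingness (the review notes that isobaric-labeled data are not missing in this way), the choice of the global observed minimum as the "constant", and the row-mean imputer used as a simple stand-in for a non-constant method.

```python
import numpy as np

def rmse_on_masked(truth, imputed, mask):
    """RMSE computed only over the entries that were hidden by the mask."""
    diff = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def impute_constant(data):
    """Fill every missing value with the global observed minimum (assumed constant)."""
    filled = data.copy()
    filled[np.isnan(filled)] = np.nanmin(data)
    return filled

def impute_row_mean(data):
    """Fill missing values with the mean of the observed values in the same row (peptide)."""
    filled = data.copy()
    row_means = np.nanmean(data, axis=1, keepdims=True)
    missing = np.isnan(filled)
    filled[missing] = np.broadcast_to(row_means, filled.shape)[missing]
    return filled

def simulate(n_peptides=500, n_samples=9, frac_missing=0.2, seed=0):
    """Hypothetical data: peptide-specific mean plus noise, values hidden at random."""
    rng = np.random.default_rng(seed)
    peptide_means = rng.normal(20.0, 2.0, size=(n_peptides, 1))
    truth = peptide_means + rng.normal(0.0, 0.5, size=(n_peptides, n_samples))
    mask = rng.random(truth.shape) < frac_missing
    mask[:, 0] = False  # keep one observed value per row so row means are defined
    observed = truth.copy()
    observed[mask] = np.nan
    return truth, observed, mask

truth, observed, mask = simulate()
results = {
    "constant_min": rmse_on_masked(truth, impute_constant(observed), mask),
    "row_mean": rmse_on_masked(truth, impute_row_mean(observed), mask),
}
for name, value in results.items():
    print(f"{name}: RMSE = {value:.3f}")
```

On this toy data the constant fill scores far worse than the row mean, simply because the global minimum sits well below most peptides' abundance. The sketch says nothing about how the nine reviewed methods compare on real data.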

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Figure 1.**
Evaluation of the missing data for the labeled CPTAC data set shows (A) a marginal negative correlation between the mean log₂ abundance (before normalization to the reference pool) and the percentage of missing data and (C) that, within a 4-plex experiment, the majority of peptides are either all present or all absent for the three nonreference samples. For a similar unlabeled CPTAC data set, (B) there is a similar negative correlation between log₂ abundance (before normalization) and missing data, but (D) sampling the data in a 4-plex manner yields varying probabilities for the number of values that will be present or absent.

**Figure 2.**
Proportion of missing data across peptides belonging to each log₂ intensity bin based on mean log₂ intensity, and the resulting discrete likelihood distribution of the probability that a peptide will have missing data based on its measured log₂ abundance and median proportion of missing data across all peptides in a respective bin.

**Figure 3.**
Heatmaps of the mean RMSE value, across 100 data sets, for each imputation method and varying levels of missing data. Data sets consisting of 2, 3, 10, and 15 iTRAQ plexes (6, 9, 30, and 45 samples) are given in panels (A), (B), (C), and (D), respectively.

**Figure 4.**
TPR at 5% FDR for each imputation method by percentage of missing data, for select numbers of iTRAQ plexes.

**Figure 5.**
Cross-validated classification accuracy distributions, over 100 repetitions of 5-fold cross-validation, for each imputation method by data set.

**Figure 6.**
Radar plots giving the mean rank of each imputation method, across all levels of the number of plexes and percentage of missing data, for five performance metrics. Values on the outer circle and inner circle correspond to the best and worst performing imputation methods, respectively.
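
The mean-rank summary described for Figure 6 can be sketched as follows: rank the methods within each condition (one combination of number of plexes and percentage of missing data), then average each method's ranks across conditions. This is only an illustration of the idea. The method names and scores are made up, the function `mean_ranks` is hypothetical, and the review's exact ranking and tie-handling rules are not given in this record.

```python
import numpy as np

def mean_ranks(scores, lower_is_better=True):
    """Mean rank per method (1 = best) for scores of shape (n_methods, n_conditions).

    Ranks are assigned within each condition (column). Ties are not averaged;
    tied entries receive consecutive ranks in an arbitrary order.
    """
    scores = np.asarray(scores, dtype=float)
    ordered = scores if lower_is_better else -scores
    # argsort applied twice converts values into 0-based ranks within each column
    ranks = ordered.argsort(axis=0).argsort(axis=0) + 1
    return ranks.mean(axis=1)

# Made-up RMSE values: rows are methods, columns are conditions.
methods = ["method_a", "method_b", "method_c"]
rmse = np.array([
    [0.40, 0.55, 0.70],
    [0.60, 0.50, 0.90],
    [1.20, 1.50, 1.80],
])

ranking = dict(zip(methods, mean_ranks(rmse)))
for name, rank in ranking.items():
    print(f"{name}: mean rank = {rank:.2f}")
```

For a metric where larger values are better, such as TPR or classification accuracy, pass `lower_is_better=False` so that rank 1 still denotes the best method.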
