Nat Commun. 2025 May 22;16(1):4747.
doi: 10.1038/s41467-025-59788-x.

Imputing single-cell protein abundance in multiplex tissue imaging

Raphael Kirchgaessner et al. Nat Commun. 2025.

Abstract

Multiplex tissue imaging enables single-cell spatial proteomics and transcriptomics but remains limited by incomplete molecular profiling, tissue loss, and probe failure. Here, we apply machine learning to impute single-cell protein abundance using multiplex tissue imaging data from a breast cancer cohort. We evaluate regularized linear regression, gradient-boosted trees, and deep learning autoencoders, incorporating spatial context to enhance imputation accuracy. Our models achieve mean absolute errors between 0.05 and 0.3 on a [0,1] scale, closely approximating ground-truth values. Using imputed data, we classify single cells as pre- or post-treatment, demonstrating their biological relevance. These findings establish the feasibility of imputing missing protein abundance, highlight the advantages of spatial information, and support machine learning as a powerful tool for improving single-cell tissue imaging.
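The error figures above are mean absolute errors (MAE) on a [0,1] scale. As a minimal sketch of how such a score can be computed and compared against a mean-predicting null model (the baseline used in Fig. 2), assuming intensities are min-max scaled per protein (the exact scaling is not stated on this page); `min_max_scale`, `mae`, and `null_model` are hypothetical helper names:

```python
from statistics import fmean


def min_max_scale(values):
    """Rescale raw intensities to [0, 1] (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def mae(y_true, y_pred):
    """Mean absolute error between observed and imputed values."""
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def null_model(train_values, n_test):
    """Null model: predict the training-set mean for every test cell."""
    return [fmean(train_values)] * n_test


# Synthetic values for illustration only, not data from the study
train = min_max_scale([120.0, 340.0, 560.0, 780.0, 1000.0])  # 0, 0.25, 0.5, 0.75, 1
test_true = [0.1, 0.5, 0.9]
print(round(mae(test_true, null_model(train, len(test_true))), 3))  # 0.267
```

A trained regressor (for example an elastic net) would replace `null_model` as the source of `y_pred`; any MAE below the null model's indicates the model captures signal beyond the protein's average level.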

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Overview of dataset, study motivations, and analysis approaches.
a Biopsies were obtained from four HR+ breast cancer patients before and after the same therapy, for a total of eight biopsies. b Each biopsy was assayed with the multiplexed tissue imaging assay t-CyCIF to quantify the abundance of 20 proteins and then processed with an image analysis pipeline to create single-cell feature tables (total number of cells identified: 475359). c The key tasks addressed by this work are imputing failed proteins and inferring additional proteins not present in a multiplex tissue imaging (MTI) experiment. d Approaches for training and testing ML models for imputing proteins across patients.
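One common way to test imputation across patients is to hold out every cell from one patient and train on the remaining patients, so the model never sees the test patient during training. A minimal sketch of such a leave-one-patient-out split, assuming each cell record carries a patient identifier; the record layout and `patient_key` are hypothetical, and this is not necessarily the exact scheme shown in panel d:

```python
from collections import defaultdict


def leave_one_patient_out(cells, patient_key="patient"):
    """Yield (held_out_patient, train_cells, test_cells) for each patient.

    All cells of the held-out patient go to the test set; all other
    patients' cells form the training set.
    """
    by_patient = defaultdict(list)
    for cell in cells:
        by_patient[cell[patient_key]].append(cell)
    for held_out in sorted(by_patient):
        test = by_patient[held_out]
        train = [c for p, group in by_patient.items() if p != held_out for c in group]
        yield held_out, train, test


# Toy records: two cells for each of three patients (hypothetical)
cells = [{"patient": p, "cell_id": i} for i, p in enumerate(["P1", "P1", "P2", "P2", "P3", "P3"])]
for patient, train, test in leave_one_patient_out(cells):
    print(patient, len(train), len(test))
```

Splitting by patient rather than by cell avoids leaking patient-specific staining or tissue characteristics from training into evaluation.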
Fig. 2. Imputation results for the null model and the Elastic Net and Light GBM machine learning models across patients.
a Imputation results for all proteins show lower mean absolute error (MAE) with Elastic Net (EN) than with a null model. b Imputation results using EN and Light GBM (LGBM) show low MAE for 12 of the 16 available proteins. c, d Visualization of in situ protein expression, ground-truth single-cell abundance from the image processing pipeline, and imputed single-cell abundance for Vimentin and PR. Results were created using n = 475359 single cells. We used 30 replicates with different train and test splits to validate performance metrics. Supplementary Table 3 and Supplementary Table 4 describe all boxenplots in this figure. p-values were calculated using two-sided Mann-Whitney tests with the Benjamini-Hochberg procedure for multiple testing correction. Each boxenplot displays nested boxes for progressively more extreme quantiles. The central, widest box spans the interquartile range (25th–75th percentiles) and contains the middle 50% of the data; successively narrower boxes above and below extend to more extreme quantiles (e.g., 12.5th–87.5th, 6.25th–93.75th), showing the distribution tails in detail. Outliers beyond the outermost box are shown as diamonds. p-values: ns: not significant, 5.00e-02 < p ≤ 1.00e+00; *: 1.00e-02 < p ≤ 5.00e-02; **: 1.00e-03 < p ≤ 1.00e-02; ***: 1.00e-04 < p ≤ 1.00e-03; ****: p ≤ 1.00e-04.
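The significance stars in these captions come from Benjamini-Hochberg (BH) adjusted p-values mapped onto the thresholds in the legend. A minimal stdlib sketch of the BH step and the star mapping, assuming the raw p-values were already produced by a two-sided Mann-Whitney test (for example `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")`); the p-values in the example are made up:

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def star_label(p):
    """Map an adjusted p-value to the annotation legend used in the captions."""
    for threshold, label in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")):
        if p <= threshold:
            return label
    return "ns"


# Made-up raw p-values, e.g. from four per-protein model comparisons
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([star_label(p) for p in adjusted])  # ['*', 'ns', 'ns', 'ns']
```

Note that a raw p-value of 0.04 becomes "ns" after correction: with four tests, BH scales it to about 0.053, above the 0.05 threshold.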
Fig. 3. Cluster metrics and phenotype-calling results comparing original and imputed values.
a Adjusted Rand Index (ARI) scores show high similarity between clusterings of ground-truth and imputed data. b Silhouette scores for single-cell protein abundance clustering improve when imputed values are used, indicating better-defined clusters with imputed data. c ARI scores for cell phenotype matching between ground-truth and imputed data show moderate to strong overlap. d Jaccard scores show moderate to strong overlap between phenotypes called from ground-truth and imputed protein expression data. Results were created using n = 475359 single cells. We used 30 replicates with different train and test splits to validate performance metrics. Supplementary Table 5, Supplementary Table 6, Supplementary Table 7 and Supplementary Table 8 provide detailed descriptions of all boxenplots. p-values were calculated using two-sided Mann-Whitney tests with the Benjamini-Hochberg procedure for multiple testing correction. Each boxenplot displays nested boxes for progressively more extreme quantiles. The central, widest box spans the interquartile range (25th–75th percentiles) and contains the middle 50% of the data; successively narrower boxes above and below extend to more extreme quantiles (e.g., 12.5th–87.5th, 6.25th–93.75th), showing the distribution tails in detail. Outliers beyond the outermost box are shown as diamonds. p-values: ns: not significant, 5.00e-02 < p ≤ 1.00e+00; *: 1.00e-02 < p ≤ 5.00e-02; **: 1.00e-03 < p ≤ 1.00e-02; ***: 1.00e-04 < p ≤ 1.00e-03; ****: p ≤ 1.00e-04.
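Panels a and c compare two labelings of the same cells with the Adjusted Rand Index, and panel d uses Jaccard scores. A minimal stdlib sketch of both metrics, assuming each labeling is a list with one cluster or phenotype label per cell and that the Jaccard score is computed per phenotype over the sets of cells assigned to it (an assumption about how panel d was built); `sklearn.metrics.adjusted_rand_score` computes the same ARI in practice:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of cells grouped together in both labelings (contingency table).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def phenotype_jaccard(labels_true, labels_imputed, phenotype):
    """Jaccard overlap between the cells assigned `phenotype` in each labeling."""
    a = {i for i, lab in enumerate(labels_true) if lab == phenotype}
    b = {i for i, lab in enumerate(labels_imputed) if lab == phenotype}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy phenotype calls for six cells (hypothetical labels, not study data)
truth = ["tumor", "tumor", "tumor", "immune", "immune", "stromal"]
imputed = ["tumor", "tumor", "immune", "immune", "immune", "stromal"]
print(round(adjusted_rand_index(truth, imputed), 3))         # 0.318
print(round(phenotype_jaccard(truth, imputed, "tumor"), 3))  # 0.667
```

ARI is invariant to how clusters are named, which matters when clustering ground-truth and imputed data separately produces arbitrary cluster IDs.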
Fig. 4. Autoencoder imputation results and performance comparison between machine learning models.
a The autoencoder (AE) is trained and then imputes single or multiple proteins iteratively. First, the proteins to be imputed are replaced with either zero or the mean of their intensity values in the training set. The autoencoder is then applied repeatedly, with the output values of each iteration used as input values for the next. b AE single- and multi-protein imputation performance. c A performance comparison across all evaluated machine learning (ML) models shows broadly similar performance, with Light Gradient Boosting Machine (LGBM) performing best, followed by Elastic Net (EN) and then AE. There is no significant difference between single- and multi-protein imputation performance for AE. Results were created using n = 475359 single cells. We used 30 replicates with different train and test splits to validate performance metrics. Supplementary Table 9 and Supplementary Table 10 provide detailed descriptions of all boxenplots. p-values were calculated using two-sided Mann-Whitney tests with the Benjamini-Hochberg procedure for multiple testing correction. Each boxenplot displays nested boxes for progressively more extreme quantiles. The central, widest box spans the interquartile range (25th–75th percentiles) and contains the middle 50% of the data; successively narrower boxes above and below extend to more extreme quantiles (e.g., 12.5th–87.5th, 6.25th–93.75th), showing the distribution tails in detail. Outliers beyond the outermost box are shown as diamonds. p-values: ns: not significant, 5.00e-02 < p ≤ 1.00e+00; *: 1.00e-02 < p ≤ 5.00e-02; **: 1.00e-03 < p ≤ 1.00e-02; ***: 1.00e-04 < p ≤ 1.00e-03; ****: p ≤ 1.00e-04.
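Panel a describes imputation as a fixed-point loop: initialize the missing proteins, run the trained autoencoder, feed its outputs for those proteins back in, and repeat. A minimal sketch of that loop, assuming the trained model is any callable that maps a feature vector to a reconstructed vector and that observed proteins are held fixed while only the missing ones are updated (an assumption); `reconstruct`, `n_iter`, and `tol` are hypothetical stand-ins, not the authors' network or settings:

```python
def iterative_impute(cell, missing, model, init_values, n_iter=10, tol=1e-6):
    """Fill the `missing` feature indices of one cell by iterating `model`.

    cell:        list of feature values (entries at `missing` are ignored)
    missing:     indices of proteins to impute
    model:       callable, feature list -> reconstructed feature list
    init_values: initial guesses, e.g. training-set means or all zeros
    Observed entries stay fixed (assumption); only missing ones update.
    """
    x = list(cell)
    if not missing:
        return x
    for j in missing:
        x[j] = init_values[j]
    for _ in range(n_iter):
        out = model(x)
        delta = max(abs(out[j] - x[j]) for j in missing)
        for j in missing:
            x[j] = out[j]  # outputs become the next iteration's inputs
        if delta < tol:  # stop once the imputed values stop changing
            break
    return x


def reconstruct(x):
    """Hypothetical stand-in for a trained autoencoder: each feature is
    reconstructed as the mean of all other features."""
    total = sum(x)
    return [(total - v) / (len(x) - 1) for v in x]


cell = [0.2, 0.4, None, 0.6]  # protein at index 2 failed
print(iterative_impute(cell, [2], reconstruct, init_values=[0.5] * 4))
```

Passing several indices in `missing` gives the multi-protein variant; all missing proteins are updated together from the same model output at each iteration.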
Fig. 5. Imputation performance of EN, LGBM, and AE machine learning models on an independent t-CyCIF dataset.
The dataset was obtained from a breast cancer tissue microarray containing two cores from each of 26 tumors. Imputation results are similar to those obtained in our primary cohort and dataset, indicating that our imputation methods are applicable beyond the primary cohort to other cohorts and datasets. Results were created using n = 475359 single cells. We used 30 replicates with different train and test splits to validate performance metrics. Supplementary Table 11 provides a detailed overview of all boxenplots. p-values were calculated using two-sided Mann-Whitney tests with the Benjamini-Hochberg procedure for multiple testing correction. Each boxenplot displays nested boxes for progressively more extreme quantiles. The central, widest box spans the interquartile range (25th–75th percentiles) and contains the middle 50% of the data; successively narrower boxes above and below extend to more extreme quantiles (e.g., 12.5th–87.5th, 6.25th–93.75th), showing the distribution tails in detail. Outliers beyond the outermost box are shown as diamonds. p-values: ns: not significant, 5.00e-02 < p ≤ 1.00e+00; *: 1.00e-02 < p ≤ 5.00e-02; **: 1.00e-03 < p ≤ 1.00e-02; ***: 1.00e-04 < p ≤ 1.00e-03; ****: p ≤ 1.00e-04.
Fig. 6. Using spatial information improves imputation performance for LGBM.
a Schematic for creating a feature table from spatial neighbors within selected radii; an example 15 µm radius is shown. The cell of interest (origin) is marked in red, and the protein abundance levels of cells in its neighborhood are averaged to obtain neighborhood abundance levels. b Light gradient boosting machine (LGBM) imputation results across patients, with mean absolute error (MAE) scores for radii of 0, 30 and 60 µm, reveal significant improvement for several proteins, such as EGFR, ER, ECAD and PR. Results were created using n = 475359 single cells. We used 30 replicates with different train and test splits to validate performance metrics. Supplementary Table 12 provides a detailed overview of all boxenplots. p-values were calculated using two-sided Mann-Whitney tests with the Benjamini-Hochberg procedure for multiple testing correction. Each boxenplot displays nested boxes for progressively more extreme quantiles. The central, widest box spans the interquartile range (25th–75th percentiles) and contains the middle 50% of the data; successively narrower boxes above and below extend to more extreme quantiles (e.g., 12.5th–87.5th, 6.25th–93.75th), showing the distribution tails in detail. Outliers beyond the outermost box are shown as diamonds. p-values: ns: not significant, 5.00e-02 < p ≤ 1.00e+00; *: 1.00e-02 < p ≤ 5.00e-02; **: 1.00e-03 < p ≤ 1.00e-02; ***: 1.00e-04 < p ≤ 1.00e-03; ****: p ≤ 1.00e-04.
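Panel a builds spatial features by averaging protein abundance over the cells within a chosen radius of each cell. A minimal brute-force sketch, assuming cell centroids are given in micrometres and that the origin cell counts as part of its own neighborhood (so a 0 µm radius reproduces the cell's own values); appending the neighborhood means to the cell's own features is likewise an assumption about the table layout. For hundreds of thousands of cells, a spatial index such as `scipy.spatial.cKDTree.query_ball_point` would replace the O(n²) scan:

```python
from math import dist


def neighborhood_means(coords, abundances, radius_um):
    """Average each protein over all cells within `radius_um` of every cell.

    coords:     list of (x, y) centroids in micrometres
    abundances: list of per-cell feature lists, same order as coords
    Returns one list of averaged features per cell.
    """
    n_features = len(abundances[0])
    result = []
    for origin in coords:
        # Brute-force neighbor search; the origin itself has distance 0.
        members = [k for k, other in enumerate(coords) if dist(origin, other) <= radius_um]
        result.append([
            sum(abundances[k][f] for k in members) / len(members)
            for f in range(n_features)
        ])
    return result


# Three toy cells on a line with two proteins each (hypothetical values)
coords = [(0.0, 0.0), (10.0, 0.0), (50.0, 0.0)]
abundances = [[0.2, 0.8], [0.4, 0.6], [1.0, 0.0]]
for r in (0.0, 15.0):
    print(r, neighborhood_means(coords, abundances, r))

# Hypothetical feature table: own values followed by 30 µm neighborhood means
table = [own + nb for own, nb in zip(abundances, neighborhood_means(coords, abundances, 30.0))]
```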
Fig. 7. Using spatial information improves imputation performance.
a Single-protein imputation mean absolute error (MAE) for radii of 0, 30 and 60 µm shows improved imputation accuracy for proteins such as AR, CK14, CK19, ER and others. Proteins for which imputation improved when using spatial information are in bold and underlined. b Multi-protein imputation MAE scores for radii of 0, 30 and 60 µm likewise show improved imputation accuracy for proteins such as AR, CK14, CK19, ER and others. c A comparison of light gradient boosting machine (LGBM) and autoencoder (AE) imputation performance for 0, 30 and 60 µm shows similar performance for all models. Results were created using n = 475359 single cells. We used 30 replicates with different train and test splits to validate performance metrics. Supplementary Table 13, Supplementary Table 14 and Supplementary Table 15 provide detailed descriptions of all boxenplots in this figure. p-values were calculated using two-sided Mann-Whitney tests with the Benjamini-Hochberg procedure for multiple testing correction. Each boxenplot displays nested boxes for progressively more extreme quantiles. The central, widest box spans the interquartile range (25th–75th percentiles) and contains the middle 50% of the data; successively narrower boxes above and below extend to more extreme quantiles (e.g., 12.5th–87.5th, 6.25th–93.75th), showing the distribution tails in detail. Outliers beyond the outermost box are shown as diamonds. p-values: ns: not significant, 5.00e-02 < p ≤ 1.00e+00; *: 1.00e-02 < p ≤ 5.00e-02; **: 1.00e-03 < p ≤ 1.00e-02; ***: 1.00e-04 < p ≤ 1.00e-03; ****: p ≤ 1.00e-04.
Fig. 8. Experimental setup and validation for using imputed values to predict treatment timepoints for single cells.
a An initial tile classifier was used to identify tissue regions strongly associated with treatment timepoints. Cells in these regions were then used to train a cell classifier to identify whether cells came from pre-treatment or post-treatment biopsies. b Complete biopsy overview with a zoomed-in view, in which green squares mark tiles strongly associated with treatment timepoints. c Classification accuracy (higher is better) of the cell classifier shows improved performance with imputed values compared with ground-truth values or with the protein values removed. The models were run on 13125 tiles, with 30 replicates. Supplementary Table 16 provides detailed descriptions of all boxenplots. p-values were calculated using two-sided Mann-Whitney tests with the Benjamini-Hochberg procedure for multiple testing correction. Each boxenplot displays nested boxes for progressively more extreme quantiles. The central, widest box spans the interquartile range (25th–75th percentiles) and contains the middle 50% of the data; successively narrower boxes above and below extend to more extreme quantiles (e.g., 12.5th–87.5th, 6.25th–93.75th), showing the distribution tails in detail. Outliers beyond the outermost box are shown as diamonds. p-values: ns: not significant, 5.00e-02 < p ≤ 1.00e+00; *: 1.00e-02 < p ≤ 5.00e-02; **: 1.00e-03 < p ≤ 1.00e-02; ***: 1.00e-04 < p ≤ 1.00e-03; ****: p ≤ 1.00e-04.
