Nat Commun. 2024 Sep 26;15(1):8262.
doi: 10.1038/s41467-024-52605-x.

Enhanced feature matching in single-cell proteomics characterizes IFN-γ response and co-existence of cell states

Karl K Krull et al. Nat Commun. 2024.

Abstract

Proteome analysis by data-independent acquisition (DIA) has become a powerful approach to obtain deep proteome coverage, and has gained recent traction for label-free analysis of single cells. However, optimal experimental design for DIA-based single-cell proteomics has not been fully explored, and performance metrics of subsequent data analysis tools remain to be evaluated. Therefore, we here formalize and comprehensively evaluate a DIA data analysis strategy that exploits the co-analysis of low-input samples with a so-called matching enhancer (ME) of higher input, to increase sensitivity, proteome coverage, and data completeness. We assess the matching specificity of DIA-ME by a two-proteome model, and demonstrate that false discovery and false transfer are maintained at low levels when using DIA-NN software, while preserving quantification accuracy. We apply DIA-ME to investigate the proteome response of U-2 OS cells to interferon gamma (IFN-γ) in single cells, and recapitulate the time-resolved induction of IFN-γ response proteins as observed in bulk material. Moreover, we uncover co- and anti-correlating patterns of protein expression within the same cell, indicating mutually exclusive protein modules and the co-existence of different cell states. Collectively our data show that DIA-ME is a powerful, scalable, and easy-to-implement strategy for single-cell proteomics.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. DIA-ME enables ultra-high sensitivity and data completeness.
A DIA-ME principle: prevalent DIA data analysis is based on a two-step process. Peptides are identified and stored in an internal library, before their information is used to re-analyze (match) runs that are searched in parallel. By providing a high-input sample (matching enhancer, ME) to the first search, more information can be gathered in the library, which helps to identify low-abundance signals in the low-input runs during matching. B Mixed-species experiment to evaluate DIA-ME data analysis (top). Two types of samples: seven low-input replicates containing H.sapiens proteome (HeLa, green) and twelve sets of H.sapiens samples spiked with E.coli K12 proteome (blue). Proteomic mixtures differed in their spiking ratio (5–20%) and total peptide amount (1–100 ng). Resulting files of spiked samples were used to evaluate the DIA-ME data analysis (bottom): low-input H.sapiens samples (green) were analyzed with a triplicate of spiked 5-ng (5× ME), 10-ng (10× ME) or 100-ng (100× ME) runs (blue). C Average and total (light) protein groups in the seven non-spiked 1-ng replicates by DIA-NN (blue) and Spectronaut (brown) using different analysis strategies (see B). Indiv: individual raw file analysis. MBR: collective raw file analysis with activated matching. 1:1: co-analysis with seven spiked 1-ng replicates. Results only shown for analyses involving a spiking ratio of 10%. D Heatmap of ranked protein group (PG) intensities for individual, MBR and 10× DIA-ME analysis of 1-ng HeLa replicates (R1–7). Six bins (divided by dashed lines) indicate the obtained data completeness in the respective intensity segment per analysis. E Upper left: ranked median protein intensities in different data analyses. Others: ranked protein intensities in 1-ng replicates (R1–7) after 10× DIA-ME analysis. Three high- and medium-abundance cytoskeleton proteins (yellow) and three low-abundance cell cycle-related proteins (red) shown, the latter only identified in DIA-ME analysis.
Source data are provided as a Source Data file.
Fig. 2
Fig. 2. Low FPR and reliable feature matching in DIA-ME.
A False positive rate, i.e. percentage of detected E.coli peptides, in non-spiked 1-ng H.sapiens samples (N = 7) for different types of data analysis and DIA software. Analyses without spiked samples, i.e. without entrapped matching, are indicated in green and grey (light-: without MBR, dark-: MBR), while co-analyses with spiked samples are indicated in blue and brown for DIA-NN and Spectronaut, respectively. The shade of the color indicates the E.coli spiking ratio. Error bars are shown as mean ± sd. B Receiver operating characteristics (ROC) of default q-value filters in DIA-NN (left) and Spectronaut (right) for data analyses involving spiked ME samples with 10% spiking ratio (DIA-NN: light blue (1:1) to dark grey (100× DIA-ME), Spectronaut: light brown (1:1) to black (100× DIA-ME)). Areas under ROC (AUROC) are indicated in parentheses, while the diagonal line represents a random classification. C False transfer rate, i.e. percentage of E.coli peptides among identifications that were transferred by matching, in non-spiked 1-ng H.sapiens samples (N = 7) for different types of data analysis and DIA software. Rate was set to 100% when fewer H.sapiens peptides but more E.coli peptides were identified after matching. Color-coding as in (A). Error bars are shown as mean ± sd. Source data are provided as a Source Data file.
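The two rates in panels A and C can be illustrated from simple peptide counts. The following is a minimal sketch, not the authors' implementation. The function names, the use of all detected peptides as the denominator of the false positive rate, the reading of "transferred by matching" as the gain in identifications after matching, and the 0% result when nothing is gained are assumptions made for illustration. Only the 100% rule for fewer H.sapiens but more E.coli peptides is taken from the caption.

```python
def false_positive_rate(n_ecoli: int, n_human: int) -> float:
    """Percentage of E.coli peptides among all peptides detected in a
    non-spiked H.sapiens sample.

    Assumption: the denominator is E.coli plus H.sapiens peptides.
    """
    total = n_ecoli + n_human
    if total == 0:
        return 0.0  # assumption: report 0% for an empty sample
    return 100.0 * n_ecoli / total


def false_transfer_rate(human_before: int, human_after: int,
                        ecoli_before: int, ecoli_after: int) -> float:
    """Percentage of E.coli peptides among identifications gained by matching.

    Assumption: "transferred" identifications are approximated by the
    difference in peptide counts with and without matching.
    """
    human_gain = human_after - human_before
    ecoli_gain = ecoli_after - ecoli_before

    # Rule stated in the caption: fewer human but more E.coli peptides
    # after matching sets the rate to 100%.
    if human_gain < 0 and ecoli_gain > 0:
        return 100.0

    transferred = max(human_gain, 0) + max(ecoli_gain, 0)
    if transferred == 0:
        return 0.0  # assumption: nothing was transferred
    return 100.0 * max(ecoli_gain, 0) / transferred
```

With hypothetical counts, `false_transfer_rate(1000, 1500, 2, 7)` divides 5 gained E.coli peptides by 505 gained peptides in total, which is just under 1%.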
Fig. 3
Fig. 3. Stable precision and accuracy in protein quantification using directLFQ normalization.
A Violin plot of human protein CVs in 1-ng samples (N = 7) with (dark green) and without MBR (light green) on the left and after matching against 1-ng (1:1), 5-ng (5× ME), 10-ng (10× ME) and 100-ng (100× ME) samples on the right. Violins are colored in blue shades according to the spiking ratio in ME samples. Black boxes in the violins show the dispersion of values between the first and third quartile with the white line representing the median of the dataset. Whiskers and violins show the entire range from the minimum to the maximum data point. Number of proteins in each analysis indicated beneath violins. B Scatter plot of human protein CV values across 1-ng replicates (N = 7) dependent on their reported abundance for different types of data analysis (only shown for 10% spiking). White: proteins already identified in MBR analysis; blues and dark grey: proteins exclusively found using DIA-ME. Number n of proteins per analysis indicated. C Pearson correlation heatmap within and among different types of data analysis. Correlations to conventional MBR analysis are framed. Range of observed correlations in DIA-ME analyses highlighted on the top. D Principal component analysis localizing individual replicates from all performed searches in a two-dimensional coordinate system. Data analysis with and without MBR (light-) colored in green, and DIA-ME analyses illustrated in blue shades and dark grey. E Density plot showing the deviation from expected ratios in MBR and DIA-ME analyses for H.sapiens (blues, inverted density) and low-abundance E.coli proteins (greens). Total number n of proteins with at least one ratio in the respective analysis indicated. Source data are provided as a Source Data file.
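The coefficient of variation (CV) shown in panels A and B is the standard deviation of a protein's intensity across replicates divided by its mean. A minimal sketch follows, assuming linear-scale intensities, the sample standard deviation, and `None` for missing values; the function name and the handling of proteins with fewer than two observations are illustrative choices, not details from the paper.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def coefficient_of_variation(
    intensities: Sequence[Optional[float]],
) -> Optional[float]:
    """CV in percent for one protein across replicates.

    Missing values (None) are ignored. Returns None when fewer than two
    values are observed or the mean is zero (assumed conventions).
    """
    observed = [v for v in intensities if v is not None]
    if len(observed) < 2:
        return None
    centre = mean(observed)
    if centre == 0:
        return None
    return 100.0 * stdev(observed) / centre
```

For example, replicate intensities of 1, 2 and 3 have a sample standard deviation of 1 and a mean of 2, giving a CV of 50%.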
Fig. 4
Fig. 4. DIA-ME improves proteome coverage in IFN-γ treated U-2 OS cells.
A Experimental scheme of IFN-γ treatment and bulk preparation of U-2 OS cells in the DIA-ME workflow. Six time-points from three biological replicates were collected and samples were diluted to the indicated injection amounts. Three 200-pg injections (blue) and single-injections of 1–10 ng (ME samples, green) per time-point and biological replicate resulted in 108 runs. Obtained files were co-analyzed by DIA-ME in DIA-NN. B Total (greys) and average protein groups (blues) identified in 200-pg samples after co-analysis with 1-ng, 2-ng (10×) and 10-ng (50×) MEs compared to MBR analysis without reference samples. Individual protein identifications per sample indicated as grey dots. MEs were derived from control samples before (0 hours) and after 24 h treatment. C Venn diagram of identified protein groups in MBR (light blue) and 10× DIA-ME analysis (blue). Overlap (Szymkiewicz–Simpson) and Jaccard index are given as a measure of the similarity of both populations (see methods). D Ranked median protein group intensities for MBR (white) and DIA-ME analysis (blue). E Joint plot (center) of histogram distributions (top and left) of average peptides per protein identified in MBR (y-axis, light blue) and DIA-ME analysis (x-axis, blue). The scatter represents proteins identified in both analyses with their color showing the ratio of identified peptide numbers, indicating higher identifications in MBR (blue) or in DIA-ME (red). Peptide number equality (ratio of 1) shown as black line. F Pearson correlation heatmap of time-point samples (R1–R3: biological origin), showing correlations from yellow (low) to dark blue (high). G Principal component analysis of time-point samples after 10× DIA-ME analysis. Control samples (0 h) shown in light green and treated samples shown in blue shades. Source data are provided as a Source Data file.
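The two similarity measures named in panel C have standard set-based definitions: the Szymkiewicz–Simpson overlap coefficient divides the size of the intersection by the size of the smaller set, and the Jaccard index divides it by the size of the union. A short sketch is given below; the function names and the protein group identifiers in the usage note are hypothetical, and returning 0.0 for empty inputs is an assumed convention.

```python
from typing import Set


def overlap_coefficient(a: Set[str], b: Set[str]) -> float:
    """Szymkiewicz-Simpson overlap: |A n B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0  # assumed convention for empty sets
    return len(a & b) / min(len(a), len(b))


def jaccard_index(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: |A n B| / |A u B|."""
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(a & b) / len(union)
```

For two hypothetical identification lists `{"P1", "P2", "P3"}` and `{"P2", "P3", "P4", "P5"}`, the overlap coefficient is 2/3 and the Jaccard index is 2/5. An overlap coefficient of 1 means the smaller list is fully contained in the larger one, which is the pattern expected if an enhanced analysis only adds identifications.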
Fig. 5
Fig. 5. Exploration of IFN-γ-induced immune response from the enhanced analysis of 200-pg samples.
A Volcano analysis of two-sided Student’s t-test results from 200-pg time-point samples after IFN-γ treatment and the respective 0 h control. Number n of differentially expressed proteins indicated. Proteins whose expression could only be observed in DIA-ME analysis are highlighted in blue, proteins described in IFN-γ-response are encircled in yellow. B Heatmap of known IFN-γ-responsive proteins after hierarchical clustering by Euclidean distance. Missing values were imputed by k-nearest neighbors. Colors indicate the quantitative changes compared to the protein’s median across all samples by Z-score. The black box outlines a cluster that shows gradual increasing Z-score over time. C Pie charts showing the proportional origin of differentially expressed proteins from A. Proteins already identified by MBR shown as dark grey wedges and exclusive identifications by DIA-ME shown as blue wedges. The small pie displays the respective proportions for known IFN-γ-responsive proteins. D Heatmap of significantly up- and down-regulated proteins from panel A (p-value < 0.05, log2 fold change > 0.95), showing their regulation over the course of treatment. Proteins indicated in blue were only found in DIA-ME analysis. E Line plot of selected proteins from D, showing their collective up-regulation over the course of treatment. F Gene set enrichment analysis of proteins up-regulated after 24 hours (log2 fold change > 0.58) using MSigDB hallmarks. Bars on the right represent the enrichment degree, while their colors specify the enrichment’s FDR. Bars on the left overlay the size of the enriched term and are depicted for MBR (white) and DIA-ME (blue) analysis with numbers indicating the additional contribution of DIA-ME. Bold terms indicate equivalent enrichment from the 200-ng bulk analysis (see Supplementary Fig. 7H). Source data are provided as a Source Data file.
Fig. 6
Fig. 6. DIA-ME-assisted analysis of individual U-2 OS cells reveals co-existence of metabolic states.
A Protein groups identified per individual cell in control (left) and 24 h IFN-γ-treated cells (right). Data were analyzed using conventional MBR (white) or DIA-ME (blue) using 10-cell MEs. The number N of single cells per condition is indicated. B Histogram of protein intensities for MBR (white) and DIA-ME analysis (blue). Curves calculated by kernel density estimation. C Histogram of total identified peptides per protein for both analyses (white: MBR; blue: DIA-ME). D Scatter plot of log2-transformed average peptide numbers per protein identified in MBR (y-axis) and DIA-ME analysis (x-axis). Colors represent the ratio of peptide identifications between the two analyses, effectively showing higher protein sequence coverage in MBR (blue) or in DIA-ME (red). Equal peptide numbers (ratio of 1) represented by a black line. E Principal component analysis (PCA) of control and IFN-γ-treated cells after DIA-ME analysis based on known IFN-γ-responsive proteins (left). The two main principal components can be explained by the indicated processes on the right (Orange: PC1 – oxidative stress response proteins; Blue: PC2 – IFN-γ response proteins). Colors specify the degree of expression per cell ranging from blue (low) to red (high) for each of the indicated proteins. F Left: hierarchical cluster by Euclidean distance among individual cells (columns) and known IFN-γ-responsive proteins (rows). Colors in the heatmap indicate quantitative changes compared to the protein’s median value across all cells by Z-score. Frames show two identified clusters (C1, C2), enlarged on the right. Color bar on the bottom indicates whether the column originates from a control (blue) or IFN-γ-treated cell (orange). Source data are provided as a Source Data file.
Fig. 7
Fig. 7. Focused analysis of protein correlation modules in single cells.
A Co-expression analysis by Pearson correlation of proteins identified in ≥20 cells (n = 570). Two identified clusters (C1 and C2) with strong internal correlation but mutual anti-correlation are highlighted. B Individual expression levels per cell for pairwise positively correlated proteins JUP (x-axis) and DSP (y-axis) (left panel), and negatively correlated proteins PSMA6 (x-axis) and PHB2 (y-axis) (right panel). Colors indicate control (blue) and IFN-γ-treated cells (dark blue). Linear regression by Pearson shown as dashed line with the respective correlation factor r indicated. C Protein-protein interaction and protein-pathway interaction network, showing relations within and between clusters C1 and C2 of A. The nodes of the network represent the terms associated with indicated proteins (Red: cluster C1; Dark red: cluster C2), and (undirected) edges represent interactions between proteins. Nodes represent significantly enriched pathways, while nodes of similar function are grouped by their color. D Co-expression analysis of proteins selected from (C) showing inversely correlated expression of metabolic proteins between cells. Proteins were assigned to processes by manual annotation as indicated. Source data are provided as a Source Data file.
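The pairwise co-expression values in panels A and B are Pearson correlations of two proteins across cells. The sketch below computes one such value under stated assumptions: missing values are encoded as `None`, only cells in which both proteins are quantified are used, and the `min_cells` cut-off is applied to those shared cells. The caption's filter of ≥20 cells refers to per-protein identifications, so applying it per pair is an illustrative simplification, and the function name is hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson_across_cells(
    x: Sequence[Optional[float]],
    y: Sequence[Optional[float]],
    min_cells: int = 20,
) -> Optional[float]:
    """Pearson r of two proteins over cells where both are quantified.

    Returns None if fewer than `min_cells` shared cells exist or if
    either protein shows no variation across them.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < min_cells:
        return None

    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n

    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)
```

Values near +1 correspond to pairs that rise and fall together across cells, as described for JUP and DSP, and values near -1 to inversely expressed pairs, as described for PSMA6 and PHB2.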
