Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Apr;17(4):e10026.
doi: 10.15252/msb.202010026.

CODEX, a neural network approach to explore signaling dynamics landscapes

Affiliations

CODEX, a neural network approach to explore signaling dynamics landscapes

Marc-Antoine Jacques et al. Mol Syst Biol. 2021 Apr.

Abstract

Current studies of cell signaling dynamics that use live cell fluorescent biosensors routinely yield thousands of single-cell, heterogeneous, multi-dimensional trajectories. Typically, the extraction of relevant information from time series data relies on predefined, human-interpretable features. Without a priori knowledge of the system, the predefined features may fail to cover the entire spectrum of dynamics. Here we present CODEX, a data-driven approach based on convolutional neural networks (CNNs) that identifies patterns in time series. It does not require a priori information about the biological system and the insights into the data are built through explanations of the CNNs' predictions. CODEX provides several views of the data: visualization of all the single-cell trajectories in a low-dimensional space, identification of prototypic trajectories, and extraction of distinctive motifs. We demonstrate how CODEX can provide new insights into ERK and Akt signaling in response to various growth factors, and we recapitulate findings in p53 and TGFβ-SMAD2 signaling.

Keywords: cell signaling; convolutional neural network; data exploration; live biosensor imaging; time series analysis.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1
Figure 1. CNN latent features reveal the landscape of single‐cell signaling dynamics
  1. Schematic of the CNN classifier architecture used in CODEX. In this example, MCF10A cells are exposed to 6 different GFs and single‐cell ERK/Akt activity is reported using biosensors. The GF treatments form classes that the classifier is trained to identify based on the bivariate ERK/Akt single‐cell input trajectories. The input trajectories are passed through a cascade of convolution layers, followed by a global average pooling layer that compresses the convolution responses into a one‐dimensional vector of features (red rectangle). This latent representation of the input is then referred to as CNN features. Finally, the CNN features are passed to a fully connected layer to perform the classification.

  2. t‐SNE projection of the CNN features of all ERK/Akt trajectories in the validation set. Each point represents a bivariate ERK/Akt trajectory from a single cell. Hulls indicate areas associated with a strong classification confidence for each GF. Shading shows the point densities. Diamonds indicate the positions of the 10 top prototypes (see Materials and Methods) for each GF. The solidity of the hull contour line indicates the minimal classification confidence for all prototypes in the hull.

  3. Representative ERK/Akt prototype trajectories from each hull indicated in (B). Pathway activity is reported as the ratio of cytosolic over nuclear KTR fluorescence (C/N). Each pathway activity was preprocessed by removing the average activity of this pathway across all trajectories in the training set. Numbers in the bottom‐right corners indicate the CNN predicted probability for the trajectories to belong to their actual class.

Figure EV1
Figure EV1. CNN highlights discriminative dynamics motifs in synthetic data
  1. A synthetic dataset resembling pulsatile signaling activities was created, and a CNN was trained to recognize its two classes (see Materials and Methods and Appendix Note 2). Each trajectory comprises four pulsing events that consist of either full Gaussian peaks or half‐truncated Gaussian peaks. Events are triggered at random locations but with a minimal distance from each other. The class labeled “Full” displays predominantly full peaks (2, 3, or 4), while the class labeled “Truncated” displays predominantly truncated Gaussian peaks (0, 1, or 2 full peaks).

  2. t‐SNE projection of the CNN features of the trajectories in the validation set. Each cluster of points regroups trajectories based on their number of full peaks. Diamonds indicate prototype trajectories.

  3. Prototype trajectories (blue and red) sampled by selecting trajectories whose CNN features were uncorrelated (see Materials and Methods), and trajectories for which the model confidence is lowest (green). Trajectories are the ones indicated with diamonds in (B). The number of full peaks in each trajectory is indicated next to the diamond symbol. The minimal threshold of model confidence to sample prototypes was set to 90%.

  4. Motifs extracted with the CAM method (see Materials and Methods). Motifs were clustered with DTW distance and Ward linkage. Representative medoid motifs of the clusters are shown. The class of the trajectory from which the motif was extracted is indicated in the top‐right corner.

  5. Distribution of the motifs in the clusters according to the class of their origin trajectory.

Figure EV2
Figure EV2. Fluorescent biosensors of ERK and Akt activity report single‐cell signaling dynamics in response to growth factors
  1. Scheme of the multiplexed, genetically encoded ERK (ERK‐KTR‐mTurquoise2) and Akt (FoxO3A‐mNeonGreen) biosensors and the constitutive nuclear marker (H2B‐miRFP703). Both biosensors translocate from nucleus to cytosol when phosphorylated by their respective kinase.

  2. Image analysis pipeline to obtain single‐cell ERK/Akt trajectories. A random forest classifier (Ilastik) was trained to segment nuclei. Nuclei (red outline) are used for tracking each cell across the time lapse. Expansion of the nuclear mask provides a cytosol area (cyan outline). The ratio of pixel intensities of the cytosolic over the nuclear area (C/N) provides a proxy for ERK or Akt activity, which is displayed over each nucleus with a specific color code (high/low ERK/Akt activity = warm/cold colors, respectively).

  3. ERK and Akt activity channels in MCF10A monolayers 25 h after the stimulation with the different growth factors.

  4. Randomly selected ERK and Akt activity single‐cell trajectories from MCF10A monolayers treated with the different growth factors. The dashed black vertical line represents the first frame of the 2nd phase of signaling trajectories used to train CNN. GFs have been added at the beginning of the traces.

  5. Population averages of ERK/Akt activities of a whole field of view. Shading indicates 95% confidence interval of the mean. At least 1,200 cells for each GF pooled from two technical replicates.

Figure EV3
Figure EV3. Applying CODEX with a ResNet architecture yields similar results to the ones obtained with a plain CNN architecture
  1. t‐SNE projection of the ResNet features of all ERK/Akt trajectories in the validation set. Each point represents a bivariate ERK/Akt trajectory from a single cell. Hulls indicate areas associated with a strong classification confidence for each GF. Shading shows the point densities. Diamonds indicate the positions of the 10 top prototypes (see Materials and Methods) for each GF.

  2. Representative ERK/Akt prototype trajectories of the area indicated in (A). Each pathway activity was preprocessed by removing the average activity of this pathway across all trajectories in the training set.

  3. Discriminative signaling motifs were extracted from the training and validation top prototypes using a CAM‐based approach (see Materials and Methods). The motifs were clustered using DTW distance and squared Ward's linkage. Representative motifs of each cluster, based on the minimization of mean intra‐cluster distance, are displayed. Bottom‐right labels indicate the class of the trajectory from which the motif was extracted. Each pathway activity was preprocessed by removing the average activity of this pathway across all trajectories in the training set.

  4. Distribution of the signaling motifs clusters across the GF treatments.

Figure EV4
Figure EV4. Quantification of additional interpretable, class‐discriminative features identified by visual inspection of the CODEX output
  1. Scatter plot of frequency of ERK peaks versus frequency of Akt peaks in single cells. Crosses indicate the mean values and the standard deviations. At least 1,200 cells for each GF pooled from two technical replicates. Colors indicate the GF and are the same as in (B).

  2. Proportion of synchronous ERK and Akt peaks in single cells. Mean and standard deviation are shown. At least 1,200 cells for each GF pooled from two technical replicates.

Figure EV5
Figure EV5. CODEX identifies different sex‐ and light‐dependent Drosophila movements
Male and female Drosophila movements in a tube were recorded over 12 h, in presence or absence of light, and reported as univariate time series (Fulcher & Jones, 2017). A CNN was trained to recognize the combination of sex and light conditions based on the individual movements (see Materials and Methods and Appendix Note 5).
  1. t‐SNE projection of the CNN features for the training, validation, and test sets pooled together. Diamonds indicate the 10 top prototypes for each class.

  2. The top prototype trajectory for each class.

  3. Comparison of autocorrelation levels and spectral flatness between the 20 top female prototypes and 20 top male prototypes against non‐prototype trajectories. Female (resp. male) prototypes were obtained by pooling the 10 top prototypes of female (resp. male) drosophila under both day and night light conditions. Boxes indicate the upper and lower quartiles, the central band indicates the median and whiskers extend to individuals up to 1.5 interquartile away from the median. Medians of the distributions were compared with two‐sided Wilcoxon's tests; ***P‐value < 0.01. n = 574 for non‐prototype female trajectories; n = 524 for non‐prototype male trajectories.

  4. Autocorrelation level and spectral flatness against the probability of trajectories to be classified as a female class. The latter probability is the sum of the probability of trajectories to belong to the female‐day and female‐night classes. The resulting sum was ranked, such that rank 1 corresponds to the probability of the trajectory that has the lowest prediction outcome to be classified as female. Autocorrelation levels were computed by averaging autocorrelation values for lags ranging from 1 to 10 time points.

Figure 2
Figure 2. Discriminative motifs in ERK/Akt trajectories highlight GF signaling signatures and ease the identification of interpretable, GF‐specific features
  1. Discriminative signaling motifs were extracted from the training and validation top prototypes using a CAM‐based approach (see Materials and Methods). The motifs were clustered using DTW distance and Ward's linkage. Representative motifs of each cluster, based on the minimization of median intra‐cluster distance, are displayed (see Materials and Methods). Bottom‐right labels indicate the class of the trajectory from which the motif was extracted. Each pathway activity was preprocessed by removing the average activity of this pathway across all trajectories in the training set.

  2. Distribution of the signaling motifs clusters across the GF treatments.

  3. Scatter plot of the Pearson correlation coefficient between ERK and Akt trajectories against the median ratio of ERK over Akt activity in single cells. For each trajectory, ratios are computed on raw data, at each time point and summarized with median. Crosses indicate the mean values and the standard deviations of all raw single‐cell trajectories. At least 1,200 cells for each GF pooled from two technical replicates.

Figure 3
Figure 3. Classic machine‐learning workflows for time series exploration provide a more coarse‐grained overview of ERK/Akt signaling landscape than CODEX
  1. A

    Schematic of the two workflows used for analyzing ERK/Akt signaling in response to GFs.

  2. B

    Feature importance of a random forest trained to classify ERK/Akt time series according to the GF treatments. The 12 most important features are shown. The signaling pathway (ERK or Akt) associated with each feature is indicated. For the permutation entropy, the parameters are as follows: D length of the subwindows and T lag between the windows (Christ et al, 2018).

  3. C, D

    Biplots of the first three principal components derived from the time series features PCA. Each symbol represents an individual trajectory. The circles indicate normal ellipses for each group with a confidence of 95%. The larger symbols in the middle of the ellipses are visual helps for the identification of the groups. The percentiles in the axis labels indicate the amount of total variance carried by the principal components.

  4. E

    Features that contributed the most to the three first principal components (see Materials and Methods).

  5. F

    t‐SNE projection of the first 197 principal components, which carry 75% of the total variance of the features. Each point represents an individual trajectory; shading indicates point density. The diamond symbols indicate the trajectories shown in (G).

  6. G

    Manually selected trajectories which are highlighted with a diamond symbol and a label in (F).

Figure 4
Figure 4. CODEX identifies TGFβ dose‐dependent signaling states
MCF10A cells were exposed to increasing doses of TGFβ, and single‐cell SMAD2 responses were recorded using a fluorescent biosensor (Strasen et al, 2018). A CNN was trained to classify SMAD2 activity trajectories according to the TGFβ dose.
  1. A–C

    t‐SNE projection of the CNN latent features for the training, validation, and test sets pooled together. Trajectories representations are colored according to: the TGFβ dose (A), the DTW clusters (B), or the CNN features clusters (C). The DTW clusters are the ones of the original study (Strasen et al, 2018). CNN features were clustered using L1 distance and partitioned with hierarchical clustering and Ward linkage.

  2. D, E

    Distribution of the trajectories in the CNN features clusters according to their corresponding TGFβ dose (D) and their DTW cluster (E). Same colors as in (C).

  3. F

    Comparison of DTW clusters and CNN features clusters. Median trajectories for each cluster (see Materials and Methods) are reported in bold colored lines, colors matching (B) and (C), gray shade indicate interquartile range. The DTW clusters (1–6) are made of (946, 340, 200, 278, 218, 83) single‐cell trajectories, respectively. The CNN clusters (1–5) are made of (505, 429, 451, 492, 188) single‐cell trajectories, respectively. A random sample of trajectories for DTW clusters and the centroid trajectories (see Materials and Methods) for each CNN features cluster are shown.

Figure 5
Figure 5. CODEX identifies cell line‐specific p53 responses under increasing ionizing radiation doses
12 different cell lines were exposed to five doses of ionizing radiations, and single‐cell p53 responses were reported using a fluorescent reporter (Stewart‐Ornstein & Lahav, 2017). A CNN was trained to recognize the combination of cell line and radiation dose from p53 trajectories (i.e., one out of 60 classes; see Materials and Methods and Appendix Note 4).
  1. A–C

    t‐SNE projection of the CNN features for the training, validation, and sets pooled together. Trajectories representation is colored according to: the cell lines (A), the ionizing radiation doses (B), or the CNN features clusters (C). The p53 trajectories were clustered according to their standardized CNN features with L1 distance and partitioned with hierarchical clustering and Ward linkage.

  2. D

    Representative p53 trajectories of the clusters identified using CNN features clustering. Median trajectories are reported in bold colored lines, colors matching (C), gray shade indicates interquartile range. The CNN clusters (1–6) are made of (767, 554, 486, 574, 746, 698) single‐cell trajectories, respectively. Data are pooled from two to three replicates for each cell line. Individual traces indicate the medoid p53 single‐cell trajectories for each cluster (see Materials and Methods).

  3. E

    Distribution of the trajectories in the CNN feature clusters according to cell lines and ionizing radiation doses. Same colors as in (C).

  4. F

    Most abundant CNN features cluster for all trajectories in each combination of cell line and radiation dose combinations. The color of the dot indicates the most abundant cluster; the size of the dot indicates the proportion of cells classified in the indicated cluster. Same colors as in (C).

References

    1. Albeck JG, Mills GB, Brugge JS (2013) Frequency‐modulated pulses of ERK activity transmit quantitative proliferation signals. Mol Cell 49: 249–261 - PMC - PubMed
    1. Aoki K, Kondo Y, Naoki H, Hiratsuka T, Itoh RE, Matsuda M (2017) Propagating wave of ERK activation orients collective cell migration. Dev Cell 43: 305–317.e5 - PubMed
    1. Berg S, Kutra D, Kroeger T, Straehle CN, Kausler BX, Haubold C, Schiegg M, Ales J, Beier T, Rudy M et al (2019) ilastik: interactive machine learning for (bio)image analysis. Nat Methods 16: 1226–1232 - PubMed
    1. Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. KDD Workshop 10: 359–370
    1. Blum Y, Mikelson J, Dobrzyński M, Ryu H, Jacques M‐A, Jeon NL, Khammash M, Pertz O (2019) Temporal perturbation of ERK dynamics reveals network architecture of FGF2/MAPK signaling. Mol Syst Biol 15: e8947 - PMC - PubMed

Publication types

MeSH terms

Substances

LinkOut - more resources