Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Nov 1;35(22):4671-4678.
doi: 10.1093/bioinformatics/btz256.

Hybrid analysis of gene dynamics predicts context-specific expression and offers regulatory insights

Affiliations

Hybrid analysis of gene dynamics predicts context-specific expression and offers regulatory insights

Justin D Finkle et al. Bioinformatics. .

Abstract

Motivation: To understand the regulatory pathways underlying diseases, studies often investigate the differential gene expression between genetically or chemically differing cell populations. Differential expression analysis identifies global changes in transcription and enables the inference of functional roles of applied perturbations. This approach has transformed the discovery of genetic drivers of disease and possible therapies. However, differential expression analysis does not provide quantitative predictions of gene expression in untested conditions. We present a hybrid approach, termed Differential Expression in Python (DiffExPy), that uniquely combines discrete, differential expression analysis with in silico differential equation simulations to yield accurate, quantitative predictions of gene expression from time-series data.

Results: To demonstrate the distinct insight provided by DiffExpy, we applied it to published, in vitro, time-series RNA-seq data from several genetic PI3K/PTEN variants of MCF10a cells stimulated with epidermal growth factor. DiffExPy proposed ensembles of several minimal differential equation systems for each differentially expressed gene. These systems provide quantitative models of expression for several previously uncharacterized genes and uncover new regulation by the PI3K/PTEN pathways. We validated model predictions on expression data from conditions that were not used for model training. Our discrete, differential expression analysis also identified SUZ12 and FOXA1 as possible regulators of specific groups of genes that exhibit late changes in expression. Our work reveals how DiffExPy generates quantitatively predictive models with testable, biological hypotheses from time-series expression data.

Availability and implementation: DiffExPy is available on GitHub (https://github.com/bagherilab/diffexpy).

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
Overview of DiffExPy analysis. (Left) Genes are categorized as differentially expressed genes (DEGs), dynamic DEGs (dDEGs) or differentially responding genes (DRGs) from time-course gene expression data. Discrete responses (in brackets) are determined for each contrast. (Center) Stochastic differential equation (SDE) systems that match the dDEG gene profiles are selected from a library of possible models and combined into an ensemble model. The ensembles can predict gene expression behavior in new, untested conditions. (Right) Biological insights are gained by associating GO terms with gene classifications and associating TFs with groups of genes that share discrete differential behavior
Fig. 2.
Fig. 2.
Ensembles of minimal SDE systems trained on PI3Kinh and WT data accurately predict expression in untrained PI3K KI condition. (A) Three examples of dDEGs with matching dynamical models. (A, top row) Normalized gene expression for each gene in PI3Kinh and WT used to train the models. The discrete response of the pairwise contrasts (in brackets) for each dDEG show when differential expression occurs. (A, middle row) Normalized gene expression for model predictions, null model predictions and true PI3K KI expression. Trained and null model predictions are median values, and the filled regions show the 83% CI of the median. (A, bottom row) Network diagrams summarize the ensemble models matched to each gene in the training condition. (B) AUROC curve plot of different methods for sorting gene predictions. Sorting by mean LFC between the training conditions places more accurate predictions at the top of the list. A threshold for selecting more accurate predictions (purple, dashed line) is calculated using the elbow rule of the sorted mean LFC values in the training condition (Supplementary Fig. S7). (C) Box plots show the normalized and absolute difference in MSE of the trained models compared with the paired random models for all genes. The top set of genes (purple) were determined by the elbow rule and are significantly more likely to generate more accurate predictions. P-values were calculated using the Wilcoxon signed-rank test. **P < 0.01, ****P < 0.0001
Fig. 3.
Fig. 3.
Summary of gene classifications comparing PTEN KO to WT. (A) Heatmap of row normalized LFC for all DEGs. Genes that are classified as dDEG or DRG are also labeled. (B) Overlap of genes that are classified as DEG, dDEG, DRG or some combination. By definition, all dDEGs and DRGs must also be DEGs. (C) Comparison of distributions of GO term depths uniquely associated with intersections of gene sets. Even though no genes are classified as only dDEG or DRG, these gene sets associate with unique, specific GO terms not found in the dDEG set. The number of unique terms associated with each set, n, is shown. P-values were calculated using a discrete KS test. *P < 0.05, ***P < 0.001
Fig. 4.
Fig. 4.
Specific timing of changes in gene expression identifies possible regulators. (A, left) Sankey plot of discrete FC between PTEN KO and WT compared with FC before the stimulus is applied shows when DRGs respond to the stimulus. Nodes (dark gray) are scaled by the number of genes in that state. Edges (light gray) show the fraction of genes moving from one state to another between time points. The node and edges highlighted in orange show the set of genes associated with SUZ12. (A, right). LFC values relative to LFC before stimulus for SUZ12 and several genes associated with SUZ12. (B, left) Sankey plot of the cumulative differences in PTEN KO slope and WT slopes. Segments highlighted in orange show genes whose KO expression increases more than their WT expression between either 90 and 180 min or 180 and 300 min. These genes are enriched for association with FOXA1. (B, right) LFC values relative to LFC at the previous time point for FOXA1 and several associated genes. Shaded regions in time-series plots show the 83% confidence intervals

References

    1. Bansal M. et al. (2006) Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics, 22, 815–822. - PubMed
    1. Bernardo G.M. et al. (2013) Foxa1 represses the molecular phenotype of basal breast cancer cells. Oncogene, 32, 554.. - PMC - PubMed
    1. Comet I. et al. (2016) Maintaining cell identity: PRC2-mediated regulation of transcription and cancer. Nat Rev. Cancer, 16, 803.. - PubMed
    1. Consortium G.O. (2016) Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res., 45, D331–D338. - PMC - PubMed
    1. Emilsson V. et al. (2008) Genetics of gene expression and its effect on disease. Nature, 452, 423.. - PubMed

Publication types

MeSH terms