Cell Syst. 2017 Jul 26;5(1):63-71.e6.
doi: 10.1016/j.cels.2017.06.003. Epub 2017 Jul 12.

Unsupervised Extraction of Stable Expression Signatures from Public Compendia with an Ensemble of Neural Networks


Jie Tan et al. Cell Syst.

Abstract

Cross-experiment comparisons in public data compendia are challenged by unmatched conditions and technical noise. The ADAGE method, which performs unsupervised integration with denoising autoencoder neural networks, can identify biological patterns. However, ADAGE models, like many neural networks, are over-parameterized, so different ADAGE models trained on the same data perform equally well. To make the models more robust and to build signatures more consistent with biological pathways, we developed an ensemble ADAGE (eADAGE) that integrates stable signatures across models. We applied eADAGE to a compendium of Pseudomonas aeruginosa gene expression profiling experiments performed in 78 media. eADAGE revealed a phosphate starvation response controlled by PhoB in media with moderate phosphate and predicted that this PhoB activation also requires a second stimulus provided by the sensor kinase KinB. We validated this relationship using both targeted and unbiased genetic approaches. Because eADAGE captures stable biological patterns, it enables cross-experiment comparisons that can highlight relationships that have been measured but not yet discovered.
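The abstract rests on denoising autoencoders. As a minimal sketch, assuming a single hidden layer with tied weights, sigmoid units, masking noise, and a cross-entropy reconstruction loss on expression values scaled to [0, 1] (illustrative assumptions, not a statement of the published ADAGE configuration), one evaluation of the training objective looks like this. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, rate, rng):
    """Masking noise (assumed): zero each input value with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

def encode(x, W, b_hidden):
    """Node activities: sigmoid(sum over genes g of W[g][k] * x[g] + b_hidden[k])."""
    return [sigmoid(sum(W[g][k] * x[g] for g in range(len(x))) + b_hidden[k])
            for k in range(len(b_hidden))]

def decode(h, W, b_visible):
    """Reconstruction with tied weights: the decoder reuses W, transposed."""
    return [sigmoid(sum(W[g][k] * h[k] for k in range(len(h))) + b_visible[g])
            for g in range(len(b_visible))]

def cross_entropy(x, x_hat, eps=1e-12):
    """Reconstruction error for targets in [0, 1]."""
    return -sum(v * math.log(p + eps) + (1.0 - v) * math.log(1.0 - p + eps)
                for v, p in zip(x, x_hat))

def dae_loss(x, W, b_hidden, b_visible, rate, rng):
    """Corrupt the input, encode, decode, and score against the CLEAN input."""
    h = encode(corrupt(x, rate, rng), W, b_hidden)
    return cross_entropy(x, decode(h, W, b_visible))

# Toy example: 5 genes, 2 nodes, small random weights (all values hypothetical).
rng = random.Random(0)
W = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(5)]
x = [0.2, 0.9, 0.5, 0.0, 1.0]
loss = dae_loss(x, W, [0.0, 0.0], [0.0] * 5, rate=0.1, rng=rng)
```

Training would minimize this loss over the compendium by gradient descent. Because the model is over-parameterized, runs started from different random weights can reach equally good losses with different W, which is the instability the ensemble is meant to address.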

Keywords: Pseudomonas aeruginosa; crosstalk; denoising autoencoders; ensemble modeling; gene expression; neural networks; phosphate starvation.


Figures

Figure 1. ADAGE model and signature definition. See also Figure S1
(A) In ADAGE, every gene contributes a weight to every node, reflected by the edge strength. Orange edges: high positive weight; blue edges: high negative weight; dotted edges: low positive or negative weight. (B) The distribution of a node's weights is roughly normal and centered at zero. Genes with weights above the positive high-weight (HW) cutoff (GeneE and GeneA) form the gene signature Node1pos; genes with weights below the negative HW cutoff (GeneC) form the gene signature Node1neg.
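The thresholding in panel B can be sketched as follows. The caption does not give the cutoff value, so n_sd (a cutoff expressed in standard deviations from the mean weight) is an assumed parameter, and node_signatures is a hypothetical helper. The toy weights reproduce the Node1pos/Node1neg split shown in the figure.

```python
import statistics

def node_signatures(gene_weights, n_sd):
    """Split one node's weights into positive and negative high-weight signatures.

    gene_weights: dict mapping gene name -> weight for a single node.
    n_sd: assumed HW cutoff, in standard deviations from the mean weight.
    """
    values = list(gene_weights.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    pos_cut, neg_cut = mu + n_sd * sd, mu - n_sd * sd
    pos = sorted(g for g, w in gene_weights.items() if w > pos_cut)
    neg = sorted(g for g, w in gene_weights.items() if w < neg_cut)
    return pos, neg

# Hypothetical weights for Node1. With only five genes no weight can lie more
# than 2 population SDs from the mean, so the toy uses a low cutoff of 1.0.
node1 = {"GeneA": 0.9, "GeneB": 0.05, "GeneC": -0.8, "GeneD": -0.02, "GeneE": 0.85}
node1_pos, node1_neg = node_signatures(node1, n_sd=1.0)
```

Here node1_pos holds GeneA and GeneE and node1_neg holds GeneC, matching the panel. On a real compendium with thousands of genes a much stricter cutoff would be used.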
Figure 2. The construction and performance of eADAGE. See also Figure S3
(A) eADAGE construction workflow. One hundred individual ADAGE models were built on the input dataset. Nodes from all models were extracted and clustered by the similarity of their weight vectors, and nodes from different models were regrouped by cluster assignment. The weight vectors of nodes in the same cluster were averaged, and each average became the weight vector of a new node in the eADAGE model. (B) KEGG pathway coverage compared among ADAGE, corADAGE, and eADAGE. (C) Enrichment significance of three example KEGG pathways in ADAGE models of different sizes and in eADAGE models. The grey dotted line marks an FDR q-value of 0.05. (D) Distribution of KEGG pathway coverage rates of ADAGE and eADAGE models. (E) KEGG pathway coverage of PCA, ICA, and eADAGE at different significance levels.
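The construction in panel A can be sketched as below. The caption says only that nodes were clustered by weight-vector similarity; this sketch assumes Pearson correlation distance and swaps in a minimal k-medoids routine, and all function names are hypothetical. It pools the nodes of every model, clusters them into k groups, and averages the weight vectors within each group to form the ensemble nodes.

```python
import math

def corr_distance(u, v):
    """1 - Pearson correlation between two weight vectors (assumed similarity measure)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)

def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: corr_distance(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, n_iter=20):
    """Minimal alternating k-medoids with farthest-first initialisation."""
    medoids = [0]
    while len(medoids) < k:
        # Next medoid: the point farthest from its nearest existing medoid.
        medoids.append(max(range(len(points)), key=lambda i: min(
            corr_distance(points[i], points[m]) for m in medoids)))
    for _ in range(n_iter):
        labels = assign(points, medoids)
        new = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c] or [medoids[c]]
            # The new medoid minimises total distance to the other cluster members.
            new.append(min(members, key=lambda i: sum(
                corr_distance(points[i], points[j]) for j in members)))
        if new == medoids:
            break
        medoids = new
    return assign(points, medoids)

def build_ensemble(models, k):
    """Pool nodes from all models, cluster them, and average each cluster's weights."""
    nodes = [w for model in models for w in model]
    labels = k_medoids(nodes, k)
    ensemble = []
    for c in range(k):
        members = [nodes[i] for i, lab in enumerate(labels) if lab == c]
        ensemble.append([sum(col) / len(members) for col in zip(*members)])
    return ensemble

# Two toy "models", each with two nodes over four genes (hypothetical values).
models = [
    [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]],
    [[1.1, 2.0, 3.2, 3.9], [3.8, 3.1, 2.0, 1.2]],
]
ensemble = build_ensemble(models, k=2)
```

The two rising nodes and the two falling nodes end up in separate clusters, so each ensemble node is the average of one matching node from each model. With 100 models, models would hold every node from every run; k, the number of ensemble nodes, is a free choice in this sketch.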
Figure 3. eADAGE signatures with medium-specific patterns. See also Figure S4 and Table S3
(A) Activity of Node147pos in M9-based media. (B) Activity of Node164pos in all media. (C) Expression heatmaps of the genes in Node164pos across samples in NGM+<0.1phosphate, peptone, King's A, and PIA media. The heatmap color range is set by the Z-scored expression of each gene across all samples in the compendium.
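Panel C states that the heatmap color scale comes from z-scores computed over the whole compendium. A minimal sketch of that ordering, assuming a population standard deviation and using hypothetical helper, gene, and sample names, z-scores each gene across all samples first and only then selects the displayed samples, so the colors stay comparable with the rest of the compendium.

```python
import statistics

def zscore_genes(expr):
    """Z-score each gene's expression across every sample in the compendium."""
    z = {}
    for gene, values in expr.items():
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)  # assumed: population SD
        z[gene] = [(v - mu) / sd for v in values]
    return z

def heatmap_slice(z, samples, keep):
    """Pick the displayed samples AFTER z-scoring, keeping the compendium-wide scale."""
    idx = [samples.index(s) for s in keep]
    return {gene: [vals[i] for i in idx] for gene, vals in z.items()}

# Hypothetical compendium: one gene measured in four samples.
samples = ["s1", "s2", "s3", "s4"]
expr = {"geneX": [1.0, 2.0, 3.0, 4.0]}
shown = heatmap_slice(zscore_genes(expr), samples, keep=["s3", "s4"])
```

Z-scoring only the displayed samples would instead stretch their values to fill the color range and hide how they compare with the rest of the compendium. A gene with constant expression has zero SD and would need separate handling.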
Figure 4. PhoA activity, as measured by the colorimetric BCIP assay, in various media
(A) PhoA activity, seen as the blue product of BCIP cleavage, depends on low phosphate concentration, phoB, phoR, and, in NGM, kinB. (B) At 16 hours, PhoA is active in King's A, peptone, and PIA; its activity depends on phoB and phoR and, on PIA, on kinB. (C) After 32 hours, PhoA is active in King's A, peptone, and PIA; its activity depends on phoB and, on PIA, on kinB. (D) On MOPS, PhoA activity depends on phosphate concentrations below 0.6 mM, phoB, phoR, and, at 0.5 mM phosphate, kinB. Not shown: 0.2 mM phosphate mimics 0.1 mM, and 0.7–0.9 mM mimic 1.0 mM.

