PLoS Comput Biol. 2017 Feb 9;13(2):e1005335.
doi: 10.1371/journal.pcbi.1005335. eCollection 2017 Feb.

Representing high throughput expression profiles via perturbation barcodes reveals compound targets

Tracey M Filzen et al. PLoS Comput Biol. 2017.

Abstract

High-throughput mRNA expression profiling can be used to characterize the response of cell culture models to perturbations such as pharmacologic modulators and genetic perturbations. As profiling campaigns expand in scope, it is important to homogenize, summarize, and analyze the resulting data in a way that captures significant biological signals despite noise sources such as batch effects and stochastic variation. We used the L1000 platform for large-scale profiling of 978 representative (landmark) genes across thousands of compound treatments. Here, we describe a method that uses deep learning techniques to convert the expression changes of these landmark genes into a perturbation barcode. The barcode reveals important features of the underlying data and outperforms the raw data in yielding biological insights. It captures compound structure and target information, and predicts a compound's high-throughput screening promiscuity, to a higher degree than the original data measurements, indicating that the approach uncovers underlying factors of the expression data that are otherwise entangled or masked by noise. Furthermore, we demonstrate that visualizations derived from the perturbation barcode can assign functions to unknown compounds more sensitively through a guilt-by-association approach, which we use to predict and experimentally validate the activity of compounds on the MAPK pathway. This application of deep metric learning to large-scale chemical genetics projects highlights the utility of this and related approaches for extracting insights and testable hypotheses from big, sometimes noisy data.

Conflict of interest statement

At the time of submission, all of the authors were paid employees of Merck & Co., Inc. (Kenilworth, NJ).

Figures

Fig 1. Experimental setup and architecture of the deep model used.
(A) Cells are treated with compounds in 384-well plates. (B) Cell lysate is used for ligation-mediated PCR with gene-specific probe pairs, and gene expression is measured using an optically addressed bead array technology. (C) Raw intensities are normalized and converted, plate by plate, to relative expression changes versus control (z-scores). Variability is observed between biological replicates.
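Panel (C) describes converting normalized intensities into per-plate z-scores relative to control. The caption does not name the estimator, so the sketch below is a minimal illustration that *assumes* a robust z-score. It uses the per-gene median of the plate's control wells as the center and their median absolute deviation (MAD), scaled by 1.4826, as the spread. The well names and the `robust_zscores` helper are hypothetical.

```python
import statistics
from typing import Dict, List

# Assumed constant: scales the MAD so it matches a standard deviation under normality.
MAD_TO_SD = 1.4826


def robust_zscores(plate: Dict[str, List[float]], control_wells: List[str]) -> Dict[str, List[float]]:
    """Convert one plate's per-well expression vectors into robust z-scores
    versus that plate's control wells (assumed median/MAD estimator)."""
    n_genes = len(next(iter(plate.values())))
    centers, scales = [], []
    for g in range(n_genes):
        ctrl = [plate[w][g] for w in control_wells]
        med = statistics.median(ctrl)
        mad = statistics.median([abs(v - med) for v in ctrl])
        centers.append(med)
        # Floor avoids dividing by zero when all control wells agree exactly.
        scales.append(max(MAD_TO_SD * mad, 1e-6))
    return {well: [(x - c) / s for x, c, s in zip(values, centers, scales)]
            for well, values in plate.items()}


# Hypothetical plate: three control wells and one compound well, two genes each.
plate = {"ctrl1": [1.0, 10.0], "ctrl2": [2.0, 10.0], "ctrl3": [3.0, 10.0], "cpd1": [6.4478, 10.0]}
z = robust_zscores(plate, ["ctrl1", "ctrl2", "ctrl3"])
print(round(z["cpd1"][0], 3))  # 3.0
```

Because the statistics are recomputed for every plate, a plate-level shift in intensity is removed before profiles from different plates are compared.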
Fig 2
(A) Metric learning network: a pair of 978-element z-score vectors is input to the network as adjacent vectors. The data are transformed through two hidden layers of nonlinearities (noisy sigmoid activation functions), with 400 and 100 units, respectively. The activations of the second hidden layer (H2(x1) and H2(x2)) are combined in the output layer by calculating the Euclidean distance between the two representations. The margin cost is calculated from the -1/1 (non-replicate/replicate indicator) target and the squared distance. (B) Once the model is trained, expression profiles are converted to barcodes by passing them through the first two hidden layers (now with noiseless sigmoids) and thresholding the activations of the second hidden layer to yield perturbation barcodes.
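The sketch below illustrates the architecture in panels (A) and (B): a 978 → 400 → 100 sigmoid network applied to both members of a pair, a margin cost on the squared distance, and a thresholded barcode. Several details are *assumptions* not stated in the caption:

- Both branches share one set of weights.
- "Noisy sigmoid" means Gaussian noise added to the pre-activation.
- The cost is a hinge on the squared distance with hypothetical constants `tau` and `margin`.
- The barcode threshold is 0.5.

The training loop (gradient updates) is omitted.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# One dense layer: (weight rows, biases). Sizes follow the caption: 978 -> 400 -> 100.
Layer = Tuple[List[List[float]], List[float]]
LAYER_SIZES = (978, 400, 100)


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def build_network(sizes: Sequence[int] = LAYER_SIZES, seed: int = 0) -> List[Layer]:
    """Randomly initialised layers; both branches of a pair reuse these same layers (assumed)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def embed(x: Sequence[float], layers: List[Layer], noise_std: float = 0.0,
          rng: Optional[random.Random] = None) -> List[float]:
    """Forward pass to H2(x). noise_std > 0 mimics the training-time 'noisy sigmoid'
    (assumed: Gaussian noise added to each pre-activation)."""
    if rng is None:
        rng = random.Random()
    h = list(x)
    for weights, biases in layers:
        nxt = []
        for row, b in zip(weights, biases):
            a = sum(w * v for w, v in zip(row, h)) + b
            if noise_std > 0:
                a += rng.gauss(0.0, noise_std)
            nxt.append(sigmoid(a))
        h = nxt
    return h


def pair_cost(x1: Sequence[float], x2: Sequence[float], y: int, layers: List[Layer],
              tau: float = 1.0, margin: float = 0.5, noise_std: float = 0.1,
              rng: Optional[random.Random] = None) -> float:
    """Hinge-style margin cost on the squared distance; y = +1 replicate, -1 non-replicate.
    Zero when replicates have d2 <= tau - margin or non-replicates have d2 >= tau + margin."""
    h1 = embed(x1, layers, noise_std, rng)
    h2 = embed(x2, layers, noise_std, rng)
    d2 = sum((a - b) ** 2 for a, b in zip(h1, h2))
    return max(0.0, margin - y * (tau - d2))


def barcode(x: Sequence[float], layers: List[Layer], threshold: float = 0.5) -> List[int]:
    """Noiseless forward pass, then binarise the second hidden layer's activations."""
    return [1 if h > threshold else 0 for h in embed(x, layers)]
```

For example, `barcode(z, build_network())` turns a 978-element z-score vector `z` into a 100-bit barcode. Training would adjust the shared weights to minimise `pair_cost` over replicate (y = +1) and non-replicate (y = -1) pairs, pulling replicates together and pushing non-replicates apart in the 100-unit representation.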
Fig 3. Visualizations of the data based on z-scores or perturbation barcodes were examined to select candidate compounds in the phenotypic neighborhood of a series of known MAPK pathway inhibitors.
(A–D) t-SNE maps of the data, with z-score maps on top and perturbation barcode maps on the bottom. (A, B) The entire dataset is shown, with the tested compounds in dark blue. (C, D) The neighborhood of the query MAPK pathway inhibitor compounds (orange) is shown. Common MAPK tool compounds used for nearest neighbor analysis are circled. (E, F) Results of AP-1 reporter assays. Known MAPK actives are distinguished from unknowns predicted to be active in (C, D). (G, H) Rather than selecting neighbors of seed MAPK tool compounds in the t-SNE map, nearest neighbors in the native datasets were selected and tested in the AP-1 reporter assay. Key as in (E, F). See Fig C in S1 Text for a breakdown by category, including overlaps.
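Panels (G, H) select nearest neighbors of seed MAPK tool compounds directly in the native data rather than in the t-SNE map. The sketch below shows that guilt-by-association step under stated *assumptions*: Hamming distance for binary barcodes, Euclidean distance for z-score vectors, and a fixed `k` per seed. The compound IDs and the `neighbors_of_queries` helper are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Profile = Sequence[float]


def hamming(a: Profile, b: Profile) -> float:
    """Number of differing bits between two barcodes."""
    return float(sum(x != y for x, y in zip(a, b)))


def euclidean(a: Profile, b: Profile) -> float:
    """Euclidean distance between two z-score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbors_of_queries(profiles: Dict[str, Profile], queries: Sequence[str], k: int,
                         dist: Callable[[Profile, Profile], float]) -> List[str]:
    """Union of the k nearest non-query compounds for each query (seed) compound.
    Ties are broken by compound ID so the result is deterministic."""
    query_set = set(queries)
    candidates = set()
    for q in queries:
        others = [c for c in profiles if c not in query_set]
        others.sort(key=lambda c: (dist(profiles[q], profiles[c]), c))
        candidates.update(others[:k])
    return sorted(candidates)


# Hypothetical 4-bit barcodes for two seed compounds and three untested compounds.
barcodes = {"seed1": [1, 1, 0, 0], "seed2": [1, 1, 0, 1],
            "cA": [1, 1, 0, 0], "cB": [0, 0, 1, 1], "cC": [1, 0, 0, 0]}
print(neighbors_of_queries(barcodes, ["seed1", "seed2"], k=2, dist=hamming))  # ['cA', 'cC']
```

The returned candidates are the compounds that would then be tested in the reporter assay. Swapping `hamming` for `euclidean` applies the same selection to z-score profiles.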
