Front Chem. 2020 Apr 23;8:296.
doi: 10.3389/fchem.2020.00296. eCollection 2020.

Exploring the Use of Compound-Induced Transcriptomic Data Generated From Cell Lines to Predict Compound Activity Toward Molecular Targets



Benoît Baillif et al. Front Chem. 2020.

Abstract

Pharmaceutical or phytopharmaceutical molecules rely on interactions with one or more specific molecular targets to induce their intended biological responses. Nonetheless, these compounds can also interact with many other, unintended biological targets, known as off-targets. Off-target identification is difficult and expensive. Consequently, QSAR models that predict activity on a given target have gained importance in drug discovery and in the de-risking of chemicals. However, only a limited number of targets are well characterized and have enough data to build such in silico models. A good alternative to evaluating targets one by one is to use integrative measurements, such as transcriptomic profiles obtained from compound-induced gene expression in cell cultures. These experiments have the advantage of capturing the consequences of compound interactions with many possible molecular targets and biological pathways, without any constraint on chemical space. In this work, we assessed the value of a large public dataset of compound-induced transcriptomic data for predicting compound activity on a selection of 69 molecular targets. We compared these descriptors with another QSAR descriptor, the Morgan fingerprint (similar to extended-connectivity fingerprints). Depending on the target, active compounds could show similar signatures in one or several cell lines, whether or not they shared similar chemical structures. Random forest models using gene expression signatures performed similarly to or better than counterpart models built with Morgan fingerprints for 25% of the target prediction tasks. These results were obtained mostly with signatures produced in cell lines in which compounds active on the considered target showed similar signatures.
We show that compound-induced transcriptomic data could represent a great opportunity for target prediction, making it possible to overcome the chemical space limitation of QSAR models.
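The abstract summarizes the comparison as the share of target prediction tasks on which the gene expression signature (GES) model performs "similarly or better" than the Morgan fingerprint model. The abstract does not state the metric or what counts as "similar", so the sketch below assumes a higher-is-better score per task and treats "similar" as falling within a hypothetical `tolerance`. The function name, task ids, and scores are all illustrative, not taken from the study.

```python
from typing import Dict


def fraction_ges_competitive(
    ges_scores: Dict[str, float],
    fp_scores: Dict[str, float],
    tolerance: float = 0.05,
) -> float:
    """Fraction of shared tasks where the GES model is 'similar or better'.

    Each dict maps a hypothetical task id (e.g. "NR3C1|A549") to a score where
    higher is better. "Similar" is approximated as falling within `tolerance`
    of the Morgan FP score; this threshold is an assumption.
    """
    shared = ges_scores.keys() & fp_scores.keys()  # tasks scored by both models
    if not shared:
        raise ValueError("no target prediction tasks in common")
    competitive = sum(
        1 for task in shared if ges_scores[task] >= fp_scores[task] - tolerance
    )
    return competitive / len(shared)


# Toy scores for illustration only; these are not values from the study.
ges = {"T1": 0.80, "T2": 0.60, "T3": 0.72}
fp = {"T1": 0.78, "T2": 0.75, "T3": 0.74}
print(fraction_ges_competitive(ges, fp))  # 2 of 3 tasks are competitive
```

With `tolerance=0.0` the same toy inputs give 1/3, which shows how strongly such a summary depends on how "similar" is defined.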

Keywords: QSAR; cellular context; compound-induced transcriptomic data; machine learning; target prediction.


Figures

Figure 1
Data analysis pipeline used in the current work. Starting from the CMAP L1000 dataset, signatures produced at 10 μM and 24 h in 8 cell lines were extracted and used in t-SNE and distance plots. One dataset was built per cell line (GES and corresponding compound structure), and each of these datasets was restricted to compounds with known annotations (active or inactive) for the evaluated target. For each target and cell line pair, a first model was built using the gene expression signatures (GES model). In parallel, a second, counterpart model was built using the Morgan fingerprints of the compounds whose signatures were used in the first model (Morgan FP model).
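The dataset construction described in the Figure 1 caption can be sketched as follows: filter signatures to 10 μM and 24 h, group them by cell line, restrict each group to compounds annotated for a target, and emit paired GES and fingerprint inputs over the same compounds. This is a minimal illustration under assumptions: the `Signature` record layout, the on-bit-set fingerprint representation, and the function name `build_task_datasets` are hypothetical, and the sketch keeps every matching signature rather than modelling any replicate handling the authors may have used.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Signature:
    """One compound-induced gene expression signature (hypothetical layout)."""
    compound_id: str
    cell_line: str
    dose_um: float
    time_h: float
    expression: Tuple[float, ...]


TaskData = Tuple[List[Tuple[float, ...]], List[FrozenSet[int]], List[bool]]


def build_task_datasets(
    signatures: List[Signature],
    fingerprints: Dict[str, FrozenSet[int]],  # compound id -> Morgan FP on-bits
    labels: Dict[str, Dict[str, bool]],       # target -> compound id -> active?
    dose_um: float = 10.0,
    time_h: float = 24.0,
) -> Dict[Tuple[str, str], TaskData]:
    """Build one (GES rows, FP rows, labels) dataset per (target, cell line)."""
    # Keep only signatures measured at the chosen dose and time point,
    # grouped by the cell line they were produced in.
    by_cell: Dict[str, List[Signature]] = defaultdict(list)
    for sig in signatures:
        if sig.dose_um == dose_um and sig.time_h == time_h:
            by_cell[sig.cell_line].append(sig)

    datasets: Dict[Tuple[str, str], TaskData] = {}
    for target, annotations in labels.items():
        for cell_line, sigs in by_cell.items():
            # Restrict to compounds annotated (active or inactive) for this target.
            kept = [s for s in sigs if s.compound_id in annotations]
            if not kept:
                continue
            ges_rows = [s.expression for s in kept]                # GES model input
            fp_rows = [fingerprints[s.compound_id] for s in kept]  # Morgan FP model input
            y = [annotations[s.compound_id] for s in kept]
            datasets[(target, cell_line)] = (ges_rows, fp_rows, y)
    return datasets
```

Because both input lists are built from the same filtered signatures, the GES model and its Morgan FP counterpart see exactly the same compounds and labels, which is what makes the two models directly comparable.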
Figure 2
Exploration of the 2D chemical space and of the corresponding 2D biological space formed by all GESs. (A) t-SNE on Morgan fingerprints of the 9,035 compounds in the working dataset, representing the chemical space. Compounds with no known target are shown as gray points (n = 4,163). Compounds with at least one known target are shown in blue (n = 4,872), with darker blue indicating a larger number of targets. (B) t-SNE on all GESs in the working dataset, representing the biological (transcriptomic response) space. Points corresponding to GESs are colored by cell line. (C) Biological space highlighting only PC3 and VCAP signatures, two cell lines derived from prostate cancer. (D) Biological space highlighting only A549 and HCC515 signatures, two cell lines derived from lung cancer.
Figure 3
NR3C1 active and inactive compounds in the chemical space and in the biological spaces formed by GESs produced in a single cell line. (A) Chemical space; (B) t-SNE on all A549 signatures (A549 biological space); (C) t-SNE on all MCF7 signatures (MCF7 biological space); (D) t-SNE on all PC3 signatures (PC3 biological space). NR3C1 actives (n = 54) are red, NR3C1 inactives (n = 925) are blue, and gray points have no available NR3C1 activity label. Orange circles highlight clusters of active compounds.
Figure 4
TUBB active and inactive compounds in the chemical space and in the biological spaces formed by GESs produced in a single cell line. (A) Chemical space; (B) A549 biological space; (C) MCF7 biological space; (D) PC3 biological space. TUBB actives (n = 51) are red, TUBB inactives (n = 697) are blue, and gray points have no available TUBB activity label. Orange circles highlight clusters of active compounds.
Figure 5
DRD1 active and inactive compounds in the chemical space and in the biological spaces formed by GESs produced in a single cell line. (A) Chemical space; (B) A549 biological space; (C) MCF7 biological space; (D) PC3 biological space. DRD1 actives (n = 99) are red, DRD1 inactives (n = 1,843) are blue, and gray points have no available DRD1 activity label.
Figure 6
Morgan fingerprint Dice distance vs. GES cosine distance (distance plots). Panels show pairs of NR3C1 (A–C), TUBB (D–F), and DRD1 (G–I) active compounds in the A549 (A,D,G), MCF7 (B,E,H), and PC3 (C,F,I) cell lines.
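Each point in a Figure 6 distance plot corresponds to one pair of active compounds, placed by the Dice distance between their Morgan fingerprints and the cosine distance between their signatures in one cell line. A minimal stdlib sketch of that computation follows, assuming fingerprints are given as sets of on-bit indices and signatures as equal-length numeric vectors; the function names and the empty-fingerprint convention are illustrative choices, not the authors' code.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def dice_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """1 - Dice similarity for binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumed convention: two empty fingerprints count as identical
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two gene expression signatures."""
    if len(u) != len(v):
        raise ValueError("signatures must cover the same genes")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for an all-zero signature")
    return 1.0 - dot / (norm_u * norm_v)


def distance_plot_points(
    actives: List[str],
    fingerprints: Dict[str, FrozenSet[int]],
    signatures: Dict[str, Sequence[float]],  # one cell line's GES per compound
) -> List[Tuple[str, str, float, float]]:
    """(compound, compound, Dice distance, cosine distance) for each pair of actives."""
    points = []
    for c1, c2 in combinations(sorted(actives), 2):
        points.append((
            c1,
            c2,
            dice_distance(fingerprints[c1], fingerprints[c2]),
            cosine_distance(signatures[c1], signatures[c2]),
        ))
    return points
```

Pairs with a large Dice distance but a small cosine distance are structurally dissimilar actives that nonetheless induce similar transcriptomic responses in that cell line, the situation in which GES descriptors can carry information that Morgan fingerprints miss.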

