Exploring the Use of Compound-Induced Transcriptomic Data Generated From Cell Lines to Predict Compound Activity Toward Molecular Targets

Benoît Baillif¹, Joerg Wichard², Oscar Méndez-Lucio¹,³, David Rouquié¹

Affiliations

¹ Bayer SAS, Bayer CropScience, Sophia Antipolis, France.
² Department of Genetic Toxicology, Bayer AG, Berlin, Germany.
³ Bloomoon, Villeurbanne, France.

PMID: 32391323
PMCID: PMC7191531
DOI: 10.3389/fchem.2020.00296

Benoît Baillif et al. Front Chem. 2020 Apr 23;8:296. doi: 10.3389/fchem.2020.00296. eCollection 2020.

Abstract

Pharmaceutical or phytopharmaceutical molecules rely on interaction with one or more specific molecular targets to induce their intended biological responses. Nonetheless, these compounds may also interact with many other, unintended biological targets, known as off-targets. Unfortunately, off-target identification is difficult and expensive. Consequently, QSAR models that predict activity on a target have gained importance in drug discovery and in the de-risking of chemicals. However, only a limited number of targets are well characterized and have enough data to build such in silico models. A good alternative to evaluating targets individually is to use integrative readouts such as transcriptomics, obtained from compound-induced gene expression measurements in cell cultures. The advantage of these experiments is that they capture the consequences of compound interactions with many possible molecular targets and biological pathways, without any constraint on the chemical space. In this work, we assessed the value of a large public dataset of compound-induced transcriptomic data for predicting compound activity on a selection of 69 molecular targets. We compared these descriptors with conventional QSAR descriptors, namely Morgan fingerprints (similar to extended-connectivity fingerprints). Depending on the target, active compounds could show similar signatures in one or several cell lines, whether they shared similar or different chemical structures. Random forest models using gene expression signatures performed similarly to or better than counterpart models built with Morgan fingerprints for 25% of the target prediction tasks. This performance was obtained mostly with signatures produced in cell lines in which active compounds toward the considered target showed similar signatures.
We show that compound-induced transcriptomic data could represent a great opportunity for target prediction, as they could help overcome the chemical-space limitation of QSAR models.
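The abstract does not state which metric or margin defines "similarly to or better than". As a minimal sketch, assuming ROC AUC on held-out compounds as the metric and a hypothetical margin of 0.02 AUC for "similar", the per-task comparison between a GES model and its Morgan FP counterpart could be tallied as below. The random forests themselves are not shown (any classifier that outputs a score per compound would plug in), and the function names and toy numbers are illustrative, not results from the study.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fraction_ges_similar_or_better(
    task_aucs: Dict[str, Tuple[float, float]], tolerance: float = 0.02
) -> float:
    """Share of tasks where AUC(GES model) >= AUC(Morgan FP model) - tolerance.

    `task_aucs` maps a task name to (GES AUC, Morgan FP AUC).
    `tolerance` is a hypothetical margin for "similar" performance (an assumption).
    """
    if not task_aucs:
        raise ValueError("no tasks to compare")
    hits = sum(1 for ges_auc, fp_auc in task_aucs.values() if ges_auc >= fp_auc - tolerance)
    return hits / len(task_aucs)


# Toy example: made-up held-out scores and AUCs, not results from the study.
y_test = [1, 1, 0, 0]
print(roc_auc(y_test, [0.9, 0.4, 0.5, 0.1]))  # scores from a GES model
print(roc_auc(y_test, [0.8, 0.7, 0.2, 0.3]))  # scores from a Morgan FP model
task_aucs = {
    "task_1": (0.75, 1.00),
    "task_2": (0.90, 0.85),
    "task_3": (0.60, 0.61),
    "task_4": (0.70, 0.80),
}
print(fraction_ges_similar_or_better(task_aucs))
```

The pairwise formulation of ROC AUC is quadratic in the number of compounds but keeps the definition explicit; a rank-based implementation gives the same value faster on large datasets.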

Keywords: QSAR; cellular context; compound-induced transcriptomic data; machine learning; target prediction.

Figures

**Figure 1**
Data analysis pipeline used in the current work. Starting from the CMAP L1000 dataset, gene expression signatures (GESs) produced at 10 μM and 24 h in 8 cell lines were extracted and used in t-SNE and distance plots. One dataset was built per cell line (GESs and the corresponding compound structures), and each of these datasets was restricted to compounds with known annotations (active or inactive) for the evaluated target. For each target–cell line dataset, a first model was built using the gene expression signatures (GES model). In parallel, a second, counterpart model was built using the Morgan fingerprints of the compounds whose signatures were used in the first model (Morgan FP model).
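The per-target, per-cell-line dataset construction described in the Figure 1 caption can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes hypothetical in-memory dictionaries (`signatures`, `fingerprints`, `labels`) holding one signature per compound and cell line, and it does not read the CMAP L1000 files. The function name `build_target_cell_line_dataset` is likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory layouts (assumptions, not the authors' data format):
#   signatures[(compound_id, cell_line)] -> one GES (list of floats) per compound and cell line
#   fingerprints[compound_id]             -> Morgan fingerprint as a list of 0/1 bits
#   labels[target][compound_id]           -> 1 (active) or 0 (inactive); unlabeled compounds are absent
Signatures = Dict[Tuple[str, str], List[float]]
Fingerprints = Dict[str, List[int]]
Labels = Dict[str, Dict[str, int]]


def build_target_cell_line_dataset(
    target: str,
    cell_line: str,
    signatures: Signatures,
    fingerprints: Fingerprints,
    labels: Labels,
) -> Tuple[List[str], List[List[float]], List[List[int]], List[int]]:
    """Return (compound_ids, GES matrix, Morgan FP matrix, labels) for one target-cell line pair.

    Only compounds with a signature in `cell_line`, a fingerprint, and a known
    label for `target` are kept, so the GES model and its Morgan FP counterpart
    are trained on exactly the same compounds, in the same order.
    """
    target_labels = labels.get(target, {})
    compound_ids = sorted(
        cid
        for (cid, line) in signatures
        if line == cell_line and cid in target_labels and cid in fingerprints
    )
    ges_matrix = [signatures[(cid, cell_line)] for cid in compound_ids]
    fp_matrix = [fingerprints[cid] for cid in compound_ids]
    y = [target_labels[cid] for cid in compound_ids]
    return compound_ids, ges_matrix, fp_matrix, y


# Toy example with made-up compounds and values.
signatures = {
    ("cpd_a", "A549"): [0.5, -1.2, 0.3],
    ("cpd_b", "A549"): [0.1, 0.4, -0.7],
    ("cpd_c", "A549"): [1.0, 0.0, 0.2],  # no label for the target -> excluded
    ("cpd_a", "MCF7"): [0.2, 0.2, 0.2],  # other cell line -> ignored here
}
fingerprints = {"cpd_a": [1, 0, 1, 1], "cpd_b": [0, 1, 1, 0], "cpd_c": [1, 1, 0, 0]}
labels = {"NR3C1": {"cpd_a": 1, "cpd_b": 0}}

ids, X_ges, X_fp, y = build_target_cell_line_dataset("NR3C1", "A549", signatures, fingerprints, labels)
print(ids, y)
```

Each target–cell line pair thus yields two aligned feature matrices, `X_ges` and `X_fp`, that share the label vector `y`; this alignment is what makes the GES model and the Morgan FP model directly comparable.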

**Figure 2**
Exploration of the 2D chemical space, along with the corresponding 2D biological space formed by all GESs. **(A)** t-SNE on Morgan fingerprints of the 9,035 compounds in the working dataset, representing the chemical space. Compounds with no known target are shown as gray points (n = 4,163). Compounds with at least one known target are shown in blue (n = 4,872), with darker blue indicating a larger number of targets. **(B)** t-SNE on all GESs in the working dataset, representing the biological (transcriptomic response) space. Points corresponding to GESs are colored by cell line. **(C)** Biological space highlighting only PC3 and VCAP signatures, 2 cell lines derived from prostate cancer. **(D)** Biological space highlighting only A549 and HCC515 signatures, 2 cell lines derived from lung cancer.

**Figure 3**
NR3C1 active and inactive compounds in the chemical space and in the different biological spaces formed by GESs produced in a single cell line. **(A)** Chemical space; **(B)** t-SNE on all A549 signatures (A549 biological space); **(C)** t-SNE on all MCF7 signatures (MCF7 biological space); **(D)** t-SNE on all PC3 signatures (PC3 biological space). NR3C1 actives (n = 54) are red, NR3C1 inactives (n = 925) are blue, and gray points have no available label for NR3C1 activity. Orange circles indicate clusters of active compounds.

**Figure 4**
TUBB active and inactive compounds in the chemical space and in the different biological spaces formed by GESs produced in a single cell line. **(A)** Chemical space; **(B)** A549 biological space; **(C)** MCF7 biological space; **(D)** PC3 biological space. TUBB actives (n = 51) are red, TUBB inactives (n = 697) are blue, and gray points have no available label for TUBB activity. Orange circles indicate clusters of active compounds.

**Figure 5**
DRD1 active and inactive compounds in the chemical space and in the different biological spaces formed by GESs produced in a single cell line. **(A)** Chemical space; **(B)** A549 biological space; **(C)** MCF7 biological space; **(D)** PC3 biological space. DRD1 actives (n = 99) are red, DRD1 inactives (n = 1,843) are blue, and gray points have no available label for DRD1 activity.

**Figure 6**
Morgan fingerprint Dice distance vs. GES cosine distance (distance plots). Panels show pairs of NR3C1 **(A–C)**, TUBB **(D–F)**, and DRD1 **(G–I)** active compounds, using signatures from the A549 **(A,D,G)**, MCF7 **(B,E,H)**, and PC3 **(C,F,I)** cell lines.
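The two distances plotted in Figure 6 can be illustrated with a short Python sketch. It assumes that "Dice distance" means 1 minus the Dice similarity of two binary fingerprints and that "cosine distance" means 1 minus the cosine similarity of two signatures; fingerprints are represented here as sets of "on" bit indices, and all names (`dice_distance`, `cosine_distance`, `distance_pairs`) and toy values are hypothetical rather than taken from the paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def dice_distance(bits_a: Set[int], bits_b: Set[int]) -> float:
    """1 - Dice similarity of two fingerprints given as sets of 'on' bit indices."""
    total = len(bits_a) + len(bits_b)
    if total == 0:
        return 0.0  # convention: two empty fingerprints count as identical
    return 1.0 - 2.0 * len(bits_a & bits_b) / total


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity of two gene expression signatures of equal length."""
    if len(u) != len(v):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for an all-zero signature")
    return 1.0 - dot / norm


def distance_pairs(
    actives: List[str],
    on_bits: Dict[str, Set[int]],
    signatures: Dict[str, List[float]],
) -> List[Tuple[str, str, float, float]]:
    """(compound_1, compound_2, Dice distance, cosine distance) for every pair of actives.

    `signatures` holds the GESs of a single cell line, so one call gives one panel.
    """
    return [
        (a, b, dice_distance(on_bits[a], on_bits[b]), cosine_distance(signatures[a], signatures[b]))
        for a, b in combinations(actives, 2)
    ]


# Toy example with made-up fingerprints and signatures.
on_bits = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {7}}
signatures = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for a, b, d_fp, d_ges in distance_pairs(["a", "b", "c"], on_bits, signatures):
    print(f"{a}-{b}: Dice {d_fp:.3f}, cosine {d_ges:.3f}")
```

In such a plot, pairs with a large Dice distance but a small cosine distance correspond to structurally different actives that nevertheless induce similar signatures in that cell line, which is the situation the abstract describes.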

Cited by

Exploration of the DARTable Genome- a Resource Enabling Data-Driven NAMs for Developmental and Reproductive Toxicity Prediction.
Janowska-Sejda EI, Adeleye Y, Currie RA. Front Toxicol. 2022 Jan 19;3:806311. doi: 10.3389/ftox.2021.806311. eCollection 2021. PMID: 35295108.
Predicting molecular initiating events using chemical target annotations and gene expression.
Bundy JL, Judson R, Williams AJ, Grulke C, Shah I, Everett LJ. BioData Min. 2022 Mar 4;15(1):7. doi: 10.1186/s13040-022-00292-z. PMID: 35246223.
Application of perturbation gene expression profiles in drug discovery-From mechanism of action to quantitative modelling.
Szalai B, Veres DV. Front Syst Biol. 2023 Feb 9;3:1126044. doi: 10.3389/fsysb.2023.1126044. eCollection 2023. PMID: 40809500. Review.
Computational analyses of mechanism of action (MoA): data, methods and integration.
Trapotsi MA, Hosseini-Gerami L, Bender A. RSC Chem Biol. 2021 Dec 22;3(2):170-200. doi: 10.1039/d1cb00069a. eCollection 2022 Feb 9. PMID: 35360890. Review.
Protocol for predicting suppressors of cell-death pathways based on transcriptomic and vulnerability data.
Vinik Y, Maimon A, Lev S. STAR Protoc. 2025 Jun 20;6(2):103855. doi: 10.1016/j.xpro.2025.103855. Epub 2025 May 29. PMID: 40449001.
