Transforming L1000 profiles to RNA-seq-like profiles with deep learning
- PMID: 36100892
- PMCID: PMC9472394
- DOI: 10.1186/s12859-022-04895-5
Abstract
L1000, a cost-effective high-throughput transcriptomics technology, has been used to profile the gene expression response of a collection of human cell lines to more than 30,000 chemical and genetic perturbations. More than 3 million L1000 profiles are currently available. Such a dataset is invaluable for discovering drug and target candidates and for inferring the mechanisms of action of small molecules. The L1000 assay measures the mRNA expression of only 978 landmark genes, while 11,350 additional genes are reliably inferred computationally. The lack of full genome coverage limits knowledge discovery for half of the human protein-coding genes and limits the potential for integration with other transcriptomics profiling data. Here we present a two-step deep learning model that transforms L1000 profiles into RNA-seq-like profiles. The input to the model is the measured expression of the 978 landmark genes, and the output is an RNA-seq-like expression profile spanning 23,614 genes. The model first transforms the landmark genes into an RNA-seq-like profile of the same 978 genes using a modified CycleGAN model applied to unpaired data. The transformed 978 RNA-seq-like landmark genes are then extrapolated to the full genome space with a fully connected neural network. The two-step model achieves a Pearson's correlation coefficient of 0.914 and a root mean square error of 1.167 when tested on a published paired L1000/RNA-seq dataset produced by the LINCS and GTEx programs. The processed RNA-seq-like profiles are made available for download, signature search, and gene-centric reverse search, with unique case studies.
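The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: both steps are stood in for by small, randomly initialized (untrained) fully connected networks, the hidden width of 64 and all function names are hypothetical, and the CycleGAN training procedure is omitted. Only the input and output dimensions (978 and 23,614) come from the abstract. The two metric helpers compute Pearson's correlation and root mean square error between two vectors; how the paper aggregates these across genes or samples is not stated here, so that choice is left to the caller.

```python
import numpy as np

N_LANDMARK = 978   # measured L1000 landmark genes (from the abstract)
N_FULL = 23614     # genes in the RNA-seq-like output (from the abstract)
HIDDEN = 64        # hypothetical hidden width, chosen only for illustration

rng = np.random.default_rng(0)


def make_mlp(sizes, rng):
    """Return random, untrained (weight, bias) pairs for a fully connected net."""
    return [(rng.normal(0.0, 0.01, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(x, layers):
    """Apply each layer in turn, with ReLU on hidden layers only."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


# Step 1 stand-in: L1000 landmark space -> RNA-seq-like landmark space.
# (In the paper this is the generator of a modified CycleGAN.)
step1 = make_mlp([N_LANDMARK, HIDDEN, N_LANDMARK], rng)

# Step 2 stand-in: RNA-seq-like landmark space -> full genome space.
step2 = make_mlp([N_LANDMARK, HIDDEN, N_FULL], rng)


def transform(l1000_profiles):
    """Map a (samples, 978) array to a (samples, 23614) array."""
    landmark_like = mlp_forward(l1000_profiles, step1)
    return mlp_forward(landmark_like, step2)


def pearson(a, b):
    """Pearson's correlation coefficient between two 1-D vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))


def rmse(a, b):
    """Root mean square error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


batch = rng.normal(size=(4, N_LANDMARK))  # synthetic stand-in for L1000 data
predicted = transform(batch)
print(predicted.shape)  # (4, 23614)
```

With trained weights and a paired test set, `pearson` and `rmse` would be applied to predicted and measured RNA-seq profiles; with the random weights above, the output is only a shape check.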
Keywords: Gene expression translation; Generative adversarial networks; L1000; RNA-seq.
© 2022. The Author(s).
Conflict of interest statement
The authors declare no competing interests.