PLoS Comput Biol. 2024 Jan 31;20(1):e1011809.
doi: 10.1371/journal.pcbi.1011809. eCollection 2024 Jan.

Statistical integration of multi-omics and drug screening data from cell lines

Said El Bouhaddani et al. PLoS Comput Biol.

Abstract

Data integration methods are used to obtain a unified summary of multiple datasets. For multi-modal data, we propose a computational workflow to jointly analyze datasets from cell lines. The workflow comprises a novel probabilistic data integration method, named POPLS-DA, for multi-omics data. The workflow is motivated by a study on synucleinopathies where transcriptomics, proteomics, and drug screening data are measured in affected LUHMES cell lines and controls. The aim is to highlight potentially druggable pathways and genes involved in synucleinopathies. First, POPLS-DA is used to prioritize genes and proteins that best distinguish cases and controls. For these genes, an integrated interaction network is constructed where the drug screen data is incorporated to highlight druggable genes and pathways in the network. Finally, functional enrichment analyses are performed to identify clusters of synaptic and lysosome-related genes and proteins targeted by the protective drugs. POPLS-DA is compared to other single- and multi-omics approaches. We found that HSPA5, a member of the heat shock protein 70 family, was one of the most targeted genes by the validated drugs, in particular by AT1-blockers. HSPA5 and AT1-blockers have been previously linked to α-synuclein pathology and Parkinson's disease, showing the relevance of our findings. Our computational workflow identified new directions for therapeutic targets for synucleinopathies. POPLS-DA provided a larger interpretable gene set than other single- and multi-omic approaches. An implementation based on R and markdown is freely available online.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Workflow of the multi-omics integration approach.
(1) To link the two datasets, the transcripts and proteins were mapped to their Entrez identifiers, referred to as genes/proteins. (2) To identify the relevant genes/proteins that discriminate between the two experimental groups (α-synuclein overexpression vs. control), POPLS-DA was applied. (3) The prioritized genes/proteins were integrated with the FDA-approved drug screening data using a direct neighbor approach. As an illustration, the figure shows under “Direct Neighbor approach” two drug target genes (green circles) that are neighbors of genes identified in (2) (blue circles). (4) Bioinformatics analyses: interactome, GO, and DisGeNet [13] enrichment analyses. Protein-protein interaction networks were built using String-DB [14], where each node is a gene. The genes that were direct neighbors of a drug target were highlighted with a green halo whose color intensity is proportional to the number of direct neighbors. Abbreviations: aSyn: α-synuclein, GO: gene ontology, FDA: Food and Drug Administration.
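The direct neighbor approach in step (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation (which the abstract describes as based on R and markdown); the function name `direct_neighbor_counts`, the edge-list input format, and the gene labels are hypothetical and chosen only for illustration. For each prioritized gene, the sketch counts how many drug target genes are its direct neighbors in an interaction network, which is the quantity the caption ties to the halo intensity.

```python
from collections import defaultdict


def direct_neighbor_counts(edges, prioritized, drug_targets):
    """Count, per prioritized gene, the drug targets among its direct neighbors.

    edges        -- iterable of (gene_a, gene_b) pairs, treated as undirected
    prioritized  -- genes selected in step (2)
    drug_targets -- set of genes targeted by a screened drug
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # A drug target counts only if it is one edge away from the gene.
    return {
        gene: len(neighbors.get(gene, set()) & set(drug_targets))
        for gene in prioritized
    }


# Toy network with made-up labels (not data from the study).
edges = [
    ("GENE_A", "TARGET_1"),
    ("GENE_A", "TARGET_2"),
    ("GENE_B", "TARGET_2"),
    ("GENE_B", "GENE_C"),
]
counts = direct_neighbor_counts(
    edges,
    prioritized=["GENE_A", "GENE_B", "GENE_C"],
    drug_targets={"TARGET_1", "TARGET_2"},
)
print(counts)  # {'GENE_A': 2, 'GENE_B': 1, 'GENE_C': 0}
```

In this toy example GENE_C has no halo because its only neighbor, GENE_B, is not a drug target; targets two or more edges away are ignored by design.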
Fig 2
Fig 2. Selection of relevant genes/proteins and their ability to discriminate.
The left panel shows the sorted squared effect size per gene given by the squared elements of . The 200 relevant genes/proteins correspond to the ‘elbow’, which is visually determined and lies approximately where the curve crosses the green vertical line. The right panel shows boxplots of score predictions based on the selected 200 relevant genes/proteins. A positive prediction on the y-axis corresponds to a case and a negative prediction to a control. The dots, representing individual samples, are drawn with horizontal ‘jitter’ to reduce overlap. The transcriptomics and proteomics samples are colored orange and blue, respectively. Abbreviations: aSyn: α-synuclein case group, mRNA: transcriptome, Prot: proteome.
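The selection and prediction rules described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the POPLS-DA code: the per-gene effect sizes are passed in as a plain dictionary, the cutoff `k` is supplied by the user (the caption says the elbow at 200 was determined visually), and the names `top_k_by_squared_effect` and `predicted_group` are hypothetical.

```python
def top_k_by_squared_effect(effects, k):
    """Return the k genes with the largest squared effect size, largest first."""
    ranked = sorted(effects, key=lambda gene: effects[gene] ** 2, reverse=True)
    return ranked[:k]


def predicted_group(score):
    """Map a score prediction to a group: positive is case, negative is control.

    Assumption: a score of exactly zero is reported as 'control' here;
    the caption does not say how such a score would be handled.
    """
    return "case" if score > 0 else "control"


# Toy effect sizes with made-up labels (not estimates from the study).
effects = {"GENE_A": 0.9, "GENE_B": -1.2, "GENE_C": 0.1, "GENE_D": -0.05}
selected = top_k_by_squared_effect(effects, k=2)
print(selected)               # ['GENE_B', 'GENE_A']
print(predicted_group(0.7))   # case
print(predicted_group(-0.3))  # control
```

Squaring the effect sizes makes the ranking ignore the sign, so a strongly negative gene such as GENE_B ranks above a moderately positive one.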
Fig 3
Fig 3. String-DB and clustering analyses of the top 200 genes/proteins.
Panels are ordered from left to right, from top to bottom. Enlarged copies of these figures are available in S1 Supporting information. In panel (a), a network of interactions between the top 200 genes/proteins (estimated with POPLS-DA) was constructed using String-DB. Each node is a gene, and a connection between genes indicates evidence for a biologically plausible link. Text mining was excluded as an evidence source, and a medium confidence threshold was used. For genes that were (indirectly) targeted by a drug compound, a green ‘halo’ is drawn. The intensity of the green color is proportional to the number of drug compounds for which the gene was an (indirect) target. In panel (b), the interaction network from (a) was clustered using the MCL clustering algorithm from the String-DB website. The edges between the clusters are removed for visual clarity. In panel (c), an interaction network is shown for a druggable subset of the top 200 genes/proteins, consisting of 116 genes that were (indirectly) targeted by an FDA-approved drug compound. In panel (d), an interaction network of the top genes in the “Parkinson’s disease” DisGeNet term was constructed using String-DB. Text mining was included as an evidence source here.
Fig 4
Fig 4. Workflow of FDA-approved drug screening and dose-response validation.
1,600 FDA-approved drugs were screened at two concentrations (3 μM, 10 μM) in three runs each in an αSyn toxicity cell model. 53 compounds were identified as being protective in at least one run (A). Timeline of dose-response testing (B). In the first round, concentrations from 0.6 nM to 20 μM of each compound were investigated. 36 compounds could be confirmed as being protective. Of these, 20 had already reached their maximal protective potential and were not tested at higher concentrations. The remaining 16 compounds were tested again at concentrations between 2.4 and 80 μM. All of these were confirmed as being protective in at least one concentration. Of the 17 compounds that were not protective in the first round of dose-response testing, 8 were already toxic and were considered not protective. The remaining 9 were tested again at concentrations from 2.4 to 80 μM. 5 of these compounds were protective at higher concentrations. In total, 41 compounds could be confirmed as protective against αSyn-induced toxicity (C).
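The compound counts in this caption can be checked with a short tally. The Python sketch below uses only the numbers stated in the caption; the variable names are hypothetical.

```python
# Counts as stated in the Fig 4 caption.
screen_hits = 53             # protective in at least one screening run
round1_confirmed = 36        # protective in the first dose-response round
round1_saturated = 20        # maximal protection reached, not retested
round1_retested = 16         # retested at 2.4 to 80 uM, all confirmed
round1_not_protective = 17   # not protective in the first round
already_toxic = 8            # considered not protective, not retested
round2_retested = 9          # retested at 2.4 to 80 uM
round2_protective = 5        # protective at higher concentrations

# Each group splits into the subgroups the caption describes.
assert round1_confirmed + round1_not_protective == screen_hits
assert round1_saturated + round1_retested == round1_confirmed
assert already_toxic + round2_retested == round1_not_protective

total_confirmed = round1_confirmed + round2_protective
print(total_confirmed)  # 41
```

The total of 41 is the 36 compounds confirmed in the first round plus the 5 that became protective only at higher concentrations.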

References

    1. Höllerhage M, Stepath M, Kohl M, Pfeiffer K, Chua OWH, Duan L, et al. Transcriptome and Proteome Analysis in LUHMES Cells Overexpressing Alpha-Synuclein. Front Neurol. 2022;13(April):1–19. doi: 10.3389/fneur.2022.787059
    2. Ulfenborg B. Vertical and horizontal integration of multi-omics data with miodin. BMC Bioinformatics. 2019;20:649. doi: 10.1186/s12859-019-3224-4
    3. Jaiswal A, Gautam P, Pietilä EA, Timonen S, Nordström N, Akimov Y, et al. Multi-modal meta-analysis of cancer cell line omics profiles identifies ECHDC1 as a novel breast tumor suppressor. Molecular Systems Biology. 2021;17:e9526. doi: 10.15252/msb.20209526
    4. Rohart F, Eslami A, Matigian N, Bougeard S, Lê Cao KA. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. BMC Bioinformatics. 2017;18(1):128. doi: 10.1186/s12859-017-1553-8
    5. Ståhle L, Wold S. Partial least squares analysis with cross-validation for the two-class problem: A Monte Carlo study. J Chemom. 1987;1(3):185–196. doi: 10.1002/cem.1180010306