Bioinformatics. 2023 Feb 3;39(2):btad085.
doi: 10.1093/bioinformatics/btad085.

Multimodal representation learning for predicting molecule-disease relations


Jun Wen et al. Bioinformatics.

Abstract

Motivation: Predicting molecule-disease indications and side effects is important for drug development and pharmacovigilance. Comprehensively mining molecule-molecule, molecule-disease and disease-disease semantic dependencies can potentially improve prediction performance.

Methods: We introduce a Multi-Modal REpresentation Mapping Approach to Predicting molecular-disease relations (M2REMAP) by incorporating clinical semantics learned from electronic health records (EHR) of 12.6 million patients. Specifically, M2REMAP first learns a multimodal molecule representation that synthesizes chemical property and clinical semantic information by mapping molecule chemicals via a deep neural network onto the clinical semantic embedding space shared by drugs, diseases and other common clinical concepts. To infer molecule-disease relations, M2REMAP combines multimodal molecule representation and disease semantic embedding to jointly infer indications and side effects.
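The two steps described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the random untrained weights, the concatenation of the two vectors, and the names `map_molecule` and `predict_relations` are all hypothetical. It only shows the data flow, in which chemical features are mapped by a small feed-forward network into the same space as the disease embedding, and a joint output gives one score for indication and one for side effect.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
CHEM_DIM, HIDDEN_DIM, EMB_DIM = 64, 32, 16

def random_matrix(rows, cols):
    """Small random weights; a real model would learn these from data."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def linear(vec, weights, bias):
    """Compute vec @ weights + bias for a rows-by-cols weight matrix."""
    return [
        sum(v * row[j] for v, row in zip(vec, weights)) + bias[j]
        for j in range(len(bias))
    ]

def relu(vec):
    return [max(0.0, x) for x in vec]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Untrained parameters (assumption: a two-layer mapper and a linear joint head).
W1, b1 = random_matrix(CHEM_DIM, HIDDEN_DIM), [0.0] * HIDDEN_DIM
W2, b2 = random_matrix(HIDDEN_DIM, EMB_DIM), [0.0] * EMB_DIM
W_out, b_out = random_matrix(2 * EMB_DIM, 2), [0.0, 0.0]

def map_molecule(chem_features):
    """Map chemical features into the clinical semantic embedding space."""
    hidden = relu(linear(chem_features, W1, b1))
    return linear(hidden, W2, b2)

def predict_relations(chem_features, disease_embedding):
    """Jointly score indication and side effect for one molecule-disease pair."""
    pair = map_molecule(chem_features) + disease_embedding  # list concatenation
    logits = linear(pair, W_out, b_out)
    return {"indication": sigmoid(logits[0]), "side_effect": sigmoid(logits[1])}

# Toy inputs: a binary chemical feature vector and a made-up disease embedding.
chem = [float(random.randint(0, 1)) for _ in range(CHEM_DIM)]
disease = [random.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
scores = predict_relations(chem, disease)
print(scores)
```

Because the mapper outputs a vector in the same space as the disease embeddings, a molecule without EHR data could in principle still be placed in that space from its chemical features alone.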

Results: We extensively evaluate M2REMAP on molecule indications, side effects and interactions. Results show that incorporating EHR embeddings improves performance significantly, for example, attaining an improvement over the baseline models by 23.6% in PRC-AUC on indications and 23.9% on side effects. Further, M2REMAP overcomes the limitation of existing methods and effectively predicts drugs for novel diseases and emerging pathogens.
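For readers unfamiliar with PRC-AUC, the area under the precision-recall curve is often estimated by average precision. The sketch below is an assumption about how such a score can be computed, not a statement of how the paper computed it; the function name `average_precision` and the toy labels and scores are hypothetical, and tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos

# Toy example: positives are ranked 1st and 3rd, so AP = (1/1 + 2/3) / 2.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
print(round(ap, 4))  # 0.8333
```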

Availability and implementation: The code is available at https://github.com/celehs/M2REMAP, and prediction results are provided at https://shiny.parse-health.org/drugs-diseases-dev/.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Overview of M2REMAP. By learning clinical semantic embeddings from EHR data, M2REMAP synthesized molecule chemicals and EHR semantics to attain multimodal molecule representation combined with disease EHR semantics to jointly infer indications and side effects
Fig. 2
Drug–disease embedding visualization. We visualize the EHR semantic embedding of cancer and psychotropic drugs and their reported indications and side effects. (a) drugs and indications; (b) drugs and side effects

