An exploration into CTEPH medications: Combining natural language processing, embedding learning, in vitro models, and real-world evidence for drug repurposing
- PMID: 39264975
- PMCID: PMC11478854
- DOI: 10.1371/journal.pcbi.1012417
Abstract
Background: The rapid growth of scientific literature makes it increasingly difficult for researchers to stay informed of advancements across multiple disciplines.
Objective: We apply natural language processing (NLP) and embedding learning concepts to design PubDigest, a tool that combs PubMed literature, aiming to pinpoint potential drugs that could be repurposed.
Methods: Using NLP, especially term associations through word embeddings, we explored unrecognized relationships between drugs and diseases. To illustrate the utility of PubDigest, we focused on chronic thromboembolic pulmonary hypertension (CTEPH), a rare disease with an overall limited number of scientific publications.
Results: Our literature analysis identified key clinical features linked to CTEPH by applying term frequency-inverse document frequency (TF-IDF) scoring, a technique that measures a term's significance in a text corpus. This allowed us to map related diseases. One standout was venous thrombosis (VT), which showed strong semantic links with CTEPH. Looking deeper, we identified potential repurposing candidates for CTEPH through large-scale neural network-based contextualization of literature and predictive modeling on both the CTEPH and the VT literature corpora, with the aim of finding novel, as yet unrecognized associations between the two diseases. Alongside the anti-thrombotic agent caplacizumab, benzofuran derivatives were an intriguing find. In particular, the benzofuran derivative amiodarone displayed potential anti-thrombotic properties in the literature. Our in vitro tests confirmed that amiodarone significantly reduced platelet aggregation, by 68% (p = 0.02). However, real-world clinical data indicated that CTEPH patients receiving amiodarone treatment faced a significant 15.9% higher mortality risk (p<0.001).
Conclusions: While NLP offers an innovative approach to interpreting scientific literature, especially for drug repurposing, it is crucial to combine it with complementary methods like in vitro testing and real-world evidence. Our exploration with benzofuran derivatives and CTEPH underscores this point. Thus, blending NLP with hands-on experiments and real-world clinical data can pave the way for faster and safer drug repurposing approaches, especially for rare diseases like CTEPH.
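The TF-IDF scoring mentioned in the Results can be illustrated with a minimal sketch. It is not the PubDigest implementation: the whitespace tokenizer, the unsmoothed tf * log(N / df) weighting, the function name `tf_idf`, and the toy corpus are all assumptions made here for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df).

    Hypothetical helper for illustration; real pipelines usually add
    smoothing, stop-word removal, and a proper biomedical tokenizer.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus invented for this example (not data from the study).
corpus = [
    "cteph pulmonary hypertension thrombosis",
    "venous thrombosis anticoagulation",
    "pulmonary hypertension therapy",
]
scores = tf_idf(corpus)
# Highest-scoring term of the first document.
top_term = max(scores[0], key=scores[0].get)
```

In this toy corpus, terms that occur in several documents receive a lower weight than a term confined to one document, which is the property that lets TF-IDF surface features characteristic of a specific disease corpus.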
Copyright: © 2024 Steiert et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- A comparison of word embeddings for the biomedical natural language processing. J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12. PMID: 30217670. Free PMC article.
- PISTON: Predicting drug indications and side effects using topic modeling and natural language processing. J Biomed Inform. 2018 Nov;87:96-107. doi: 10.1016/j.jbi.2018.09.015. Epub 2018 Sep 27. PMID: 30268842.
- Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks. PLoS One. 2021 Oct 15;16(10):e0258623. doi: 10.1371/journal.pone.0258623. eCollection 2021. PMID: 34653224. Free PMC article.
- Application of artificial intelligence and machine learning in drug repurposing. Prog Mol Biol Transl Sci. 2024;205:171-211. doi: 10.1016/bs.pmbts.2024.03.030. Epub 2024 Mar 31. PMID: 38789178. Review.
- Molecular biology of chronic thromboembolic pulmonary hypertension. Semin Thorac Cardiovasc Surg. 2006 Fall;18(3):265-76. doi: 10.1053/j.semtcvs.2006.09.004. PMID: 17185190. Review.
Cited by
- In Silico Validation of AI-Assisted Drugs in Healthcare. Methods Mol Biol. 2025;2952:445-458. doi: 10.1007/978-1-0716-4690-8_24. PMID: 40553347.