An overview of topic modeling and its current applications in bioinformatics
- PMID: 27652181
- PMCID: PMC5028368
- DOI: 10.1186/s40064-016-3252-8
Abstract
Background: With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed. In recent years, topic models, which originated in the field of natural language processing, have received considerable attention in bioinformatics because of their interpretability. Our aim was to review the application and development of topic models for bioinformatics.
Description: This paper starts with a description of a topic model, with a focus on understanding topic modeling. A general outline is provided on how to build an application with a topic model and how to develop a topic model. The literature on the application of topic models to biological data was searched and analyzed in depth. We categorized the related studies according to the types of models, the analogy between the document-topic-word concept and a biological object, and the tasks of a topic model, and we provide an outlook on the use of topic models for the development of bioinformatics applications.
Conclusion: Topic modeling is a useful method, in contrast to the traditional means of data reduction in bioinformatics, and it enhances researchers' ability to interpret biological information. Nevertheless, because topic models optimized for specific biological data are lacking, research on topic modeling of biological data still has a long and challenging road ahead. We believe that topic models are a promising method for various applications in bioinformatics research.
Keywords: Bioinformatics; Classification; Clustering; Probabilistic generative model; Topic model.
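The document-topic-word analogy mentioned in the Description can be illustrated with a minimal sketch. It is not the authors' method: it assumes one possible mapping (samples as documents, genes as words, latent expression programs as topics), uses a hypothetical toy count matrix with made-up gene names, and fits latent Dirichlet allocation with a plain collapsed Gibbs sampler written only with the Python standard library. The function name fit_lda and all parameter values are illustrative choices.

```python
import random

def fit_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    topics = list(range(n_topics))
    doc_topic = [[0] * n_topics for _ in docs]                 # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                               # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed point estimates of the two distributions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in topics
    ]
    return theta, phi

# Hypothetical toy data: rows are samples ("documents"), columns are genes ("words").
genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
counts = [
    [9, 7, 8, 0, 0, 0],
    [8, 9, 6, 0, 1, 0],
    [7, 8, 9, 1, 0, 0],
    [0, 0, 1, 9, 8, 7],
    [0, 1, 0, 7, 9, 8],
    [1, 0, 0, 8, 7, 9],
]
# Expand each count row into a bag of gene ids, the input form LDA expects.
docs = [[g for g, c in enumerate(row) for _ in range(c)] for row in counts]

theta, phi = fit_lda(docs, n_topics=2, vocab_size=len(genes))

for s, row in enumerate(theta):
    print("sample", s, [round(p, 2) for p in row])   # sample-topic proportions
for k, row in enumerate(phi):
    top = sorted(range(len(genes)), key=lambda g: -row[g])[:3]
    print("topic", k, [genes[g] for g in top])       # highest-weight genes per topic
```

Here theta plays the role of the document-topic distribution (how strongly each sample expresses each latent program) and phi the topic-word distribution (which genes characterize each program). On data this cleanly separated the sampler would be expected to place the two sample groups in different topics, but the assignment is stochastic and depends on the seed and hyperparameters.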