Review
Springerplus. 2016 Sep 20;5(1):1608.
doi: 10.1186/s40064-016-3252-8. eCollection 2016.

An overview of topic modeling and its current applications in bioinformatics

Lin Liu et al. Springerplus.

Abstract

Background: With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed. In recent years, so-called topic models that originated from the field of natural language processing have been receiving much attention in bioinformatics because of their interpretability. Our aim was to review the application and development of topic models for bioinformatics.

Description: This paper starts with a description of the topic model, with a focus on how topic modeling should be understood. A general outline is provided of how to build an application with a topic model and how to develop a topic model. The literature on the application of topic models to biological data was also searched and analyzed in depth. We categorized the related studies according to the type of model, the analogy drawn between the document-topic-word concept and a biological object, and the task the topic model performs, and we provide an outlook on the use of topic models for the development of bioinformatics applications.
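The document-topic-word analogy can be made concrete with a small sketch. The example below is not taken from the paper: it assumes a toy setting in which each biological sample plays the role of a document, each gene plays the role of a word, and an expression count is treated as the number of times that word occurs. The sample names, gene names, counts, and hyperparameters (`alpha`, `beta`, `n_iters`) are all made up for illustration, and the fitting routine is a plain collapsed Gibbs sampler for LDA written with the standard library only, not an optimized implementation.

```python
import random


def fit_lda(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of token lists.

    Returns (theta, phi): per-document topic proportions and
    per-topic word probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)

    # Count tables updated during sampling.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Conditional probability of each topic for this token
                # (unnormalized; random.choices normalizes the weights).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * n_vocab)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]

                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + alpha * n_topics) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        {w: (tw[w] + beta) / (topic_total[k] + beta * n_vocab) for w in vocab}
        for k, tw in enumerate(topic_word)
    ]
    return theta, phi


# Hypothetical expression counts: sample -> {gene: count}.
samples = {
    "sample1": {"geneA": 5, "geneB": 4, "geneC": 1},
    "sample2": {"geneA": 4, "geneB": 5},
    "sample3": {"geneC": 1, "geneD": 5, "geneE": 4},
    "sample4": {"geneD": 4, "geneE": 5},
}

# Each sample becomes a "document" whose "words" are repeated gene names.
docs = [[g for g, c in counts.items() for _ in range(c)]
        for counts in samples.values()]

theta, phi = fit_lda(docs, n_topics=2)

for name, row in zip(samples, theta):
    print(name, [round(p, 2) for p in row])
```

Each row of `theta` is the topic distribution of one sample, and each entry of `phi` is a distribution over genes for one topic. In this reading a topic corresponds to a group of genes that tend to be expressed together, which is one way the analogy is used for clustering-style tasks.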

Conclusion: Topic modeling is a useful method (in contrast to the traditional means of data reduction in bioinformatics) and enhances researchers' ability to interpret biological information. Nevertheless, due to the lack of topic models optimized for specific biological data, the studies on topic modeling in biological data still have a long and challenging road ahead. We believe that topic models are a promising method for various applications in bioinformatics research.

Keywords: Bioinformatics; Classification; Clustering; Probabilistic generative model; Topic model.

Figures

Fig. 1. The diagram of topic modeling
Fig. 2. The topic distribution of a document
Fig. 3. The graphical model of PLSA
Fig. 4. The graphical model of LDA
Fig. 5. The tasks of a topic model in bioinformatics
