BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S2.
doi: 10.1186/1471-2105-16-S5-S2. Epub 2015 Mar 18.

Exploiting topic modeling to boost metagenomic reads binning

Ruichang Zhang et al. BMC Bioinformatics. 2015.

Abstract

Background: With the rapid development of high-throughput technologies, researchers can sequence the whole metagenome of a microbial community sampled directly from the environment. Assigning these metagenomic reads to species or other taxonomic classes, a step known as binning, is vital for metagenomic analysis.

Results: In this paper, we propose a new method, TM-MCluster, for binning metagenomic reads. First, we represent each metagenomic read as a set of "k-mers" together with their frequencies in the read. Then, we apply a probabilistic topic model, Latent Dirichlet Allocation (LDA), to the reads; it generates a number of hidden "topics" such that each read can be represented by a distribution vector over those topics. Finally, as in the MCluster method, we apply SKWIC, a variant of the classical K-means algorithm with an automatic feature-weighting mechanism, to cluster the reads represented by their topic distributions.
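
The first step above, turning each read into k-mers with their frequencies, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_counts` and `kmer_frequencies`, the choice of k = 4, the skipping of windows that contain ambiguous bases, and the example reads are all assumptions made for illustration, and reverse complements are not merged. The LDA and SKWIC steps are not shown.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(read: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in one read.

    Assumption for this sketch: windows containing any character other
    than A, C, G or T (for example N) are skipped.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def kmer_frequencies(read: str, k: int = 4) -> dict:
    """Normalize k-mer counts so that they sum to 1 for the read."""
    counts = kmer_counts(read, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical reads, used only to show the shape of the output.
reads = ["ACGTACGTAC", "GGGGCCCCNA"]
documents = [kmer_counts(r, k=4) for r in reads]
for read, doc in zip(reads, documents):
    print(read, dict(doc))
```

Each resulting `Counter` plays the role of a "document" whose "words" are k-mers, which is the bag-of-words form a topic model such as LDA expects as input.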

Conclusions: Experiments show that the new method TM-MCluster outperforms major existing methods, including AbundanceBin, MetaCluster 3.0/5.0 and MCluster. This result indicates that the exploitation of topic modeling can effectively improve the binning performance of metagenomic reads.

Figures

Figure 1. The pipeline of the TM-MCluster method.
Figure 2. The LDA model.
Figure 3. Applying the LDA model to metagenomic reads.
Figure 4. The taxonomy of species in R1.
Figure 5. The effect of topic number on binning performance of TM-MCluster.
