Bioinformatics. 2015 Jun 15;31(12):i35-43.
doi: 10.1093/bioinformatics/btv231.

Reconstructing 16S rRNA genes in metagenomic data

Cheng Yuan et al. Bioinformatics.

Abstract

Metagenomic data, which contain sequenced DNA reads of uncultured microbial species from environmental samples, provide a unique opportunity to thoroughly analyze microbial species that have never been identified before. Reconstructing 16S ribosomal RNA (rRNA), a phylogenetic marker gene, is usually required to analyze the composition of metagenomic data. However, the massive volume of the datasets, high sequence similarity between related species, skewed microbial abundance and the lack of reference genes make 16S rRNA reconstruction difficult. Generic de novo assembly tools are not optimized for assembling 16S rRNA genes. In this work, we introduce a targeted rRNA assembly tool, REAGO (REconstruct 16S ribosomal RNA Genes from metagenOmic data). It addresses the above challenges by combining secondary structure-aware homology search, properties of rRNA genes and de novo assembly. Our experimental results show that our tool correctly recovers more rRNA genes than several popular generic metagenomic assembly tools and specially designed rRNA reconstruction tools.

Availability and implementation: The source code of REAGO is freely available at https://github.com/chengyuan/reago.

Figures

Fig. 1.
Pipeline of the 16S rRNA gene assembly. Short black and gray bars represent reads originating from different 16S rRNA genes. Short white bars represent reads from non-16S regions. Long bars represent contigs assembled from short reads
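The assembly step in Fig. 1 joins overlapping short reads into longer contigs. A common way to represent this is an overlap graph, in which each read is a vertex and an edge u→v records that a suffix of u matches a prefix of v. The sketch below builds such a graph under stated assumptions: overlaps are exact, and a hypothetical `min_overlap` threshold decides which overlaps count. REAGO's actual overlap criteria and parameters are not given here, and all function names are illustrative.

```python
from itertools import permutations


def longest_overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest proper suffix of `a` that equals a prefix of `b`.

    Exact matching only (an assumption); returns 0 if no overlap reaches min_overlap.
    """
    # Try the longest candidate first so the first hit is the maximum.
    for k in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads: dict[str, str], min_overlap: int = 3) -> dict[str, dict[str, int]]:
    """Map each read id to {successor read id: overlap length}."""
    graph: dict[str, dict[str, int]] = {rid: {} for rid in reads}
    # Check every ordered pair of distinct reads (quadratic; fine for a sketch).
    for u, v in permutations(reads, 2):
        k = longest_overlap(reads[u], reads[v], min_overlap)
        if k:
            graph[u][v] = k
    return graph


reads = {"r1": "ACGTTGCA", "r2": "TTGCATGG", "r3": "CATGGACC"}
graph = build_overlap_graph(reads)
# graph == {"r1": {"r2": 5}, "r2": {"r3": 5}, "r3": {}}
```

Following the edges r1→r2→r3 and merging along the overlaps would yield a single contig. The all-pairs comparison is chosen for clarity; real assemblers index reads to avoid comparing every pair.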
Fig. 2.
Graph reduction is applied iteratively until the graph no longer changes
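Fig. 2 describes a fixpoint loop: reduction rules are reapplied until a full pass changes nothing. The sketch below shows that control structure. The two rules it uses are assumptions for illustration and are not confirmed as REAGO's exact rules: removal of transitive edges (drop u→w when u→v→w exists) and removal of dead-end "tips" that hang off a branching vertex. The function names are hypothetical.

```python
# Adjacency map: vertex -> set of successor vertices (every vertex is a key).
Graph = dict[str, set[str]]


def remove_transitive_edges(g: Graph) -> bool:
    """Drop u->w whenever u->v->w also exists. Returns True if the graph changed."""
    to_drop = {
        (u, w)
        for u in g
        for v in g[u]
        for w in g[v]
        if w in g[u] and v != u and w != v  # ignore self-loops
    }
    for u, w in to_drop:
        g[u].discard(w)
    return bool(to_drop)


def remove_tips(g: Graph) -> bool:
    """Remove dead-end vertices whose single predecessor has another successor."""
    preds: dict[str, set[str]] = {v: set() for v in g}
    for u, succs in g.items():
        for v in succs:
            preds[v].add(u)
    changed = False
    for v in list(g):
        if g[v] or len(preds[v]) != 1:
            continue  # not a dead end, or not hanging off exactly one vertex
        (p,) = preds[v]
        # Re-check the branch each time so p never loses all of its successors.
        if len(g[p]) > 1:
            g[p].discard(v)
            del g[v]
            changed = True
    return changed


def reduce_graph(g: Graph, rules=(remove_transitive_edges, remove_tips)) -> Graph:
    """Apply every rule repeatedly until a full pass leaves the graph unchanged."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Evaluate the rule first so it always runs, even after an earlier change.
            changed = rule(g) or changed
    return g


g = {"a": {"b", "c"}, "b": {"c", "t"}, "c": {"d"}, "d": set(), "t": set()}
reduce_graph(g)
# g == {"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": set()}
```

Looping until no rule fires matters because one reduction can expose another. For example, removing a tip can turn a branching vertex into a simple chain that a later rule may then simplify.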
Fig. 3.
Two types of bifurcation where error correction is applied. (A) Multiple vertices sharing the same successor. (B) Multiple vertices sharing the same predecessor
Fig. 4.
An example of error correction (applied on V2 and V3). The sequence represented by each node is given beside the node. (A) Ungapped alignment of reads from bifurcating vertices. (B) Mutate rare bases. (C) Remove bifurcation
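Figs. 3 and 4 describe correcting errors at a bifurcation by aligning the reads from the bifurcating vertices without gaps and replacing rare bases. A minimal sketch follows. It assumes that reads are given as `(offset, sequence)` pairs in a shared ungapped coordinate system, and that a base counts as "rare" when its share of its alignment column falls below a hypothetical `min_fraction` threshold, in which case it is replaced with the column's majority base. REAGO's actual rarity criterion may differ.

```python
from collections import Counter


def correct_rare_bases(aligned: list[tuple[int, str]], min_fraction: float = 0.3) -> list[tuple[int, str]]:
    """Replace bases that are rare in their ungapped alignment column with the majority base.

    aligned: (offset, sequence) pairs; offset is the read's start column (an assumed input format).
    min_fraction: hypothetical threshold below which a base is treated as an error.
    """
    # Count the bases observed in each alignment column.
    columns: dict[int, Counter] = {}
    for offset, seq in aligned:
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += 1

    corrected = []
    for offset, seq in aligned:
        out = []
        for i, base in enumerate(seq):
            counts = columns[offset + i]
            total = sum(counts.values())
            if counts[base] / total < min_fraction:
                # Mutate the rare base to the column's most common base.
                base = counts.most_common(1)[0][0]
            out.append(base)
        corrected.append((offset, "".join(out)))
    return corrected


aligned = [(0, "ACGTA"), (1, "CGTAC"), (2, "GTACG"), (1, "CGCAC")]
fixed = correct_rare_bases(aligned, min_fraction=0.3)
# The lone C in column 3 (1 of 4 reads) becomes T: fixed[3] == (1, "CGTAC")
```

Once the rare bases are mutated, the reads that distinguished the two branches become identical. This is what allows the bifurcation to be removed in panel (C), where the vertices can be merged.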
Fig. 5.
The detailed calculation of the probability that a contig C originated from a genus Gi
Fig. 6.
Path finding using paired-end information. Solid lines represent overlaps between nodes and dashed lines represent the existence of paired-end reads. The numbers beside the dashed lines are the numbers of paired-end reads between the corresponding nodes
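Fig. 6 uses paired-end links (dashed lines) to choose among overlap successors (solid lines). A minimal greedy sketch follows. It assumes, without confirmation from the caption, that at each branch the path extends to the successor with the most paired-end reads connecting it to nodes already on the path. The input layout and function names are hypothetical.

```python
def pe_support(path: list[str], v: str, pe_links: dict[frozenset, int]) -> int:
    """Total paired-end reads linking candidate v to any node already on the path."""
    return sum(pe_links.get(frozenset((u, v)), 0) for u in path)


def extend_path(start: str, succ: dict[str, list[str]], pe_links: dict[frozenset, int]) -> list[str]:
    """Greedily walk overlap edges, breaking branches by paired-end support.

    succ: node -> successor nodes (overlap edges, solid lines).
    pe_links: frozenset({a, b}) -> number of paired-end reads between a and b (dashed lines).
    """
    path = [start]
    on_path = {start}
    while True:
        # Only consider successors not yet used, so cycles cannot loop forever.
        candidates = [v for v in succ.get(path[-1], []) if v not in on_path]
        if not candidates:
            return path
        # max() keeps the first candidate on ties, including all-zero support.
        best = max(candidates, key=lambda v: pe_support(path, v, pe_links))
        path.append(best)
        on_path.add(best)


succ = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["F"]}
pe_links = {frozenset(("A", "C")): 5, frozenset(("A", "D")): 1, frozenset(("B", "E")): 3}
path = extend_path("A", succ, pe_links)
# At the branch after B, C (5 links to A) beats D (1 link): path == ["A", "B", "C", "E"]
```

Scoring against the whole path, not just the previous node, lets read pairs that span a repeated region vote for the correct exit. A real tool would likely also stop or report ambiguity when no candidate has any support, which this sketch does not do.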
Fig. 7.
Calculating the score between two segments. Arcs indicate the existence of paired-end reads between vertices. The thickness of each arc indicates the weight of the paired-end match, and the actual weights are labeled beside each arc. Assuming there is only one mate-pair among contigs, the WPEMS is 22
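The caption does not spell out the scoring formula. One plausible reading of Fig. 7 is that the score of two segments is the sum of the weights of all paired-end arcs with one endpoint in each segment. The sketch below implements only that assumed reading. It does not model the caption's mate-pair condition, it does not reproduce the value 22, and the function name and input layout are hypothetical.

```python
def segment_score(seg_a: list[str], seg_b: list[str], arc_weights: dict[tuple[str, str], int]) -> int:
    """Sum the weights of paired-end arcs joining a vertex of seg_a to a vertex of seg_b.

    arc_weights: (u, v) -> weight of the paired-end match between vertices u and v.
    Arcs inside a single segment, or touching neither segment, are ignored.
    """
    a, b = set(seg_a), set(seg_b)
    return sum(
        w
        for (u, v), w in arc_weights.items()
        if (u in a and v in b) or (u in b and v in a)
    )


weights = {("v1", "v3"): 4, ("v2", "v4"): 6, ("v1", "v2"): 9, ("v3", "v5"): 2}
score = segment_score(["v1", "v2"], ["v3", "v4"], weights)
# Only v1-v3 and v2-v4 cross between the segments: score == 10
```

Under this reading, a higher score means more weighted read-pair evidence that the two segments belong to the same gene.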
