Sci Rep. 2015 May 19;5:9743.
doi: 10.1038/srep09743.

MICCA: a complete and accurate software for taxonomic profiling of metagenomic data

Davide Albanese et al. Sci Rep.

Abstract

The introduction of high-throughput sequencing technologies has triggered an increase in the number of studies in which the microbiota of environmental and human samples is characterized through the sequencing of selected marker genes. While experimental protocols have undergone a process of standardization that makes them accessible to a large community of scientists, standard and robust data analysis pipelines are still lacking. Here we introduce MICCA, a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. MICCA provides accurate results while reaching a good compromise between modularity and usability. Moreover, we introduce a de novo clustering algorithm specifically designed for the inference of OTUs. Tests on real and synthetic datasets show that, thanks to the optimized read filtering process and to the new clustering algorithm, MICCA provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than those of currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves our understanding of the structure of environmental and human-associated microbial communities. MICCA is an open source project.
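To make the idea of de novo OTU clustering concrete, the following is a minimal Python sketch of a generic greedy, abundance-sorted centroid clustering. It is not MICCA's algorithm, which the abstract does not describe. The function names, the 97% default threshold and the edit-distance-based identity are all assumptions chosen for illustration; real pipelines rely on proper sequence alignment.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude pairwise identity (assumption): 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_otu_clustering(reads, threshold=0.97):
    """Assign reads to OTUs; returns {centroid_sequence: total_read_count}."""
    # Dereplicate, then visit unique sequences from most to least abundant
    # (ties broken alphabetically so the result is deterministic).
    unique = Counter(reads)
    otus = {}
    for seq, count in sorted(unique.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            # Join the first existing centroid that is similar enough.
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            # No centroid matched: this sequence seeds a new OTU.
            otus[seq] = count
    return otus
```

Visiting the most abundant sequences first means that likely true sequences become centroids, while rare variants (often sequencing errors) are absorbed into them. The number of keys in the returned dictionary is the OTU count estimated by this toy procedure.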


Figures

Figure 1. 16S-R dataset: Evaluation of MICCA pipeline performance compared with UPARSE and QIIME.
MICCA was also tested in its fast variant (MICCA-FAST). (a) Rarefaction curves; the continuous green lines represent the actual values. (b) Relative abundances of the 20 top-ranked OTUs compared with the actual values. RSS: Residual Sum of Squares.
Figure 2. Number of OTUs (upper panels) and of distinct genera (lower panels) obtained using MICCA, QIIME and UPARSE for three choices of the 16S variable region, namely V1-V3, V3-V5 and V6-V9.
Samples were taken from the HMP, selecting those for which data for all three regions were available.
Figure 3. Diversity indices computed using MICCA, UPARSE and QIIME on the 16S-10 (above) and ITS-10 (below) simulated datasets.
The dashed lines represent the real values.
Figure 4. Curves of the salinity dataset after pooling and rarefaction.
The plot shows the number of OTUs as a function of water salinity in the Delaware Bay, as obtained by analysing the data with the three pipelines MICCA, UPARSE and QIIME.
Figure 5. (a) Scatter plot of the number of OTUs identified by the analysis pipeline used in the paper of Dethlefsen et al. for patients D, E and F versus the number of OTUs estimated by MICCA. (b) Patient E: the microbial growth curve inferred by the two pipelines during the first ciprofloxacin (Cp) antibiotic treatment and the week post Cp (WPC).
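
The figure captions refer to rarefaction curves, the residual sum of squares (RSS) and diversity indices. The Python sketch below shows how such quantities are commonly computed from a vector of per-OTU read counts. It is an illustration under stated assumptions, not code from MICCA or the paper: the function names are hypothetical, and the paper may use different indices or estimators than the Shannon index and bias-corrected Chao1 shown here.

```python
import math
import random


def rarefaction_curve(counts, depths, n_reps=10, seed=0):
    """Mean number of OTUs observed when subsampling reads without replacement."""
    rng = random.Random(seed)
    # Expand the abundance vector into one OTU label per read.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    curve = []
    for depth in depths:
        # Each depth must not exceed the total number of reads.
        observed = [len(set(rng.sample(pool, depth))) for _ in range(n_reps)]
        curve.append(sum(observed) / n_reps)
    return curve


def relative_abundances(counts):
    """Convert raw counts to proportions (assumes at least one read)."""
    total = sum(counts)
    return [n / total for n in counts]


def rss(estimated, actual):
    """Residual sum of squares between two equally long vectors."""
    return sum((e - a) ** 2 for e, a in zip(estimated, actual))


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over non-empty OTUs."""
    return -sum(p * math.log(p) for p in relative_abundances(counts) if p > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    s_obs = sum(1 for n in counts if n > 0)
    f1 = sum(1 for n in counts if n == 1)  # singletons
    f2 = sum(1 for n in counts if n == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

For example, `rss(relative_abundances(estimated_counts), relative_abundances(true_counts))` gives a single number summarizing how far a pipeline's abundance profile is from the known composition of a synthetic community, where both count vectors list the same OTUs in the same order.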

