Genome Biol. 2020 Apr 28;21(1):103.
doi: 10.1186/s13059-020-02014-2.

CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data

Vanessa R Marcelino et al. Genome Biol. 2020.

Abstract

There is an increasing demand for accurate and fast metagenome classifiers that can identify not only bacteria but all members of a microbial community. We used a recently developed concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. The pipeline substantially outperforms other commonly used software in identifying bacteria and fungi and can efficiently use the entire NCBI nucleotide collection as a reference to detect species with incomplete genome data from all biological kingdoms. CCMetagen is user-friendly, and the results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.

Keywords: ConClave sorting; Fungi; Metagenomic classifier; Microbiome.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of the ConClave sorting scheme applied to species identification in metagenomic data sets. The figure represents a data set containing 5 sequence reads (4 bp) and two closely related reference sequences (templates), including a true positive (Ref. 1) and a potential false positive (Ref. 2). (a) Commonly used read mappers yield a high number of false positives because reads can be randomly assigned to closely related reference sequences sharing identical fragments spanning the whole sequence read (represented by the ATATT region). (b) The KMA aligner minimizes this problem by scoring reference sequences based on all possible mappings of all reads and then choosing the templates with the highest scores. Coupled with KMA, CCMetagen produces highly accurate taxonomic assignments of reads in metagenomic data sets in user-friendly formats.
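The idea in this caption, scoring every template on all possible mappings of all reads and then choosing the highest-scoring template, can be illustrated with a toy sketch. This is not KMA's implementation: the function name `conclave_assign`, the exact-substring matching rule, the score (matched bases) and the example sequences are all assumptions made purely for illustration.

```python
from collections import defaultdict

def conclave_assign(reads, templates):
    """Toy two-pass assignment (hypothetical helper, not the KMA algorithm)."""
    # Pass 1: each read votes for every template that contains it exactly.
    candidates = {}
    scores = defaultdict(int)
    for i, read in enumerate(reads):
        hits = [name for name, seq in templates.items() if read in seq]
        candidates[i] = hits
        for name in hits:
            scores[name] += len(read)  # score = total matched bases

    # Pass 2: an ambiguous read goes to its highest-scoring candidate,
    # instead of being assigned at random among equally good matches.
    assignment = {}
    for i, hits in candidates.items():
        if hits:
            assignment[i] = max(hits, key=lambda name: scores[name])
    return assignment, dict(scores)

# Invented toy data: both templates share the ATATT fragment.
templates = {"ref1": "ATATTGCCGA", "ref2": "ATATTCAAGT"}
reads = ["ATATT", "TTGCC", "GCCGA", "TATTG", "ATATT"]
assignment, scores = conclave_assign(reads, templates)
```

In this toy data the two ATATT reads match both templates, but the reads unique to ref1 raise its global score, so every read is assigned to ref1 and ref2 is never reported.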
Fig. 2
The CCMetagen pipeline has a higher F1 score than other metagenomic classification methods for all taxonomic ranks. The two points for each program and taxonomic rank represent the results using a simulated metagenome and a metatranscriptome sample of a fungal community. (a) Results using the whole NCBI nt collection as a reference database. (b) Results using the RefSeq-bf (bacteria and fungi) database, containing all bacterial and fungal genomes available. (c) Partial RefSeq database containing only some of the fungal species currently present in the RefSeq-bf database, mimicking the effects of dealing with species without representatives in reference data sets. In this case, Kraken2, Centrifuge, and KrakenUniq have overlapping results. Refer to Additional file 1: Figures S1 and S2 and Additional file 2 for more information, including precision and recall.
Fig. 3
CCMetagen pipeline performance for bacterial classifications, compared with Kraken2, Centrifuge, and KrakenUniq. Precision (% of true positives), recall (% of taxa identified), and F1 scores represent averages across 10 simulated metagenome samples. Shaded areas indicate 75% confidence intervals
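As a minimal sketch of how precision, recall and F1 can be computed for one sample at one taxonomic rank, assuming each classifier's output is reduced to a set of reported taxon names: the function name `precision_recall_f1` and the placeholder taxa are hypothetical, and the values are returned as fractions rather than percentages.

```python
def precision_recall_f1(predicted, expected):
    """Set-based precision, recall and F1 for one sample at one rank."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)

    # Precision: share of reported taxa that are truly present.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly present taxa that were reported.
    recall = true_positives / len(expected) if expected else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Placeholder taxa: four are truly present, the classifier reports three.
expected = {"taxon_A", "taxon_B", "taxon_C", "taxon_D"}
predicted = {"taxon_A", "taxon_B", "taxon_X"}
precision, recall, f1 = precision_recall_f1(predicted, expected)
```

Here two of the three reported taxa are correct (precision 2/3) and two of the four true taxa are recovered (recall 1/2), giving an F1 of 4/7.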
Fig. 4
Snapshot of CCMetagen results for a spiked fungal community. This Krona graph shows the relative abundance of taxa at various taxonomic levels that are color-coded according to their taxonomic classification at lower-ranks—here, we see fungal taxa in shades of red, and bacterial taxa in shades of green. The Krona html file can be opened and interactively inspected in a web browser. Each circle represents a taxonomic level, where the user can click for a representation of the relative abundance at a given taxonomic rank. For a detailed list of taxa, refer to Additional file 5
Fig. 5
Microbial families in the microbiome of wild birds. The 20 most abundant families are shown, with fungal families indicated in bold. For a full list of taxa, refer to Additional file 6. A tutorial and R scripts to reproduce these analyses are available on the CCMetagen website
