PLoS One. 2020 Dec 2;15(12):e0243241.
doi: 10.1371/journal.pone.0243241. eCollection 2020.

CoMA - an intuitive and user-friendly pipeline for amplicon-sequencing data analysis

Sebastian Hupfauf et al. PLoS One. 2020.

Abstract

In recent years, next-generation sequencing (NGS) of gene amplicons has been used increasingly in biological and medical studies. These studies produce huge amounts of data that need to be analyzed adequately. Various online and offline analysis tools are available; however, most of them require extensive expertise in computer science or bioinformatics, and often a Linux-based operating system. Here, we introduce "CoMA (Comparative Microbiome Analysis)", a free and intuitive analysis pipeline for amplicon-sequencing data that is compatible with all common operating systems. The tool covers data pre-processing, quality checking, clustering into operational taxonomic units (OTUs), taxonomic assignment, data post-processing, data visualization, and statistical appraisal. The workflow produces aesthetically appealing, publication-ready graphics, as well as output files in standardized formats (e.g. tab-delimited OTU table, BIOM, NEWICK tree) that can be used for more sophisticated analyses. The CoMA output was validated in a benchmark test using three mock communities with different sample characteristics (primer set, amplicon length, diversity), and its performance was compared with that of Mothur, QIIME and QIIME2-DADA2, popular packages for NGS data analysis. Furthermore, the functionality of CoMA is demonstrated on a practical example: the investigation of microbial communities from three different soils (grassland, forest, swamp). All tools performed well in the benchmark test and detected most of the genera in the mock communities. For the soil samples, too, the results of CoMA were congruent with those of the other pipelines, particularly for the key microbial players.
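CoMA exports a tab-delimited OTU table (TDOT) that can be reused for further analyses. As a minimal sketch of such downstream use, the Python snippet below sums OTU counts per genus and converts them into per-sample relative abundances. The column layout (OTU ID, one count column per sample, then a semicolon-separated taxonomy string with genus as the sixth rank) is a QIIME-style assumption rather than CoMA's documented format, and `genus_relative_abundance` and `EXAMPLE_TDOT` are hypothetical names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical TDOT content. ASSUMED layout: OTU ID, one count column per
# sample, then a taxonomy string with ranks separated by ";".
EXAMPLE_TDOT = (
    "#OTU ID\tS1\tS2\ttaxonomy\n"
    "OTU_1\t30\t10\tBacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus\n"
    "OTU_2\t10\t30\tBacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia\n"
    "OTU_3\t60\t0\tBacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus\n"
)


def genus_relative_abundance(tdot_text, genus_rank=5):
    """Sum OTU counts per genus and convert them to per-sample proportions.

    genus_rank is the zero-based position of the genus in the taxonomy
    string (ASSUMED to be the sixth rank, index 5).
    """
    reader = csv.reader(io.StringIO(tdot_text), delimiter="\t")
    header = next(reader)
    samples = header[1:-1]  # everything between the ID and taxonomy columns
    counts = {s: defaultdict(int) for s in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        ranks = [r.strip() for r in row[-1].split(";")]
        genus = ranks[genus_rank] if len(ranks) > genus_rank else "unclassified"
        for sample, value in zip(samples, row[1:-1]):
            counts[sample][genus] += int(value)
    result = {}
    for sample, by_genus in counts.items():
        total = sum(by_genus.values())
        # Empty samples get an empty profile instead of a division by zero
        result[sample] = {g: c / total for g, c in by_genus.items()} if total else {}
    return result


if __name__ == "__main__":
    for sample, profile in genus_relative_abundance(EXAMPLE_TDOT).items():
        print(sample, profile)
```

Aggregating to genus level before computing proportions mirrors how the benchmark figures compare pipelines at the genus rank; a different `genus_rank` would give family- or phylum-level profiles under the same assumed layout.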

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of the CoMA pipeline workflow.
Different colors represent the four sub-sections of the CoMA workflow: data pre-processing and quality checking (orange), clustering into operational taxonomic units (OTUs) and taxonomic assignment (green), data post-processing (blue), and data visualization and statistical appraisal (yellow). Labelled arrows indicate the order of steps and name the file types required as input for each step. Taxonomic assignment is done with BLAST, Lambda or RDP, using either one of the available databases (e.g. SILVA [23]) or a custom database provided by the user. Numbers indicate the third-party tools used in each CoMA step: 1 = PANDAseq, 2 = PRINSEQ, 3 = LotuS/sdm, 4 = QIIME, 5 = Mothur. TDOT = tab-delimited OTU table. PER = paired-end reads. SER = single-end reads. PCoA = principal coordinates analysis.
Fig 2. Community composition of the mock-13 dataset, as revealed by four different analysis platforms.
The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 18 bacterial genera, targeted with 16S rRNA amplicon sequencing.
Fig 3. Community composition of the mock-16 dataset, as revealed by four different analysis platforms.
The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 46 archaeal and bacterial genera, targeted with 16S rRNA amplicon sequencing.
Fig 4. Community composition of the mock-26 dataset, as revealed by four different analysis platforms.
The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 11 fungal genera, targeted with ITS amplicon sequencing.
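In the mock-community figures, each pipeline's genus profile is compared against the set point (SP), the theoretically expected distribution. One minimal way to quantify such a comparison is sketched below: the share of expected genera that were detected, the genera that were missed or unexpected, and the summed absolute difference in relative abundance. These metrics and the function name `compare_to_set_point` are illustrative assumptions, not the benchmark measures reported in the paper.

```python
def compare_to_set_point(observed, expected, min_abundance=0.0):
    """Compare an observed genus profile with the expected (set point) profile.

    observed, expected: dicts mapping genus -> relative abundance (0..1).
    min_abundance: hypothetical detection threshold; genera at or below it
    are treated as not detected.
    """
    detected = {g for g, a in observed.items() if a > min_abundance}
    expected_genera = set(expected)
    recovered = expected_genera & detected
    # Summed absolute abundance difference over all genera seen in either
    # profile (0 = identical; 2 = no overlap when both profiles sum to 1)
    genera = expected_genera | set(observed)
    deviation = sum(abs(observed.get(g, 0.0) - expected.get(g, 0.0)) for g in genera)
    return {
        "recall": len(recovered) / len(expected_genera),
        "missed": sorted(expected_genera - detected),
        "unexpected": sorted(detected - expected_genera),
        "abs_deviation": deviation,
    }


if __name__ == "__main__":
    # Hypothetical toy profiles, not data from the mock communities
    set_point = {"Bacillus": 0.5, "Escherichia": 0.3, "Listeria": 0.2}
    pipeline = {"Bacillus": 0.6, "Escherichia": 0.35, "Pseudomonas": 0.05}
    print(compare_to_set_point(pipeline, set_point))
```

Recall captures whether a pipeline "reveals" the expected genera, while the abundance deviation captures how closely the relative proportions match the set point; the two can disagree, so reporting both is a reasonable default.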
Fig 5. Shannon-Wiener diversity (H’) of the three different soils after sequencing data analysis with CoMA, Mothur and QIIME.
Four replicates for each habitat are shown. Letters indicate significant differences across the analysis tools for each habitat. F = forest. GR = grassland. S = swamp.
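The Shannon-Wiener index is commonly defined as H' = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to taxon i. The snippet below is a minimal sketch of that formula; the caption does not state the logarithm base or the taxonomic level used, so the natural logarithm and plain per-taxon read counts are assumptions, and `shannon_wiener` is a hypothetical name.

```python
import math


def shannon_wiener(counts):
    """Return H' = -sum(p_i * ln(p_i)) for a list of per-taxon read counts.

    ASSUMES the natural logarithm. Taxa with zero counts are skipped,
    following the convention 0 * ln(0) = 0.
    """
    total = sum(counts)
    if total <= 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    # Four equally abundant taxa give the maximum for four taxa, ln(4)
    print(round(shannon_wiener([25, 25, 25, 25]), 3))  # 1.386
    # A single taxon carries no diversity
    print(shannon_wiener([40, 0, 0]))  # 0.0
```

Because H' depends on how reads are grouped into taxa, pipelines that cluster or classify reads differently can yield different values for the same sample, which is what the per-habitat comparison across tools reflects.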
Fig 6. Principal component analysis based on archaeal and bacterial families of soil samples from three different habitats: Forest, grassland and swamp.
The color code indicates the applied data analysis tool: CoMA, Mothur and QIIME. Q1-Q4 = quadrants of the coordinate system.
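A principal component analysis of this kind starts from a samples-by-families abundance matrix. The sketch below uses NumPy's singular value decomposition on the column-centred matrix to obtain sample scores and the fraction of variance explained per axis. It is a generic PCA under stated assumptions (relative abundances as input, no further transformation, NumPy available), not the exact procedure used in CoMA, and `pca_scores` is a hypothetical name.

```python
import numpy as np


def pca_scores(abundance, n_components=2):
    """PCA of a samples x families abundance matrix via SVD.

    Columns (families) are mean-centred; returns the sample scores on the
    first n_components axes and the fraction of total variance each explains.
    ASSUMES the matrix has non-zero total variance.
    """
    X = np.asarray(abundance, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each family column
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    variance = s ** 2
    explained = variance / variance.sum()
    return scores, explained[:n_components]


if __name__ == "__main__":
    # Hypothetical relative abundances of three families in four samples
    toy = [
        [0.6, 0.3, 0.1],
        [0.5, 0.4, 0.1],
        [0.1, 0.2, 0.7],
        [0.2, 0.1, 0.7],
    ]
    scores, explained = pca_scores(toy)
    print(scores.round(3))
    print(explained.round(3))
```

The sign of each principal axis is arbitrary in an SVD, so which quadrant (Q1-Q4) a group of samples falls into can flip between implementations without changing the interpretation; only relative positions matter.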
Fig 7. Venn plots showing the shared phyla, classes, orders, families and genera found with CoMA, Mothur and QIIME in the soil samples.
Data include all of the three investigated habitats (forest, grassland, swamp).
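The Venn plots partition the taxa detected by the three tools into exclusive regions (found by one tool only, by two, or by all three). A minimal sketch of that partitioning with Python sets follows; the taxon names are invented for illustration and `venn_regions` is a hypothetical name, not part of CoMA.

```python
def venn_regions(taxa_by_tool):
    """Assign every taxon to exactly one Venn region.

    taxa_by_tool: dict mapping tool name -> set of taxa it detected.
    Returns a dict mapping a sorted tuple of tool names (the tools that
    found the taxon) -> set of taxa in that exclusive region.
    """
    regions = {}
    all_taxa = set().union(*taxa_by_tool.values())
    for taxon in all_taxa:
        key = tuple(sorted(tool for tool, taxa in taxa_by_tool.items() if taxon in taxa))
        regions.setdefault(key, set()).add(taxon)
    return regions


if __name__ == "__main__":
    # Hypothetical genus lists, not results from the soil samples
    found = {
        "CoMA": {"Bacillus", "Pseudomonas", "Nitrospira"},
        "Mothur": {"Bacillus", "Pseudomonas"},
        "QIIME": {"Bacillus", "Streptomyces"},
    }
    for tools, taxa in sorted(venn_regions(found).items()):
        print(" & ".join(tools), len(taxa), sorted(taxa))
```

Running the same partition separately at phylum, class, order, family and genus level reproduces the structure of the five Venn plots; shared counts typically shrink at finer ranks because classification disagreements accumulate.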

References

    1. Cole EJ, Zandvakili OR, Blanchard J, Xing B, Hashemi M, Etemadi F. Investigating responses of soil bacterial community composition to hardwood biochar amendment using high-throughput PCR sequencing. Appl. Soil Ecol. 2019; 136: 80–85. doi:10.1016/j.apsoil.2018.12.010
    2. Zamyadi A, Romanis C, Mills T, Neilan B, Choo F, Coral LA, et al. Diagnosing water treatment critical control points for cyanobacterial removal: Exploring benefits of combined microscopy, next-generation sequencing, and cell integrity methods. Water Res. 2019; 152: 96–105. doi:10.1016/j.watres.2019.01.002
    3. Jung SW, Kim HJ, Park JS, Lee T-K, Shin K, Jeong S-Y, et al. Planktonic bivalve larvae identification and quantification in Gomso Bay, South Korea, using next-generation sequencing analysis and microscopic observations. Aquaculture. 2018; 490: 297–302. doi:10.1016/j.aquaculture.2018.02.053
    4. Parlapani F, Michailidou S, Anagnostopoulos D, Sakellariou A, Pasentsis K, Psomopoulos F, et al. Microbial spoilage investigation of thawed common cuttlefish (Sepia officinalis) stored at 2 °C using next generation sequencing and volatilome analysis. Food Microbiol. 2018; 76: 518–525. doi:10.1016/j.fm.2018.08.004
    5. Hu HL, Guo LY, Wu HL, Feng WY, Chen TM, Liu G. Evaluation of next-generation sequencing for the pathogenic diagnosis of children brain abscesses. J. Infection. 2019; 78: 323–337. doi:10.1016/j.jinf.2019.01.003

Publication types