Review
Mol Ecol Resour. 2024 Jul;24(5):e13847.
doi: 10.1111/1755-0998.13847. Epub 2023 Aug 7.

A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses

Ali Hakimzadeh et al. Mol Ecol Resour. 2024 Jul.

Abstract

Environmental DNA (eDNA) metabarcoding has gained growing attention as a strategy for monitoring biodiversity in ecology. However, taxa identifications produced through metabarcoding require sophisticated processing of high-throughput sequencing data from taxonomically informative DNA barcodes. Various sets of universal and taxon-specific primers have been developed, extending the usability of metabarcoding across archaea, bacteria and eukaryotes. Accordingly, a multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several developed workflows are designed to process the same amplicon sequencing data, making it somewhat puzzling to choose one among the plethora of existing pipelines. However, each pipeline has its own specific philosophy, strengths and limitations, which should be considered depending on the aims of any specific study, as well as the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of thirty-two amplicon processing pipelines with the goal of helping users to select a pipeline for their metabarcoding projects.

Keywords: amplicon data analysis; bioinformatics; environmental DNA; metabarcoding; pipeline; review.

Figures

Figure 1.
Examples of basic bioinformatics workflows for metabarcoding data. The workflow begins with demultiplexing, assigning reads to respective samples based on unique molecular identifiers. Next, quality filtering removes low-quality reads to reduce errors and improve reliability. Denoising algorithms identify and correct sequencing errors while preserving biological variation. For paired-end reads, merging combines forward and reverse reads into single sequences. Artifact filtering removes biases introduced by sequencing artifacts such as chimeras and NUMTs. Clustering groups sequences into OTUs or ASVs based on similarity, reducing data complexity. Finally, taxonomic assignment is performed using reference databases and algorithms, enabling accurate identification of studied communities.
* Primer trimming can be applied between any of these steps.
*1 Only for paired-end data. May be performed before or after quality filtering.
*2 Error correction; formation of ASVs.
*3 Including chimera filtering, off-target gene removal (pseudogene removal, ITS extraction).
*4 Formation of OTUs/swarm-clusters.
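
The order of steps in Figure 1 can be illustrated with a toy sketch in plain Python. It is not the implementation of any reviewed pipeline: the barcodes, reference sequences, thresholds and function names are all invented for illustration, reads are assumed to be single-end with Phred+33 quality strings, and a position-by-position identity stands in for real alignment. Denoising, merging and chimera filtering are left out to keep the example short.

```python
from collections import Counter, defaultdict

# Hypothetical sample barcodes and reference sequences, invented for illustration.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
REFERENCES = {"taxon_X": "GGGGAAAACCCC", "taxon_Y": "TTTTCCCCAAAA"}


def demultiplex(reads):
    """Assign each (sequence, quality) read to a sample by its leading barcode."""
    by_sample = defaultdict(list)
    for seq, qual in reads:
        sample = BARCODES.get(seq[:4])
        if sample is not None:
            # Strip the barcode and its quality scores; unknown barcodes are dropped.
            by_sample[sample].append((seq[4:], qual[4:]))
    return by_sample


def mean_phred(qual):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(c) - 33 for c in qual) / len(qual)


def quality_filter(reads, min_mean_q=20):
    """Keep the sequences of reads whose mean quality reaches the threshold."""
    return [seq for seq, qual in reads if qual and mean_phred(qual) >= min_mean_q]


def identity(a, b):
    """Fraction of matching positions; a toy stand-in for a real alignment."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.9):
    """Greedy centroid clustering: the most abundant unique sequences seed clusters."""
    clusters = {}  # centroid sequence -> total read count
    for seq, count in Counter(seqs).most_common():
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += count
                break
        else:
            clusters[seq] = count
    return clusters


def assign_taxonomy(seq, min_identity=0.9):
    """Label a sequence with its best-matching reference, or 'unassigned'."""
    best = max(REFERENCES, key=lambda name: identity(seq, REFERENCES[name]))
    return best if identity(seq, REFERENCES[best]) >= min_identity else "unassigned"


def run_workflow(reads):
    """Chain the steps and return {sample: {taxon: read_count}}."""
    table = {}
    for sample, sample_reads in demultiplex(reads).items():
        clusters = greedy_cluster(quality_filter(sample_reads))
        counts = Counter()
        for centroid, count in clusters.items():
            counts[assign_taxonomy(centroid)] += count
        table[sample] = dict(counts)
    return table
```

Calling `run_workflow` on a list of `(sequence, quality)` tuples returns a small sample-by-taxon count table. Real pipelines replace each function with dedicated tools and add the steps omitted here.
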
Figure 2.
Software for metabarcoding data bioinformatics processing categorized by input read type (paired-end or single-end; tools shown in electric blue can handle both), software type (suite, pre-compiled pipeline), interface (CLI, GUI, Web, Galaxy web platform), produced feature type (OTU, ASV, swarm-cluster), and operating system (Linux, macOS, Windows).
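
The five categories in Figure 2 map naturally onto a small record type that can be filtered by project requirements. The sketch below is only one possible encoding: the class name, the `select` helper and the two placeholder entries (`tool_a`, `tool_b`) are hypothetical and do not describe any pipeline covered in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineProfile:
    """One tool described by the five categories used in the figure."""
    name: str
    read_types: frozenset         # "paired-end", "single-end"
    software_type: str            # "suite" or "pre-compiled pipeline"
    interfaces: frozenset         # "CLI", "GUI", "Web", "Galaxy"
    feature_types: frozenset      # "OTU", "ASV", "swarm-cluster"
    operating_systems: frozenset  # "Linux", "macOS", "Windows"


def select(profiles, read_type=None, software_type=None, interface=None,
           feature_type=None, operating_system=None):
    """Return the names of profiles that meet every requirement that was given."""
    matches = []
    for p in profiles:
        checks = [
            read_type is None or read_type in p.read_types,
            software_type is None or software_type == p.software_type,
            interface is None or interface in p.interfaces,
            feature_type is None or feature_type in p.feature_types,
            operating_system is None or operating_system in p.operating_systems,
        ]
        if all(checks):
            matches.append(p.name)
    return matches


# Placeholder entries; names and attributes are invented, not taken from the review.
PROFILES = [
    PipelineProfile("tool_a", frozenset({"paired-end"}), "suite",
                    frozenset({"CLI"}), frozenset({"ASV"}),
                    frozenset({"Linux", "macOS"})),
    PipelineProfile("tool_b", frozenset({"paired-end", "single-end"}),
                    "pre-compiled pipeline", frozenset({"GUI", "CLI"}),
                    frozenset({"OTU", "ASV"}),
                    frozenset({"Linux", "macOS", "Windows"})),
]
```

For example, `select(PROFILES, interface="GUI", operating_system="Windows")` keeps only the entries that offer a graphical interface on Windows. Leaving a requirement as `None` means that category is not used as a filter.
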
