A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses

Ali Hakimzadeh¹, Alejandro Abdala Asbun², Davide Albanese³, Maria Bernard⁴,⁵, Dominik Buchner⁶, Benjamin Callahan⁷, J Gregory Caporaso⁸, Emily Curd⁹, Christophe Djemiel¹⁰, Mikael Brandström Durling¹¹, Vasco Elbrecht⁶, Zachary Gold¹², Hyun S Gweon¹³,¹⁴, Mehrdad Hajibabaei¹⁵, Falk Hildebrand¹⁶,¹⁷, Vladimir Mikryukov¹, Eric Normandeau¹⁸, Ezgi Özkurt¹⁶,¹⁷, Jonathan M Palmer¹⁹, Géraldine Pascal²⁰, Teresita M Porter¹⁵, Daniel Straub²¹, Martti Vasar¹, Tomáš Větrovský²², Haris Zafeiropoulos²³, Sten Anslan¹,²⁴

Affiliations

¹ Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia.
² Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research, Texel, Netherlands.
³ Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach, Italy.
⁴ INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas, France.
⁵ INRAE, SIGENAE, Jouy-en-Josas, France.
⁶ Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, Germany.
⁷ Department of Population Health and Pathobiology, College of Veterinary Medicine and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA.
⁸ Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA.
⁹ Vermont Biomedical Research Network, University of Vermont, Burlington, Vermont, USA.
¹⁰ Agroécologie, INRAE, Institut Agro, Univ. Bourgogne Franche-Comté, Dijon, France.
¹¹ Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden.
¹² NOAA Pacific Marine Environmental Laboratory, Seattle, Washington, USA.
¹³ UK Centre for Ecology & Hydrology, Oxfordshire, UK.
¹⁴ School of Biological Sciences, University of Reading, Reading, UK.
¹⁵ Department of Integrative Biology and Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada.
¹⁶ Gut Microbes & Health, Quadram Institute Bioscience, Norfolk, UK.
¹⁷ Earlham Institute, Norwich Research Park, Norfolk, UK.
¹⁸ Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Québec, Canada.
¹⁹ Center for Forest Mycology Research, Northern Research Station, US Forest Service, Madison, Wisconsin, USA.
²⁰ GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France.
²¹ Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany.
²² Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha, Czech Republic.
²³ KU Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, Leuven, Belgium.
²⁴ Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland.

PMID: 37548515
PMCID: PMC10847385
DOI: 10.1111/1755-0998.13847

Review

Mol Ecol Resour. 2024 Jul;24(5):e13847. doi: 10.1111/1755-0998.13847. Epub 2023 Aug 7.

Abstract

Environmental DNA (eDNA) metabarcoding has gained growing attention as a strategy for monitoring biodiversity in ecology. However, taxa identifications produced through metabarcoding require sophisticated processing of high-throughput sequencing data from taxonomically informative DNA barcodes. Various sets of universal and taxon-specific primers have been developed, extending the usability of metabarcoding across archaea, bacteria and eukaryotes. Accordingly, a multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several developed workflows are designed to process the same amplicon sequencing data, making it somewhat puzzling to choose one among the plethora of existing pipelines. However, each pipeline has its own specific philosophy, strengths and limitations, which should be considered depending on the aims of any specific study, as well as the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of thirty-two amplicon processing pipelines with the goal of helping users to select a pipeline for their metabarcoding projects.

Keywords: amplicon data analysis; bioinformatics; environmental DNA; metabarcoding; pipeline; review.

Figures

**Figure 1.**
Examples of basic bioinformatics workflows for metabarcoding data. The workflow begins with demultiplexing, assigning reads to respective samples based on unique molecular identifiers. Next, quality filtering removes low-quality reads to reduce errors and improve reliability. Denoising algorithms identify and correct sequencing errors while preserving biological variation. For paired-end reads, merging combines forward and reverse reads into single sequences. Artifacts filtering removes biases introduced by sequencing artifacts like chimeras and NUMTs. Clustering groups sequences into OTUs or ASVs based on similarity, reducing data complexity. Finally, taxonomic assignment is performed using reference databases and algorithms, enabling accurate identification of studied communities. * Primer trimming between any of these steps can be applied. *1 Only for paired-end data. May be performed before or after quality filtering. *2 Error correction; formation of ASVs. *3 Including chimera filtering, off-target gene removal (pseudogene removal, ITS extraction). *4 Formation of OTUs/swarm-clusters.
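As a rough illustration of the quality-filtering step named in Figure 1, the sketch below discards reads whose summed per-base error probability is too high. The maximum-expected-error criterion, the Phred+33 quality encoding, the threshold of 1.0 and all function names are assumptions chosen for illustration; the review does not prescribe a specific filter, and individual pipelines may use different criteria.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred-encoded quality string.

    Assumes Phred+33 encoding (hypothetical default for this sketch).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def quality_filter(reads, max_ee: float = 1.0):
    """Keep (sequence, quality) pairs whose expected error count is <= max_ee."""
    return [(seq, qual) for seq, qual in reads if expected_errors(qual) <= max_ee]


# Toy reads, invented for this example: (sequence, quality string).
reads = [
    ("ACGTACGT", "IIIIIIII"),  # Q40 at every base
    ("ACGTACGA", "IIIIIII#"),  # one Q2 base at the end
    ("TTGTACGT", "########"),  # Q2 at every base
]

kept = quality_filter(reads, max_ee=1.0)
print([seq for seq, _ in kept])
```

With these toy reads, the first two pass and the third is dropped, because eight Q2 bases sum to roughly five expected errors. A stricter or looser `max_ee` trades read retention against error rate, which is one of the choices that differs between pipelines.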

**Figure 2.**
Software for bioinformatic processing of metabarcoding data, categorized by input read type (paired-end or single-end; tools shown in electric blue handle both), software type (suite, pre-compiled pipeline), interface (CLI, GUI, Web, Galaxy web platform), produced feature type (OTU, ASV, swarm-cluster), and operating system (Linux, macOS, Windows).
