PeerJ. 2022 Jan 19;10:e12758.
doi: 10.7717/peerj.12758. eCollection 2022.

DnoisE: distance denoising by entropy. An open-source parallelizable alternative for denoising sequence datasets

Adrià Antich et al. PeerJ. 2022.

Abstract

DNA metabarcoding is broadly used in biodiversity studies encompassing a wide range of organisms. Erroneous amplicons, generated during amplification and sequencing procedures, constitute one of the major sources of concern for the interpretation of metabarcoding results. Several denoising programs have been implemented to detect and eliminate these errors. However, almost all denoising software currently available has been designed to process non-coding ribosomal sequences, most notably prokaryotic 16S rDNA. The growing number of metabarcoding studies using coding markers such as COI or RuBisCO demands a re-assessment and calibration of denoising algorithms. Here we present DnoisE, the first denoising program designed to detect erroneous reads and merge them with the correct ones using information from the natural variability (entropy) associated with each codon position in coding barcodes. We have developed open-source software using a modified version of the UNOISE algorithm. DnoisE implements different merging procedures as options, and can incorporate codon entropy information either retrieved from the data or supplied by the user. In addition, the algorithm of DnoisE is parallelizable, greatly reducing runtimes on computer clusters. Our program also accepts different input file formats, so it can be readily incorporated into existing metabarcoding pipelines.
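As a rough illustration of what "entropy associated with each codon position" can mean, the sketch below computes the Shannon entropy of every alignment column and averages it per codon position. This is only one plausible reading, not DnoisE's actual code: the function name, the `start` parameter, and the choice to average rather than sum are assumptions made for the example.

```python
import math
from collections import Counter


def codon_position_entropy(seqs, start=0):
    """Mean Shannon entropy (bits) for codon positions 1, 2 and 3.

    seqs  : aligned sequences of equal length (coding barcode fragments)
    start : codon position (0, 1 or 2) of the first base; a hypothetical
            parameter for fragments that do not begin on a codon boundary
    """
    totals = [0.0, 0.0, 0.0]   # summed column entropies per codon position
    columns = [0, 0, 0]        # number of columns seen per codon position
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        n = sum(counts.values())
        # Shannon entropy of the nucleotide frequencies in this column
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        pos = (i + start) % 3
        totals[pos] += h
        columns[pos] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, columns)]


# Toy alignment: only third codon positions vary
toy = ["AAAAAA", "AAGAAT", "AACAAC", "AATAAG"]
print(codon_position_entropy(toy))
```

In coding markers the third codon position is typically the most variable, so a change there is weaker evidence of a sequencing error than a change at the second position. That asymmetry is the information an entropy correction tries to exploit.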

Keywords: Bioinformatic pipelines; Coding markers; Denoising algorithms; Entropy correction; Metabarcoding; Metaphylogeography.

PubMed Disclaimer

Conflict of interest statement

Owen S. Wangensteen is an Academic Editor for PeerJ.

Figures

Figure 1. Scheme of the workflow of DnoisE.
Starting from an abundance-sorted sequence dataset, subsets of possible daughter sequences (PDS) and possible mother sequences (PMS) are selected as detailed in Fig. 2. For each subset, all PDS are compared with all compatible PMS (in terms of MDA and MMA). If the merging inequality is met, the values of the main parameters are stored. After all subsets have been evaluated, for each merging criterion the best PMS for each PDS is chosen and a sequence file is generated, together with a file with information on the merging process.
Figure 2. Schematic workflow of parallel processing of DnoisE.
When running in parallel, comparisons between sequences are computed in sets of sequences defined by their abundances. Using the Maximum Daughter Abundance (MDA) value, computed from the last correct sequence of the previous step, we can define sets of sequences that are compared in parallel with the previously tagged correct sequences.
Figure 3. Time (blue) and memory (red) used by DnoisE to denoise and merge sequences with the Ratio-Distance criterion using different cores on a computer cluster.
Denoising using entropy correction (triangles and dashed line) is compared against no correction (circles and dashed line). Lines are computed using the geom_smooth() function of the ggplot2 package with method = ‘loess’.
Figure 4. Number of original (correct) sequences (red), total sequences (dark blue) and total sequences filtered by read abundance (light blue) retrieved by DnoisE with entropy correction (solid line) and without entropy correction (equivalent to UNOISE).
Values with abundance filtering were computed using a minimum abundance of 10 reads (--min_abund 10).
Figure 5. Match ratio (error sequences merged to their “true” mothers/total number of merged sequences) of DnoisE without entropy correction and with the abundance ratio joining criterion (equivalent to UNOISE; grey bars) and of DnoisE with entropy correction.
For DnoisE with entropy correction, the three merging criteria were compared: the abundance ratio criterion (orange bars), the genetic distance criterion (blue bars) and the criterion based on the quotient between the abundance ratio and β(d) (green bars).
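The merging inequality and the three merging criteria named in the captions above can be sketched as follows. The sketch assumes the UNOISE form of the threshold, β(d) = 1/2^(αd+1), and treats the distance d as already computed (with or without entropy weighting). The function names, the tuple layout, and the value α = 2 are illustrative assumptions, not DnoisE's interface.

```python
def beta(d, alpha):
    """UNOISE-style abundance-ratio threshold for distance d."""
    return 1.0 / 2 ** (alpha * d + 1)


def compatible_mothers(daughter_abund, mothers, alpha):
    """Keep the possible mothers that satisfy the merging inequality.

    mothers: iterable of (name, abundance, distance) tuples (hypothetical layout)
    """
    kept = []
    for name, abund, d in mothers:
        ratio = daughter_abund / abund
        if ratio <= beta(d, alpha):          # merging inequality
            kept.append({"name": name, "ratio": ratio, "d": d,
                         "ratio_d": ratio / beta(d, alpha)})
    return kept


def best_mother(candidates, criterion):
    """Pick one mother per criterion: 'ratio', 'd' or 'ratio_d' (lowest wins)."""
    if not candidates:
        return None                          # sequence is kept as correct
    return min(candidates, key=lambda c: c[criterion])["name"]


# Toy example: one daughter with 10 reads and four possible mothers
mothers = [("A", 2000, 3), ("B", 200, 1), ("C", 50, 1), ("D", 1000, 2)]
cands = compatible_mothers(10, mothers, alpha=2)
for criterion in ("ratio", "d", "ratio_d"):
    print(criterion, "->", best_mother(cands, criterion))
```

In this toy case the three criteria disagree. The abundance ratio favours the most abundant mother, the distance criterion the closest one, and the quotient balances the two, which is why a match-ratio comparison such as the one in Figure 5 is informative.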

