Validated removal of nuclear pseudogenes and sequencing artefacts from mitochondrial metabarcode data
- PMID: 33503286
- DOI: 10.1111/1755-0998.13337
Abstract
Metabarcoding of Metazoa using mitochondrial genes may be confounded by both the accumulation of PCR and sequencing artefacts and the co-amplification of nuclear mitochondrial pseudogenes (NUMTs). The application of read abundance thresholds and denoising methods is efficient in reducing noise accompanying authentic mitochondrial amplicon sequence variants (ASVs). However, these procedures do not fully account for the complex nature of concomitant sequences and the highly variable DNA contribution of specimens in a metabarcoding sample. We propose, as a complement to denoising, the metabarcoding Multidimensional Abundance Threshold Evaluation (metaMATE) framework, a novel approach that allows comprehensive examination of multiple dimensions of abundance filtering and the evaluation of the prevalence of unwanted concomitant sequences in denoised metabarcoding datasets. metaMATE requires a denoised set of ASVs as input, and designates a subset of ASVs as being either authentic (mitochondrial DNA haplotypes) or nonauthentic ASVs (NUMTs and erroneous sequences) by comparison to external reference data and by analysing nucleotide substitution patterns. metaMATE (i) facilitates the application of read abundance filtering strategies, which are structured with regard to sequence library and phylogeny and applied for a range of increasing abundance threshold values, and (ii) evaluates their performance by quantifying the prevalence of nonauthentic ASVs and the collateral effects on the removal of authentic ASVs. The output from metaMATE facilitates decision-making about required filtering stringency and can be used to improve the reliability of intraspecific genetic information derived from metabarcode data. The framework is implemented in the metaMATE software (available at https://github.com/tjcreedy/metamate).
Keywords: HTS; Metazoa; NGS; NUMT; denoising; intraspecific variation; pseudogene; spurious sequences; taxonomic inflation.
© 2021 John Wiley & Sons Ltd.
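The evaluation loop described in the abstract (apply an abundance filter over a range of increasing thresholds, then count the nonauthentic ASVs that survive and the authentic ASVs that are lost) can be illustrated with a minimal sketch. This is not the metaMATE implementation or its interface. The function names, the data layout (a mapping from ASV to per-library read counts), and the toy values are all assumptions made for illustration. Only library-structured filtering is shown; the phylogeny-structured variant is omitted.

```python
from collections import defaultdict


def filter_by_library_proportion(counts, threshold):
    """Retain an ASV if, in at least one library, its reads make up at
    least `threshold` of that library's total reads (hypothetical rule)."""
    library_totals = defaultdict(int)
    for per_library in counts.values():
        for library, n in per_library.items():
            library_totals[library] += n

    retained = set()
    for asv, per_library in counts.items():
        # `n > 0` is checked first, so an empty library never causes a division by zero.
        if any(n > 0 and n / library_totals[library] >= threshold
               for library, n in per_library.items()):
            retained.add(asv)
    return retained


def evaluate_thresholds(counts, authentic, nonauthentic, thresholds):
    """For each threshold, report how many designated nonauthentic ASVs
    survive filtering and how many designated authentic ASVs are removed."""
    rows = []
    for t in thresholds:
        retained = filter_by_library_proportion(counts, t)
        rows.append({
            "threshold": t,
            "retained": len(retained),
            "nonauthentic_retained": len(retained & nonauthentic),
            "authentic_lost": len(authentic - retained),
        })
    return rows


# Toy data (invented): read counts per ASV per library.
counts = {
    "asv1": {"libA": 900, "libB": 500},
    "asv2": {"libA": 90, "libB": 0},
    "asv3": {"libA": 10, "libB": 480},
    "asv4": {"libA": 0, "libB": 20},
}
authentic = {"asv1", "asv3"}   # e.g. matched to external reference data
nonauthentic = {"asv4"}        # e.g. flagged by substitution patterns
# asv2 is left undesignated: only a subset of ASVs is designated.

rows = evaluate_thresholds(counts, authentic, nonauthentic, [0.0, 0.05, 0.5])
for row in rows:
    print(row)
```

In this toy case the intermediate threshold removes the designated nonauthentic ASV without losing an authentic one, while the strictest threshold starts removing authentic ASVs. That trade-off is what the framework's output is meant to expose when choosing filtering stringency.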