Ecol Evol. 2022 Mar 1;12(3):e8603.

doi: 10.1002/ece3.8603. eCollection 2022 Feb.

Amplicon_sorter: A tool for reference-free amplicon sorting based on sequence similarity and for building consensus sequences

Andy R Vierstraete¹, Bart P Braeckman¹

PMID: 35261737
PMCID: PMC8888255
DOI: 10.1002/ece3.8603

Andy R Vierstraete et al. Ecol Evol. 2022.

Affiliation

¹ Laboratory of Aging Physiology and Molecular Evolution, University of Gent, Gent, Belgium.

Abstract

Oxford Nanopore Technologies (ONT) is a third-generation sequencing technology that is gaining popularity in ecological research for its portable and low-cost sequencing possibilities. Although the technology excels at long-read sequencing, it can also be applied to sequence amplicons. The downside of ONT is the low quality of the raw reads. Hence, generating a high-quality consensus sequence is still a challenge. We present Amplicon_sorter, a tool for reference-free sorting of ONT-sequenced amplicons based on their similarity in sequence and length and for building solid consensus sequences.

Keywords: DNA barcoding; Oxford Nanopore Technologies; amplicon sequencing; biodiversity; consensus; metabarcoding; metagenetics; replacing Sanger.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**FIGURE 1**
A step‐wise schematic diagram of the workflow of Amplicon_sorter

**FIGURE A1**
Comparison between the Sanger sequence and the consensus produced by Amplicon_sorter. (a) BC01 with 99.54% similarity. (b) BC04 with 98.57% similarity. Errors are consistent underestimations of homopolymer length by ONT basecalling

**FIGURE A2**
The percentage of reads used to create a consensus plotted against the read quality score. Blue: randomly sampled reads with the default settings of Amplicon_sorter. Green: comparing all reads with each other (requires more processing time)

**FIGURE A3**
Comparison of a few raw ONT spacer reads from *Cordulegaster insignis* and *C. mzymtae* with the Sanger references. There are no consistent differences between *insignis* and *mzymtae* reads because of the high error rate (red rectangles in the alignment indicate the differences in Sanger sequence between both species). This is likely the reason why the script cannot separate these reads into specific groups

**FIGURE A4**
Example of false/redundant consensus sequences produced by Amplicon_sorter. (a) Four consensus sequences of the same species. The first two with similar identity to the Sanger reference and more than 300 reads. The third and fourth have a much lower similarity. (b) Alignment of those reads. The first consensus has 40 extra bases at the 5’ end, the second read has 40 extra bases more at the 3’ end. The middle parts of both reads are almost identical. The third sequence differs in many positions and is built from 152 reads with similar errors. The last sequence differs even more and is a consensus built from only two reads

**FIGURE A5**
Memory utilization during a run of Amplicon_sorter when analyzing datasets with different number of species and number of reads sampled. (a) 50 species with multiple genes and 142,000 reads sampled. The memory consumption did not exceed 7 GB. (b) 511 species with one gene, 100,000 reads used. Memory consumption had a peak around 8 GB. (c) 511 species with one gene, 568,000 reads used. A peak of 70 GB when sorting species. (d) 9929 species, one gene and 500,000 reads used. Around 85 GB of memory was used to sort the species

Cited by

Insights into mycobacteriome composition in Mycobacterium bovis-infected African buffalo (Syncerus caffer) tissue samples.
Ghielmetti G, Kerr TJ, Bernitz N, Mhlophe SK, Streicher E, Loxton AG, Warren RM, Miller MA, Goosen WJ. Sci Rep. 2024 Jul 30;14(1):17537. doi: 10.1038/s41598-024-68189-x. PMID: 39080347. Free PMC article.
The newest Oxford Nanopore R10.4.1 full-length 16S rRNA sequencing enables the accurate resolution of species-level microbial community profiling.
Zhang T, Li H, Ma S, Cao J, Liao H, Huang Q, Chen W. Appl Environ Microbiol. 2023 Oct 31;89(10):e0060523. doi: 10.1128/aem.00605-23. Epub 2023 Oct 6. PMID: 37800969. Free PMC article.
Robot-Aided Measurement of Insect Diversity on Vegetation Using Environmental DNA.
Koubínová D, Kirchgeorg S, Geckeler C, Thurnheer S, Lüthi M, Sanchez T, Mintchev S, Pellissier L, Albouy C. Ecol Evol. 2025 May 7;15(5):e71391. doi: 10.1002/ece3.71391. eCollection 2025 May. PMID: 40342700. Free PMC article.
Metagenomic evaluation of bacteria in drinking water using full-length 16S rRNA amplicons.
Taylor W, Devane ML, Russell K, Lin S, Roxburgh C, Williamson J, Gilpin BJ. J Water Health. 2024 Aug;22(8):1429-1443. doi: 10.2166/wh.2024.090. Epub 2024 Jul 30. PMID: 39212280.
First record of mermithid parasitism in adult biting midges, Culicoides huffi (Diptera: Ceratopogonidae), collected from Southern Thailand, with ultrastructural and molecular characterization.
Promrangsee C, Sanprasert V, Thepparat A, Sunantaraporn S, Ampol R, Boonserm R, Siriyasatien P, Preativatanyou K. Parasit Vectors. 2025 Jul 28;18(1):303. doi: 10.1186/s13071-025-06958-x. PMID: 40722115. Free PMC article.

Associated data

Dryad/10.5061/dryad.zgmsbccd0
