Ecol Evol. 2022 Mar 1;12(3):e8603.
doi: 10.1002/ece3.8603. eCollection 2022 Feb.

Amplicon_sorter: A tool for reference-free amplicon sorting based on sequence similarity and for building consensus sequences

Andy R Vierstraete et al. Ecol Evol. 2022.

Abstract

Oxford Nanopore Technologies (ONT) is a third-generation sequencing technology that is gaining popularity in ecological research because it makes portable, low-cost sequencing possible. Although the technology excels at long-read sequencing, it can also be used to sequence amplicons. The main drawback of ONT is the low quality of the raw reads, so generating a high-quality consensus sequence is still a challenge. We present Amplicon_sorter, a tool for reference-free sorting of ONT-sequenced amplicons based on their similarity in sequence and length, and for building reliable consensus sequences.
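The abstract describes sorting reads by similarity in sequence and length without a reference. As a rough illustration of that general idea only, the Python sketch below greedily groups reads using a length window plus k-mer Jaccard similarity. The function names, the choice of k-mer Jaccard as the similarity measure, and the thresholds (`k`, `length_tolerance`, `min_similarity`) are assumptions made for this example; they are not Amplicon_sorter's actual algorithm, options, or defaults.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers (substrings of length k) in seq, case-insensitive."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two k-mer sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(reads: list[str], k: int = 5,
                   length_tolerance: float = 0.1,
                   min_similarity: float = 0.5) -> list[list[int]]:
    """Hypothetical reference-free grouping (not Amplicon_sorter's method).

    Each read joins the first existing group whose seed read (the group's
    first member) has a length within length_tolerance (as a fraction of the
    seed length) and a k-mer Jaccard similarity >= min_similarity.
    Otherwise the read seeds a new group. Returns lists of read indices.
    """
    clusters: list[tuple[set[str], int, list[int]]] = []  # (seed k-mers, seed length, members)
    for idx, read in enumerate(reads):
        kmers = kmer_set(read, k)
        for seed_kmers, seed_len, members in clusters:
            similar_length = abs(len(read) - seed_len) <= length_tolerance * seed_len
            if similar_length and jaccard(kmers, seed_kmers) >= min_similarity:
                members.append(idx)
                break
        else:
            # No compatible group found: this read becomes a new seed.
            clusters.append((kmers, len(read), [idx]))
    return [members for _, _, members in clusters]


# Toy reads (not real ONT data): two identical reads, an unrelated read of
# the same length, and a related read of twice the length.
reads = ["ACGTACGGTCA" * 2, "ACGTACGGTCA" * 2, "T" * 22, "ACGTACGGTCA" * 4]
print(greedy_cluster(reads))  # [[0, 1], [2], [3]]
```

Seeding each group with its first read keeps the sketch short but makes the grouping depend on input order. With real ONT reads, the high raw error rate also lowers k-mer overlap between reads of the same amplicon, so any threshold like `min_similarity` would need tuning against real data.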

Keywords: DNA barcoding; Oxford Nanopore Technologies; amplicon sequencing; biodiversity; consensus; metabarcoding; metagenetics; replacing Sanger.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
A step‐wise schematic diagram of the workflow of Amplicon_sorter
FIGURE A1
Comparison between the Sanger sequence and the consensus produced by Amplicon_sorter. (a) BC01 with 99.54% similarity. (b) BC04 with 98.57% similarity. The differences are consistent underestimations of homopolymer length by ONT basecalling.
FIGURE A2
The percentage of reads used to create a consensus, plotted against the read quality score. Blue: randomly sampled reads with the default settings of Amplicon_sorter. Green: all reads compared with each other (requires more processing time).
FIGURE A3
Comparison of a few raw ONT spacer reads from Cordulegaster insignis and C. mzymtae with the Sanger references. Because of the high error rate, there are no consistent differences between the insignis and mzymtae reads (red rectangles in the alignment mark the positions where the Sanger sequences of the two species differ). This is likely why the script cannot separate these reads into species-specific groups.
FIGURE A4
Example of false/redundant consensus sequences produced by Amplicon_sorter. (a) Four consensus sequences of the same species. The first two have similar identity to the Sanger reference and were built from more than 300 reads. The third and fourth have a much lower similarity. (b) Alignment of those sequences. The first consensus has 40 extra bases at the 5’ end and the second has 40 extra bases at the 3’ end; the middle parts of both are almost identical. The third sequence differs in many positions and is built from 152 reads with similar errors. The last sequence differs even more and is a consensus built from only two reads.
FIGURE A5
Memory utilization during a run of Amplicon_sorter when analyzing datasets with different numbers of species and of sampled reads. (a) 50 species with multiple genes and 142,000 reads sampled; memory consumption did not exceed 7 GB. (b) 511 species with one gene and 100,000 reads used; memory consumption peaked at around 8 GB. (c) 511 species with one gene and 568,000 reads used; memory peaked at 70 GB when sorting species. (d) 9929 species with one gene and 500,000 reads used; around 85 GB of memory was used to sort the species.
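Figures A1 and A4 report how similar each consensus is to its Sanger reference, and Figure A1 attributes the remaining differences to shortened homopolymers. The Python sketch below shows one common way to compute percent identity from a pairwise alignment, plus a helper that lists homopolymer runs so a shortened run is easy to spot. It assumes the two sequences are already aligned (gaps written as `-`). The identity definition used here (identical columns divided by all columns that are not gaps in both rows) is one of several in use and is not necessarily the one behind the percentages in the figures; the function names are hypothetical.

```python
from itertools import groupby


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two rows of the same pairwise alignment.

    Hypothetical helper: identical columns / columns that are not gaps in
    both rows. Other definitions (e.g. dividing by the shorter ungapped
    length) give different numbers for the same alignment.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns that are gaps in both rows (possible when the pair was
    # extracted from a multiple alignment).
    columns = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def homopolymer_runs(seq: str, min_len: int = 3) -> list[tuple[str, int]]:
    """(base, run length) for each run of >= min_len identical bases, gaps ignored."""
    ungapped = seq.replace("-", "").upper()
    runs = [(base, len(list(group))) for base, group in groupby(ungapped)]
    return [(base, n) for base, n in runs if n >= min_len]


# Toy alignment (not data from the paper): the consensus is one T short
# in a 5-T homopolymer, the error pattern described for Figure A1.
sanger = "ACGTTTTTGCA"
consensus = "ACGTTTT-GCA"
print(round(percent_identity(sanger, consensus), 2))  # 90.91
print(homopolymer_runs(sanger))     # [('T', 5)]
print(homopolymer_runs(consensus))  # [('T', 4)]
```

A single percent-identity value does not show where the differences lie; comparing homopolymer run lengths alongside it makes a systematic pattern such as consistently shortened runs easier to see.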
