Comparative Study

Hum Genomics. 2016 Jul 25;10 Suppl 2(Suppl 2):20.

doi: 10.1186/s40246-016-0068-0.

A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis

Isaac Akogwu¹, Nan Wang¹, Chaoyang Zhang¹, Ping Gong²

Affiliations

¹ School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA.
² Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA. Ping.Gong@usace.army.mil.

PMID: 27461106
PMCID: PMC4965716

Isaac Akogwu et al. Hum Genomics. 2016.

. 2016 Jul 25;10 Suppl 2(Suppl 2):20.

doi: 10.1186/s40246-016-0068-0.

Authors

Isaac Akogwu¹, Nan Wang¹, Chaoyang Zhang¹, Ping Gong²

Affiliations

¹ School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA.
² Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA. Ping.Gong@usace.army.mil.

PMID: 27461106
PMCID: PMC4965716
DOI: 10.1186/s40246-016-0068-0

Abstract

Background: Advances in high-throughput next-generation sequencing (NGS) have opened many new opportunities for genomic research. However, the abundance of NGS data makes it harder to distinguish true biological variants from sequencing errors during downstream analysis. Many error correction methods have been developed to correct erroneous NGS reads before further analysis, but independent evaluation of how dataset features such as read length, genome size, and coverage depth affect their performance is lacking. This comparative study investigates the strengths, weaknesses, and limitations of several recent k-spectrum-based methods and provides recommendations to help users select suitable methods for specific NGS datasets.

Methods: Six k-spectrum-based methods (Reptile, Musket, Bless, Bloocoo, Lighter, and Trowel) were compared using six simulated sets of paired-end Illumina sequencing data. These NGS datasets varied in coverage depth (10× to 120×), read length (36 to 100 bp), and genome size (4.6 to 143 Mb). The Error Correction Evaluation Toolkit (ECET) was used to derive a suite of metrics (true positives, false positives, false negatives, recall, precision, gain, and F-score) for assessing the correction quality of each method.
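As a rough illustration of how the derived metrics relate to the three base-level counts, the sketch below uses the definitions commonly applied in error correction benchmarks: a true positive is an erroneous base that was corrected, a false positive is a correct base that was wrongly changed, and a false negative is an erroneous base left uncorrected. The class and function names are hypothetical and are not part of ECET; the abstract itself does not spell out the formulas, so treat them as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionCounts:
    """Base-level outcome counts (hypothetical container, not an ECET type)."""
    tp: int  # erroneous bases that were corrected
    fp: int  # correct bases that were wrongly changed
    fn: int  # erroneous bases that were left uncorrected


def recall(c: CorrectionCounts) -> float:
    # Fraction of all erroneous bases that were corrected.
    total_errors = c.tp + c.fn
    return c.tp / total_errors if total_errors else 0.0


def precision(c: CorrectionCounts) -> float:
    # Fraction of all changed bases that were genuine corrections.
    total_changes = c.tp + c.fp
    return c.tp / total_changes if total_changes else 0.0


def gain(c: CorrectionCounts) -> float:
    # Net share of errors removed; negative when a tool introduces
    # more errors than it fixes.
    total_errors = c.tp + c.fn
    return (c.tp - c.fp) / total_errors if total_errors else 0.0


def f_score(c: CorrectionCounts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    counts = CorrectionCounts(tp=80, fp=10, fn=20)  # made-up numbers
    print(f"recall={recall(counts):.3f} precision={precision(counts):.3f} "
          f"gain={gain(counts):.3f} F={f_score(counts):.3f}")
```

Under these definitions, gain is the only metric that can go negative, which is why it is useful alongside F-score when a corrector makes many spurious changes.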

Results: Computational experiments indicate that Musket had the best overall performance across the range of conditions represented in the six datasets. Musket's lowest accuracy (F-score = 0.81) occurred on a dataset with a medium read length (56 bp), medium coverage (50×), and a small genome (5.4 Mb). The other five methods underperformed (F-score < 0.80) and/or failed to process one or more datasets.

Conclusions: This study demonstrates that factors such as coverage depth, read length, and genome size may influence the performance of individual k-spectrum-based error correction methods. Care should therefore be taken in choosing an appropriate error correction method for a specific NGS dataset. Based on our comparative study, we recommend Musket as the top choice because of its consistently superior performance across all six test datasets. Further extensive studies are warranted to assess these methods using experimental datasets generated by NGS platforms (e.g., 454, SOLiD, and Ion Torrent) under more diverse parameter settings (k-mer values and edit distances) and to compare them against other, non-k-spectrum-based classes of error correction methods.

Keywords: Bloom filter; Error correction; Next-generation sequencing (NGS); Sequence analysis; k-mer; k-spectrum.


Figures

**Fig. 1**
General framework of k-spectrum-based error correctors

**Fig. 2**
Workflow of error correction performance analysis using ECET (Error Correction Evaluation Toolkit [15]). See http://aluru-sun.ece.iastate.edu/doku.php?id=ecr for more information

**Fig. 3**
Impact of read length (a), coverage depth (b), and genome size (c) on the performance of six k-spectrum-based error correction methods. The six datasets are reordered according to the factor examined in order to show visually the effect of each factor on F-score for each method (see Table 3 for dataset, method, and F-score information)



