Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome
- PMID: 29504893
- PMCID: PMC5836841
- DOI: 10.1186/s12864-018-4475-6
Abstract
Background: Using artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation, but also makes it possible to reduce computational time. One natural way to achieve such a time reduction is to map a single chromosome.
Results: We investigated whether mapping a single chromosome introduces artefacts into aligner performance. We compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled from a single chromosome and from a whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and we applied a novel measure of read density distribution similarity, which revealed artefacts in the aligners' performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read.
Conclusions: Generating artificial data by mapping reads generated from a single chromosome to a reference chromosome is justified as a way to reduce benchmarking time. The proposed quality assessment method makes it possible to identify inherent shortcomings of aligners that conventional statistical methods do not detect and that can affect the alignment quality of real data.
Keywords: DNA alignment; Next-generation sequencing; Read density distribution.
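The abstract names two quantities without defining them: a similarity measure between read density distributions and a mismatch frequency distribution along the read. The Python sketch below illustrates the general idea only. It is not the score proposed in the paper: the histogram overlap used here is a stand-in similarity, and the function names (`density_profile`, `profile_overlap`, `mismatch_frequency`), the fixed window size, and the toy inputs are all assumptions made for illustration.

```python
from collections import Counter


def density_profile(starts, chrom_len, window):
    """Count read start positions per fixed-size window (hypothetical helper)."""
    n_bins = (chrom_len + window - 1) // window
    counts = [0] * n_bins
    for s in starts:
        if 0 <= s < chrom_len:
            counts[s // window] += 1
    return counts


def profile_overlap(a, b):
    """Histogram overlap of two normalized profiles.

    Stand-in similarity, NOT the paper's score:
    1.0 means identical distributions, 0.0 means disjoint ones.
    """
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 or total_b == 0:
        return 0.0
    return sum(min(x / total_a, y / total_b) for x, y in zip(a, b))


def mismatch_frequency(pairs):
    """Fraction of reads that mismatch the reference at each read position.

    pairs: iterable of (read, reference_segment) strings of equal length.
    """
    mismatches = Counter()
    totals = Counter()
    for read, ref in pairs:
        for i, (r, g) in enumerate(zip(read, ref)):
            totals[i] += 1
            if r != g:
                mismatches[i] += 1
    return [mismatches[i] / totals[i] for i in range(len(totals))]


# Toy data: simulated (true) read starts versus where an aligner placed them.
true_starts = [5, 15, 25, 35]
mapped_starts = [5, 15, 25, 95]  # one read mapped far from its origin

true_profile = density_profile(true_starts, chrom_len=100, window=10)
mapped_profile = density_profile(mapped_starts, chrom_len=100, window=10)
similarity = profile_overlap(true_profile, mapped_profile)

per_position = mismatch_frequency([("ACGT", "ACGA"), ("ACGT", "ACGT")])
```

With simulated data the true origin of every read is known, so a density comparison of this kind can flag reads that are mapped but placed in the wrong region, which summary counts of mapped reads alone would not show.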
Conflict of interest statement
Authors’ information
FN is an independent software developer, employed by Novosibirsk State University. IA is a Daphne Jackson Fellow sponsored by the Babraham Institute and BBSRC. NB is a researcher at the University of Hertfordshire. MG is a postdoc at ICG SB RAS. YO is a senior scientist at ICG SB RAS and IMBR RAS, and a professor at Novosibirsk State University.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Similar articles
- Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics. Genomics. 2017 Jul;109(3-4):186-191. doi: 10.1016/j.ygeno.2017.03.001. Epub 2017 Mar 9. PMID: 28286147
- Accelerating the Next Generation Long Read Mapping with the FPGA-Based System. IEEE/ACM Trans Comput Biol Bioinform. 2014 Sep-Oct;11(5):840-52. doi: 10.1109/TCBB.2014.2326876. PMID: 26356857
- A tandem simulation framework for predicting mapping quality. Genome Biol. 2017 Aug 10;18(1):152. doi: 10.1186/s13059-017-1290-3. PMID: 28806977 Free PMC article.
- IMOS: improved Meta-aligner and Minimap2 On Spark. BMC Bioinformatics. 2019 Jan 24;20(1):51. doi: 10.1186/s12859-018-2592-5. PMID: 30678641 Free PMC article.
- On the accuracy of short read mapping. Methods Mol Biol. 2013;1038:39-59. doi: 10.1007/978-1-62703-514-9_3. PMID: 23872968
Cited by
- Bioinformatics tools for the sequence complexity estimates. Biophys Rev. 2023 Sep 15;15(5):1367-1378. doi: 10.1007/s12551-023-01140-y. eCollection 2023 Oct. PMID: 37974990 Free PMC article. Review.
- Statistical estimates of multiple transcription factors binding in the model plant genomes based on ChIP-seq data. J Integr Bioinform. 2021 Dec 21;19(1):20200036. doi: 10.1515/jib-2020-0036. PMID: 34953471 Free PMC article.
- Genomics at Belyaev conference - 2017. BMC Genomics. 2018 Feb 9;19(Suppl 3):79. doi: 10.1186/s12864-018-4476-5. PMID: 29504918 Free PMC article. No abstract available.
- Genomics research at Bioinformatics of Genome Regulation and Structure\Systems Biology (BGRS\SB) conferences in Novosibirsk. BMC Genomics. 2019 May 8;20(Suppl 3):322. doi: 10.1186/s12864-019-5707-0. PMID: 32039700 Free PMC article. No abstract available.