BMC Genomics. 2018 Feb 9;19(Suppl 3):92.

doi: 10.1186/s12864-018-4475-6.

Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome

Fedor M Naumenko¹, Irina I Abnizova²,³, Nathan Beka⁴, Mikhail A Genaev⁵, Yuriy L Orlov⁶,⁷,⁸

Affiliations

¹ Novosibirsk State University, Pirogova, 1, Novosibirsk, 630090, Russia. fedor.naumenko@gmail.com.
² Wellcome Trust Sanger Institute, Cambridge, UK.
³ Babraham Institute, Cambridge, UK.
⁴ University of Hertfordshire, Hertfordshire, UK.
⁵ Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia.
⁶ Novosibirsk State University, Pirogova, 1, Novosibirsk, 630090, Russia. orlov@bionet.nsc.ru.
⁷ Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia. orlov@bionet.nsc.ru.
⁸ Institute of Marine Biology Researches of RAS, Sevastopol, Russia. orlov@bionet.nsc.ru.

PMID: 29504893
PMCID: PMC5836841
DOI: 10.1186/s12864-018-4475-6

Fedor M Naumenko et al. BMC Genomics. 2018.

Abstract

Background: Using artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation, but also makes it possible to reduce computational time. One natural way to achieve this reduction is to map a single chromosome.

Results: We investigated whether mapping a single chromosome causes any artefacts in aligner performance. We compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled from a single chromosome and from a whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and applied a novel measure of read density distribution similarity, which allowed us to reveal artefacts in the aligners' performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read.
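A mismatch frequency distribution along the read, as mentioned in the Results, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `mismatch_frequency_by_position` and the input format (pairs of a read and the reference segment it was aligned to, equal in length and without gaps) are assumptions made for illustration only.

```python
from collections import Counter


def mismatch_frequency_by_position(pairs):
    """Return the fraction of mismatching bases at each read position.

    `pairs` is an iterable of (read, reference) strings. Assumption: each
    pair is an ungapped alignment, so both strings have the same length.
    """
    mismatches = Counter()  # position -> number of mismatching bases
    totals = Counter()      # position -> number of reads covering it
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must have equal length")
        for pos, (read_base, ref_base) in enumerate(zip(read, ref)):
            totals[pos] += 1
            if read_base != ref_base:
                mismatches[pos] += 1
    # Every read covers positions 0..len-1, so the keys of `totals` are contiguous.
    return [mismatches[pos] / totals[pos] for pos in range(len(totals))]


example_pairs = [
    ("ACGT", "ACGT"),
    ("ACGA", "ACGT"),
    ("TCGA", "ACGT"),
    ("ACGT", "ACGT"),
]
print(mismatch_frequency_by_position(example_pairs))
```

In this toy input, one of four reads differs from the reference at the first position and two of four differ at the last, so the mismatch frequency rises towards the read end. Real alignments with insertions, deletions, or soft clipping would need the alignment's CIGAR information, which this sketch deliberately ignores.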

Conclusions: Generating artificial data by mapping reads sampled from a single chromosome to a reference chromosome is justified as a way to reduce benchmarking time. The proposed quality assessment method makes it possible to identify inherent shortcomings of aligners that are not detected by conventional statistical methods and that can affect the quality of alignment of real data.

Keywords: DNA alignment; Next-generation sequencing; Read density distribution.

Conflict of interest statement

Authors’ information

FN is an independent software developer, employed by Novosibirsk State University. IA is a Daphne Jackson Fellow sponsored by the Babraham Institute and BBSRC. NB is a researcher at the University of Hertfordshire. MG is a postdoc at ICG SB RAS. YO is a senior scientist at ICG SB RAS and IMBR RAS, and a professor at Novosibirsk State University.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
Coefficients of variation of the sample (CVS), *case1*. Panel labels: general (total) CVS are marked in green, head CVS in mauve, tail CVS in brown. SE are shown with light colour bars, PE with dark colour bars

**Fig. 2**
Density profiles for mapping of reads in log-log coordinates, *case1*

**Fig. 3**
Fragment of coverage of SE alignments, *case1*. All the tracks have the same data range (vertical scale)

**Fig. 4**
Coefficients of variation of the sample (CVS), *case2*. Panel labels: general (total) CVS are marked in green, head CVS in mauve, tail CVS in brown. SE are shown with light colour bars, PE with dark colour bars

**Fig. 5**
Fragment of coverage of PE alignments, *case2*

**Fig. 6**
Coefficients of variation of the sample (CVS), *case3*. Panel labels: general (total) CVS are marked in green, head CVS in mauve, tail CVS in brown. SE are shown with light colour bars, PE with dark colour bars

**Fig. 7**
F1 scores. All reads are shown with light colour bars, non-zero-scored (‘reliable’) reads with dark colour bars

**Fig. 8**
Percentage of non-zero-scored and zero-scored reads with detected mismatches. SE are shown with pale histogram bars, PE with bright histogram bars

**Fig. 9**
Frequency of mismatches as a function of position in the read, on a logarithmic scale, *case2*

**Fig. 10**
Fragment of alignments in comparison with low mappability tracks, UCSC genome browser
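The figure captions above refer to two standard summary statistics, the coefficient of variation and the F1 score. The Python sketch below shows their textbook definitions only. How the paper defines its samples (total, head, tail) and how it counts correctly and incorrectly mapped reads is not stated on this page, so the function names, the use of the population standard deviation, and the meaning of the three counts are assumptions.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean.

    Assumption: population standard deviation; the paper may use the
    sample standard deviation instead.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.pstdev(values) / mean


def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall.

    Assumed meaning for simulated reads: true positives are reads mapped
    to their known origin, false positives are reads mapped elsewhere,
    false negatives are reads left unmapped.
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```

With simulated reads the true origin of every read is known, which is what makes counts such as these available without manual curation.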
