Positional bias in variant calls against draft reference assemblies
- PMID: 28351369
- PMCID: PMC5368935
- DOI: 10.1186/s12864-017-3637-2
Abstract
Background: Whole genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite their lower quality, such assemblies have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. Because variant calling pipelines are complex and involve many software packages, it is important to understand the inherent biases and limitations at each step of the analysis.
Results: In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at positions related to the length of either the k-mers or the reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in assemblies we generated from our own simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it, because it changes the variants' relative positions and alters read alignments. The bias can be effectively reduced by filtering out variants that reside in repetitive elements.
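The position-counting step described in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the use of the nearest contig end, the input layout (1-based variant positions and a contig-length table), and the fold-over-mean threshold for flagging enriched positions are all assumptions made for illustration.

```python
from collections import Counter


def end_distance_counts(variants, contig_lengths):
    """Tally variants by 1-based distance from the nearest contig end.

    variants: iterable of (contig_name, pos) with 1-based positions.
    contig_lengths: dict mapping contig_name to sequence length.
    Both names and the nearest-end convention are illustrative assumptions.
    """
    counts = Counter()
    for contig, pos in variants:
        length = contig_lengths[contig]
        # Distance from the left end is pos; from the right end, length - pos + 1.
        counts[min(pos, length - pos + 1)] += 1
    return counts


def enriched_positions(counts, max_distance, fold=3.0):
    """Return distances (1..max_distance) whose count exceeds fold x the mean.

    The fold threshold is an arbitrary illustrative choice, not a value
    taken from the article.
    """
    window = [counts.get(d, 0) for d in range(1, max_distance + 1)]
    mean = sum(window) / max_distance
    return [d for d, c in enumerate(window, start=1) if mean > 0 and c > fold * mean]


if __name__ == "__main__":
    # Toy data: three variants sit 31 bases from a contig end.
    lengths = {"c1": 1000, "c2": 500}
    calls = [("c1", 31), ("c1", 970), ("c2", 31), ("c2", 250), ("c1", 400)]
    tally = end_distance_counts(calls, lengths)
    print(enriched_positions(tally, max_distance=100))
```

In a real data set, a spike at a distance close to the assembler's k-mer size or the read length would be the kind of signal the article describes.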
Conclusions: Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, which potentially affects many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. We recommend reducing the bias by filtering, especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.
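The recommended repeat-based filtering can be sketched as follows. This is a hedged illustration only: the abstract does not say how repetitive elements were annotated or which tool performed the filtering, so the interval layout (1-based, inclusive coordinates per contig) and all function names below are hypothetical.

```python
from bisect import bisect_right


def build_repeat_index(repeats):
    """Merge repeat intervals per contig and return sorted start/end lists.

    repeats: iterable of (contig, start, end), 1-based inclusive
    (an assumed layout, not a format named in the article).
    """
    by_contig = {}
    for contig, start, end in repeats:
        by_contig.setdefault(contig, []).append((start, end))
    index = {}
    for contig, intervals in by_contig.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            # Join intervals that overlap or directly abut the previous one.
            if merged and start <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[contig] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_repeat(index, contig, pos):
    """True if the 1-based position falls inside a merged repeat interval."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


def filter_variants(variants, index):
    """Keep only variants that lie outside annotated repeats."""
    return [(c, p) for c, p in variants if not in_repeat(index, c, p)]


if __name__ == "__main__":
    idx = build_repeat_index([("c1", 20, 40), ("c1", 35, 60), ("c2", 1, 10)])
    calls = [("c1", 31), ("c1", 61), ("c2", 10), ("c3", 5)]
    print(filter_variants(calls, idx))
```

Merging the intervals first keeps each lookup to a single binary search, which matters when millions of calls are checked against a dense repeat annotation.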
Keywords: Draft reference genome; Polymorphisms; Positional bias; Repetitive elements; Resequencing; SNPs; Variants.