A Cautionary Note on the Use of Genotype Callers in Phylogenomics
- PMID: 33084875
- PMCID: PMC8208803
- DOI: 10.1093/sysbio/syaa081
Abstract
Next-generation sequencing (NGS) genotype callers are commonly used to call variants in newly sequenced species. However, because genomic resources are still limited, it remains common practice to use a single reference genome for a given genus, or even for an entire clade of a higher taxon. The problem with traditional genotype callers, such as the one in GATK, is that they are optimized for variant calling at the population level; when they are applied at the phylogenetic level, the consequences for downstream analyses can be substantial. Here, we performed simulations to compare the performance of the GATK and ATLAS genotype callers and to characterize their differences at various phylogenetic scales. We show that the GATK genotype caller substantially underestimates the number of variants at the phylogenetic level, but not at the population level. We also found that the accuracy of heterozygote calls declines with increasing distance to the reference genome. We quantified this decline and found that it is very sharp for GATK, whereas ATLAS maintains high accuracy even for species moderately divergent from the reference. We further suggest that efforts be made to acquire more reference genomes per species before pursuing large-scale phylogenomic studies. [ATLAS; efficiency of SNP calling; GATK; heterozygote calling; next-generation sequencing; reference genome; variant calling]
© The Author(s) 2021. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
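The abstract turns on two ideas: how a genotype caller converts the read bases at a site into a diploid genotype call, and how the accuracy of heterozygote calls can be measured against simulated truth. As an illustration only, the sketch below implements the textbook diploid genotype-likelihood model (each read is assumed to sample either chromosome with probability 1/2, with a symmetric per-base error rate and a flat prior) together with one possible heterozygote-accuracy metric. It is not the GATK or ATLAS implementation, which model sequencing error and genotype priors in more detail; the helper names (`genotype_log_likelihoods`, `call_genotype`, `heterozygote_accuracy`) and the default error rate of 0.01 are hypothetical choices made for this example.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def base_likelihood(observed: str, allele: str, error: float) -> float:
    """P(observed base | true allele) under a symmetric error model (assumption)."""
    return 1.0 - error if observed == allele else error / 3.0


def genotype_log_likelihoods(pileup: str, error: float = 0.01) -> dict:
    """Log10 likelihood of each unordered diploid genotype given the read bases at one site."""
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for b in pileup:
            # Each read is assumed to come from either chromosome with probability 1/2.
            p = 0.5 * base_likelihood(b, a1, error) + 0.5 * base_likelihood(b, a2, error)
            ll += math.log10(p)
        result[a1 + a2] = ll
    return result


def call_genotype(pileup: str, error: float = 0.01) -> str:
    """Maximum-likelihood genotype with a flat prior (no heterozygosity prior applied)."""
    lls = genotype_log_likelihoods(pileup, error)
    return max(lls, key=lls.get)


def heterozygote_accuracy(true_genotypes, called_genotypes) -> float:
    """Fraction of truly heterozygous sites called as the same heterozygous genotype."""
    het_sites = [i for i, g in enumerate(true_genotypes) if g[0] != g[1]]
    if not het_sites:
        return float("nan")
    correct = sum(
        1 for i in het_sites
        if sorted(called_genotypes[i]) == sorted(true_genotypes[i])
    )
    return correct / len(het_sites)


if __name__ == "__main__":
    print(call_genotype("AAAAGGGG"))  # balanced A/G reads
    print(heterozygote_accuracy(["AG", "AA", "CT"], ["AG", "AA", "CC"]))
```

With the default error rate, `call_genotype("AAAAGGGG")` returns `"AG"`, and the accuracy example scores one of two true heterozygotes correctly (0.5). Because this sketch uses no genotype prior, a single non-reference read at low depth (for example `"AAAAAAAG"`) already tips the call to a heterozygote; production callers typically add genotype priors and quality filters, which is one place where different callers can diverge.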
Similar articles
- Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms. BMC Genomics. 2014 Jan 10;15:16. doi: 10.1186/1471-2164-15-16. PMID: 24405840. Free PMC article.
- Using genotype array data to compare multi- and single-sample variant calls and improve variant call sets from deep coverage whole-genome sequencing data. Bioinformatics. 2017 Apr 15;33(8):1147-1153. doi: 10.1093/bioinformatics/btw786. PMID: 28035032. Free PMC article.
- Variant callers for next-generation sequencing data: a comparison study. PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013. PMID: 24086590. Free PMC article.
- Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics. 2014 Oct 15;30(20):2843-51. doi: 10.1093/bioinformatics/btu356. Epub 2014 Jun 27. PMID: 24974202. Free PMC article. Review.
- Single Nucleotide Polymorphism Identification in Polyploids: A Review, Example, and Recommendations. Mol Plant. 2015 Jun;8(6):831-46. doi: 10.1016/j.molp.2015.02.002. Epub 2015 Feb 10. PMID: 25676455. Review.
Cited by
- The impact of sequencing depth and relatedness of the reference genome in population genomic studies: A case study with two caddisfly species (Trichoptera, Rhyacophilidae, Himalopsyche). Ecol Evol. 2022 Dec 12;12(12):e9583. doi: 10.1002/ece3.9583. eCollection 2022 Dec. PMID: 36523526. Free PMC article.
- Global molecular epidemiology of the incomplete CirA protein related to cefiderocol resistance in Klebsiella pneumoniae: a genome-based study. Microbiol Spectr. 2025 May 6;13(5):e0141024. doi: 10.1128/spectrum.01410-24. Epub 2025 Mar 19. PMID: 40105357. Free PMC article.
- Taking advantage of reference-guided assembly in a slowly-evolving lineage: Application to Testudo graeca. PLoS One. 2024 Aug 9;19(8):e0303408. doi: 10.1371/journal.pone.0303408. eCollection 2024. PMID: 39121089. Free PMC article.
- Simulating Genetic Mixing in Strongly Structured Populations of the Threatened Southern Brown Bandicoot (Isoodon obesulus). Evol Appl. 2024 Dec 5;17(12):e70050. doi: 10.1111/eva.70050. eCollection 2024 Dec. PMID: 39650626. Free PMC article.