VCF2Dis: an ultra-fast and efficient tool to calculate pairwise genetic distance and construct population phylogeny from VCF files
- PMID: 40184433
- PMCID: PMC11970368
- DOI: 10.1093/gigascience/giaf032
Abstract
Background: Genetic distance metrics are crucial for understanding the evolutionary relationships and population structure of organisms. Progress in next-generation sequencing technology has given rise to genotyping data for thousands of individuals. The standard Variant Call Format (VCF) is widely used to store genomic variation information, but calculating genetic distance and constructing population phylogeny directly from large VCF files can be challenging. Moreover, the existing tools that implement such functions remain limited and perform poorly when processing large-scale genotype data, particularly in terms of memory efficiency.
Findings: To address these challenges, we introduce VCF2Dis, an ultra-fast and efficient tool that calculates pairwise genetic distance directly from large VCF files and then constructs distance-based population phylogeny using the ape package. Benchmarking results demonstrate the tool's efficiency, with rapid processing times, minimal memory usage (e.g., 0.37 GB for the complete analysis of 2,504 samples with 81.2 million variants), and high accuracy, even when handling datasets with millions of variants from thousands of individuals. Its straightforward command-line interface, compatibility with downstream phylogenetic analysis tools (e.g., MEGA, Phylip, and FastTree), and support for multithreading make it a valuable tool for researchers studying population relationships. Owing to these advantages, VCF2Dis has already been widely used in many published genomic studies.
Conclusion: We present VCF2Dis, a straightforward and efficient tool for calculating genetic distance and constructing population phylogeny directly from large-scale genotype data. VCF2Dis has been widely applied, facilitating the exploration of population relationships in extensive genome sequencing studies.
Keywords: VCF; VCF2Dis; p-distance; population phylogeny.
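As an illustration of the p-distance named in the keywords, the following is a minimal Python sketch, not the VCF2Dis implementation. It assumes one common convention for diploid genotypes: a site contributes 0 when two samples have identical genotypes, 0.5 when they share one allele, and 1 when they share none, and sites with a missing genotype in either sample are skipped. The function names (`parse_gt`, `site_diff`, `p_distance_matrix`, `to_phylip`) and the toy genotypes are hypothetical, and whether VCF2Dis uses exactly this scoring is an assumption here. The output helper writes a square PHYLIP-style distance matrix of the kind accepted by distance-based tree builders.

```python
from itertools import combinations


def parse_gt(gt):
    """Parse a VCF GT string such as '0/1' or '1|1'.

    Returns a sorted tuple of two allele indices, or None when the
    genotype is missing or not diploid (this sketch skips such calls).
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return tuple(sorted(int(a) for a in alleles))


def site_diff(g1, g2):
    """Assumed per-site score: 0 identical, 0.5 one shared allele, 1 none."""
    if g1 == g2:
        return 0.0
    return 0.5 if set(g1) & set(g2) else 1.0


def p_distance_matrix(samples, genotypes):
    """Average the per-site scores over sites called in both samples."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total, compared = 0.0, 0
        for a, b in zip(genotypes[samples[i]], genotypes[samples[j]]):
            ga, gb = parse_gt(a), parse_gt(b)
            if ga is None or gb is None:
                continue  # skip sites missing in either sample
            total += site_diff(ga, gb)
            compared += 1
        d = total / compared if compared else float("nan")
        dist[i][j] = dist[j][i] = d
    return dist


def to_phylip(samples, dist):
    """Format a square PHYLIP-style distance matrix as text."""
    lines = [str(len(samples))]
    for name, row in zip(samples, dist):
        lines.append(name.ljust(10) + " " + " ".join(f"{d:.6f}" for d in row))
    return "\n".join(lines)


# Toy data (hypothetical): three samples, four biallelic sites.
samples = ["S1", "S2", "S3"]
genotypes = {
    "S1": ["0/0", "0/1", "1/1", "./."],
    "S2": ["0/0", "1/1", "0/0", "0/1"],
    "S3": ["0/1", "0/1", "1/1", "0/0"],
}
dist = p_distance_matrix(samples, genotypes)
print(to_phylip(samples, dist))
```

This sketch holds all genotypes in memory and loops over every sample pair in pure Python, so it is only suitable for small examples. A tool aimed at thousands of samples and millions of variants would need to stream the VCF and accumulate pairwise sums incrementally.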
© The Author(s) 2025. Published by Oxford University Press GigaScience.
Conflict of interest statement
The authors declare no potential competing interests.