Nat Commun. 2025 Jul 10;16(1):6374.
doi: 10.1038/s41467-025-61687-0.

Global diversity of soil-transmitted helminths reveals population-biased genetic variation that impacts diagnostic targets

Marina Papaiakovou et al. Nat Commun. 2025.

Abstract

Soil-transmitted helminths (STHs) are intestinal parasites that affect over a billion people worldwide. STH control relies on microscopy-based diagnostics to monitor parasite prevalence and enable post-treatment surveillance; however, molecular diagnostics are rapidly being developed due to increased sensitivity, particularly in low-STH-prevalence settings. The genetic diversity of helminths and its potential impact on molecular diagnostics remain unclear. Using low-coverage genome sequencing, we assess the genetics of STHs within worm, faecal, and purified egg samples from 27 countries, identifying differences in the genetic connectivity and diversity of STH-positive samples across regions and cryptic diversity between closely related human- and pig-infective species. We define substantial copy number and sequence variants in current diagnostic target regions and validate the impact of genetic variation on qPCR diagnostics using in vitro assays. Our study provides insights into the diversity and genomic epidemiology of STHs, highlighting both the challenges and opportunities for developing molecular diagnostics needed to support STH control efforts.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. The geographic distribution of gastrointestinal helminths detected by sequencing among worm or egg isolates and faecal samples analysed.
World maps show the countries of origin for all samples used in the study, which include (a) faecal samples (n = 842) from 15 countries and (b) worm and concentrated egg isolates (n = 158) from 15 countries. The point size on the maps indicates the number of samples from each location. Heatmaps display the relative prevalence of single and mixed-species infections of parasites detected by sequencing from (c) faecal samples and (d) worm/concentrated egg isolates, respectively. The faecal sample icon indicates faecal samples and adult worm/egg figures indicate samples from adult worms and/or concentrated worm eggs. A minimum normalised coverage threshold of 10 reads was applied, resulting in 329 samples from 27 countries reported as positive by sequencing for at least one parasite. Genomic datasets from faecal samples were plotted separately from the worm or egg datasets using a different scale based on the obtained coverage. Colours reflect read counts, normalised by the total number of reads per sample per genome size, to achieve ‘reads mapped per million reads per Mb’. Sample site abbreviations for both faecal and worm data are as follows: ARG = Argentina; BEN = Benin; BGD = Bangladesh; CHN = China; CMR = Cameroon; DRC = Democratic Republic of the Congo; ECU = Ecuador; ETH = Ethiopia; FJI = Fiji; GLP = Guadeloupe; HND = Honduras; IND = India; KEN = Kenya; LKA = Sri Lanka; MMR = Myanmar; MOZ = Mozambique; MWI = Malawi; MYS = Malaysia; NGA = Nigeria; PRI = Puerto Rico; SEN = Senegal; THA = Thailand; TZA = Tanzania; UGA = Uganda; USA = United States of America; ZAF = South Africa. Faecal, worm, and egg icons are provided by Servier Medical Art (https://smart.servier.com/), licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Source data are provided as a Source Data file.
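The normalisation described in this caption ('reads mapped per million reads per Mb') can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name and the example numbers are hypothetical, and the caption does not specify how the 10-read threshold is applied, so it is not modelled here.

```python
def normalised_coverage(mapped_reads: int, total_reads: int, genome_size_bp: int) -> float:
    """Reads mapped to a parasite genome, per million sample reads, per Mb of genome.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if total_reads <= 0 or genome_size_bp <= 0:
        raise ValueError("total_reads and genome_size_bp must be positive")
    # Scale by sequencing depth (per million reads in the sample) ...
    per_million_reads = mapped_reads / (total_reads / 1e6)
    # ... then by reference genome size (per megabase).
    return per_million_reads / (genome_size_bp / 1e6)


# Made-up example: 500 reads mapped, 2 million reads sequenced, 250 Mb genome.
example = normalised_coverage(500, 2_000_000, 250_000_000)
```

Dividing by both depth and genome size is what lets samples of different sequencing effort, and parasites with different genome sizes, share one colour scale.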
Fig. 2. Comparison of genetic diversity within and between populations of Trichuris trichiura and Ascaris spp.
a Relative genetic relationships between populations for pooled samples (i.e., eggs) based on pairwise estimates of mitochondrial genome diversity (DXY). Colours: blue = Trichuris trichiura, red = Ascaris spp. The thickness of the line connecting countries reflects the degree of similarity between mitochondrial genomes; thicker lines represent greater genetic similarity and thinner lines weaker similarity. The mean DXY per country combination was calculated, and the line size was plotted as ‘1 − mean DXY per country combination’. b Comparison of nucleotide diversity (π) in mitochondrial genomes amongst individual worms (left) and pools of eggs from faecal samples (right) (n = 16 for Ascaris spp., n = 3 for T. trichiura) per population. c Comparison of variant frequencies, as SNPs among individual worms (n = 28 samples for T. trichiura, n = 65 for Ascaris spp.) and as alleles in the pools (n = 16 for Ascaris spp., n = 3 for T. trichiura). The frequencies of SNPs and alleles were calculated to facilitate comparison of genetic variation among both individual worms and pools of eggs. For Ecuador, calculating nucleotide diversity was infeasible because only a single sample was available. Adult worm and egg icons represent single adult worm and concentrated worm egg data, respectively. Country codes are as follows: CMR = Cameroon; MMR = Myanmar; MOZ = Mozambique; ZAF = South Africa. Faecal, worm, and egg icons are provided by Servier Medical Art (https://smart.servier.com/), licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Source data are provided as a Source Data file.
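To make the quantities in this caption concrete, the sketch below computes nucleotide diversity (π), between-population divergence (DXY), and the '1 − mean DXY' line weight from aligned sequences. It uses the textbook per-site definitions on toy strings. The authors' actual software and handling of pooled data are not described in the caption, so all function names and inputs here are assumptions.

```python
from itertools import combinations, product


def per_site_difference(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi: mean per-site difference over all pairs within one population."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        # With a single sample there are no pairs, so pi is undefined.
        raise ValueError("need at least two sequences")
    return sum(per_site_difference(a, b) for a, b in pairs) / len(pairs)


def dxy(pop_x: list[str], pop_y: list[str]) -> float:
    """DXY: mean per-site difference over all between-population pairs."""
    pairs = list(product(pop_x, pop_y))
    if not pairs:
        raise ValueError("both populations need at least one sequence")
    return sum(per_site_difference(a, b) for a, b in pairs) / len(pairs)


def line_weight(mean_dxy: float) -> float:
    """Plotted line size: higher similarity (lower DXY) gives a thicker line."""
    return 1.0 - mean_dxy


# Toy alignment (hypothetical data, 4 sites).
pop_x = ["AAAA", "AAAT"]
pop_y = ["AATT", "AATT"]
pi_x = nucleotide_diversity(pop_x)
d_xy = dxy(pop_x, pop_y)
weight = line_weight(d_xy)
```

The `ValueError` for a single sequence mirrors the caption's note that diversity could not be calculated where only one sample was available.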
Fig. 3. Genomic diversity and differential coverage of repeat classes used as diagnostic targets.
a, b, c, Repeat diversity was determined by searching the canonical repeat units using nucmer with 90% nucleotide similarity and 90% sequence coverage against the genome assemblies of (a) Ascaris lumbricoides, (b) Necator americanus and (c) Trichuris trichiura as a reference. In a, b, and c, the heatmaps show the pairwise distance calculated as the sum of squares of a nucleotide similarity matrix derived from ClustalOmega-aligned repeat sequences for each species, where a lighter colour (white) on the colour scale reflects a stronger degree of similarity between two sequences. In d, e, and f, genome coverage per repeat per country was determined by bedtools multicov (with minimum overlap 0.51) in merged-by-country BAM files (filtered raw reads > 10 reads) against the genome assemblies of (d) A. lumbricoides, (e) N. americanus, and (f) T. trichiura. Coverage is expressed as ‘repeat copies,’ calculated by dividing the original repeat coverage by the mean per-country single copy exon coverage. The central box represents the interquartile range (IQR), with its edges at the first and third quartiles (Q1 and Q3). The median is shown as a line through the centre of the box. The whiskers extend from the edges of the box to the smallest and largest values within 1.5 times the IQR from Q1 and Q3, respectively. Only repeats containing both forward and reverse primers and probe binding sites were included. Source data are provided as a Source Data file.
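The 'repeat copies' normalisation and the whisker rule described in this caption can be illustrated with a short sketch. The function names and numbers are hypothetical, and the quartile method (`inclusive`) is an assumption, since plotting libraries differ in how they interpolate quartiles.

```python
from statistics import mean, quantiles


def repeat_copies(repeat_coverage: float, exon_coverages: list[float]) -> float:
    """Repeat coverage divided by the mean single-copy exon coverage."""
    baseline = mean(exon_coverages)
    if baseline <= 0:
        raise ValueError("mean single-copy exon coverage must be positive")
    return repeat_coverage / baseline


def tukey_whiskers(values: list[float]) -> tuple[float, float]:
    """Smallest and largest values within 1.5 x IQR of Q1 and Q3."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


# Made-up example: a repeat with coverage 1200 in a country whose
# single-copy exons average 10x corresponds to roughly 120 copies.
copies = repeat_copies(1200, [10, 12, 8])

# Made-up copy estimates; 100 lies beyond the upper fence and is an outlier.
whiskers = tukey_whiskers([1, 2, 3, 4, 5, 100])
```

Normalising by single-copy exon coverage converts raw depth into an approximate copy number, which is what makes countries with different sequencing depth comparable.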
Fig. 4. Presence, distribution, and impact of genetic variation within diagnostic qPCR targets of Ascaris lumbricoides.
a Shown are the genomic coordinates of three repeats (repeat 1, 2, and 3), highlighting primer and probe binding sites (solid rectangles) used in the qPCR diagnostic test, the position of genetic variants (x-axis), shown either for individual samples (black points) or as the mean across samples (pink points), and their frequency within each country (y-axis). Putative qPCR-disruptive variants found at the 3’ end of the primer binding sites are depicted by dashed rectangles. b qPCR efficiencies were determined by generating standard curves of five serial dilutions (100 pg/μl to 10 fg/μl) on each of the repeats in the absence (wildtype; wt) or presence of the SNP (mutant; mut). The mean value of three replicates per concentration is shown. The 90-110% dashed lines show acceptable qPCR efficiency cutoffs. c The mean fold loss was calculated to assess the effect of the SNP on qPCR quantitation and product loss due to late amplification. The mean fold loss of the mutant is relative to the wildtype repeat, within each assay. The mean normalised Cq difference was estimated from all serial dilutions. Country codes are as follows: BEN = Benin; ETH = Ethiopia; KEN = Kenya; MMR = Myanmar; MOZ = Mozambique; MYS = Malaysia; ZAF = South Africa. Source data are provided as a Source Data file.
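As a rough illustration of how a standard curve yields a qPCR efficiency, the sketch below fits Cq against log10 concentration and applies the standard relation E = 10^(−1/slope) − 1. The fold-loss function assumes perfect doubling per cycle (fold loss = 2^ΔCq). The caption does not give the authors' exact formula, so that part is an assumption, and the Cq values are simulated rather than taken from the study.

```python
import math


def standard_curve_slope(concentrations: list[float], cqs: list[float]) -> float:
    """Least-squares slope of Cq versus log10(concentration)."""
    xs = [math.log10(c) for c in concentrations]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(cqs) / len(cqs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cqs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def efficiency_percent(slope: float) -> float:
    """Amplification efficiency in percent; 100% means doubling every cycle."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def fold_loss(delta_cq: float) -> float:
    """Fold loss of mutant relative to wildtype, ASSUMING 100% efficiency."""
    return 2.0 ** delta_cq


# Five tenfold dilutions from 100 pg/ul down to 10 fg/ul, expressed in fg/ul.
dilutions_fg = [1e5, 1e4, 1e3, 1e2, 1e1]

# Simulated Cq values for an ideal assay (slope of about -3.32 cycles per log10).
ideal_slope = -1.0 / math.log10(2.0)
simulated_cqs = [20.0 + ideal_slope * (math.log10(c) - 5.0) for c in dilutions_fg]

slope = standard_curve_slope(dilutions_fg, simulated_cqs)
efficiency = efficiency_percent(slope)
acceptable = 90.0 <= efficiency <= 110.0  # cutoffs shown as dashed lines in panel b
```

Under the doubling assumption, a mutant amplifying three cycles later than the wildtype corresponds to `fold_loss(3.0)`, an eightfold apparent loss of template.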
