[Preprint]. 2023 Mar 16:2023.03.13.23287179.
doi: 10.1101/2023.03.13.23287179.

Lineage-informative microhaplotypes for spatio-temporal surveillance of Plasmodium vivax malaria parasites

Sasha V Siegel et al. medRxiv.

Abstract

Challenges in understanding the origin of recurrent Plasmodium vivax infections constrain the surveillance of antimalarial efficacy and transmission of this neglected parasite. Recurrent infections within an individual may arise from activation of dormant liver stages (relapse), blood-stage treatment failure (recrudescence) or new inoculations (reinfection). Molecular inference of familial relatedness (identity-by-descent or IBD) based on whole genome sequence data, together with analysis of the intervals between parasitaemic episodes ("time-to-event" analysis), can help resolve the probable origin of recurrences. Whole genome sequencing of predominantly low-density P. vivax infections is challenging, so an accurate and scalable genotyping method to determine the origins of recurrent parasitaemia would be of significant benefit. We have developed a P. vivax genome-wide informatics pipeline to select specific microhaplotype panels that can capture IBD within small, amplifiable segments of the genome. Using a global set of 615 P. vivax genomes, we derived a panel of 100 microhaplotypes, each comprising 3-10 high-frequency SNPs within <200 bp sequence windows. This panel exhibits high diversity in regions of the Asia-Pacific, Latin America and the Horn of Africa (median HE = 0.70-0.81), and it captured 89% (273/307) of the polyclonal infections detected with genome-wide datasets. Using data simulations, we demonstrate lower error in estimating pairwise IBD using microhaplotypes, relative to traditional biallelic SNP barcodes. Our panel exhibited high accuracy in predicting the country of origin (median Matthews correlation coefficient >0.9 in 90% of countries tested) and it also captured local infection outbreak and bottlenecking events. The informatics pipeline is available open-source and yields microhaplotypes that can be readily transferred to high-throughput amplicon sequencing assays for surveillance in malaria-endemic regions.

Keywords: Plasmodium vivax; barcode; genotyping; identity-by-descent; malaria; microhaplotype; molecular surveillance; recurrence; relapse; relatedness.

Figures

Figure 1. Microhaplotype discovery pipeline.
Panel a) provides an overview of the marker selection process. Criteria for selecting samples, variants, and candidate microhaplotype windows yield a total of 5,460 windows (200 bp). The MalariaGEN Pv4 dataset was filtered to use only high-quality monoclonal samples (FWS ≥0.95) that had at least 50% of the core genome positions callable. SNP variants from this sample subset were then retained if they were biallelic, had low genotype missingness (<0.1), had high global minor allele frequencies (MAF ≥0.1), and had FILTER = PASS in the MalariaGEN dataset, resulting in 13,090 total SNPs. The core genome was then scanned in coding regions for all 200 bp windows in which >1 of the identified variants were found, and these windows were filtered for high diversity (global heterozygosity ≥0.5). Panel b) provides a schematic representation of microhaplotypes with 3 SNPs. Microhaplotypes leverage SNP information content in small-windowed regions of the genome to provide a high-resolution reconstruction of the parasite genome. Three high-diversity SNPs in a single microhaplotype can have as many as 8 distinct combinations of alleles which, when combined across 100 microhaplotypes in the genome, results in high discriminatory power to characterize relatedness. Panel c) provides a map of the P. vivax heterozygome, showing the chromosomal distribution of all windows with at least 1 SNP identified in the global set of high-quality, independent monoclonal infections. Each point is an identified window, with the size increasing as the number of SNPs within the window increases. Potential microhaplotype regions are well distributed across the 14 chromosomes. The microhaplotypes with the highest SNP densities tend to be located at the ends of the chromosomes. Note that microhaplotypes were selected only from the accessible regions of the genome, i.e., excluding highly diverse telomeric and sub-telomeric regions where sequence reads could not be mapped accurately.
Panel d) illustrates the chromosome distribution of three panels evaluated for their capacity to reconstruct parasite relatedness. Two new microhaplotype panels were created from the microhaplotype discovery pipeline, named “Random mhaps” plotted in orange, and “High-diversity mhaps” plotted in purple. These two panels were selected to have 100 microhaplotype markers using windows that were well-spaced and had between 3-10 SNPs, with even distribution across all 14 chromosomes and a minimum diversity of heterozygosity ≥0.5. The markers for the Random mhaps panel were selected randomly, while the High-diversity mhaps panel was optimised to have the highest heterozygosity possible in each region. These two high-resolution panels were compared to the currently used biallelic 42-SNP panel, named “Broad” in green. Only 38 markers of this panel are considered informative and included in this representation.
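The window scan described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the fixed, non-overlapping tiling of the genome, the function names (candidate_windows, microhaplotype) and the SNP positions are assumptions made for illustration. Only the 200 bp window size and the 3-10 SNP range come from the caption.

```python
from collections import defaultdict

WINDOW_BP = 200  # window size stated in the caption


def candidate_windows(snp_positions, min_snps=3, max_snps=10, window_bp=WINDOW_BP):
    """Bin SNP positions into fixed, non-overlapping windows (an assumed tiling)
    and keep the windows that hold between min_snps and max_snps SNPs."""
    windows = defaultdict(list)
    for pos in sorted(snp_positions):
        windows[pos // window_bp].append(pos)
    return {
        (idx * window_bp, (idx + 1) * window_bp - 1): positions
        for idx, positions in windows.items()
        if min_snps <= len(positions) <= max_snps
    }


def microhaplotype(sample_calls, positions):
    """Join one sample's alleles at a window's SNPs into a single allele string."""
    return "".join(sample_calls[p] for p in positions)


# Hypothetical SNP positions on one chromosome
snps = [105, 150, 190, 420, 1010, 1015, 1100, 1190]
windows = candidate_windows(snps)

# Hypothetical allele calls for one monoclonal sample
calls = {105: "A", 150: "T", 190: "G"}
first_window_positions = windows[(0, 199)]
print(microhaplotype(calls, first_window_positions))

# Three biallelic SNPs allow at most 2**3 = 8 distinct microhaplotype alleles
print(2 ** len(first_window_positions))
```

A diversity filter (heterozygosity ≥0.5 per window) would then be applied to the retained windows; that step is omitted here.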
Figure 2.
Confidence intervals around relatedness estimates based on data simulated using various data-generating relatedness parameters, r. Data are presented on 3 marker panels: High-diversity microhaplotype panel, Random-SNP microhaplotype panel, and 38 Broad barcode biallelic SNPs. Separate plots are provided for each r and geographic region; AF (Africa), ESEA (East Southeast Asia), MSEA (Maritime Southeast Asia), OCE (Oceania), SAM (South America), WAS (West Asia) and WSEA (West Southeast Asia).
Figure 3. Comparative diversity between the High-diversity microhaplotype panel and the 38-SNP Broad Barcode.
Panel a) presents heterozygosity measures and panel b) presents effective cardinality scores in n=615 high-quality biologically independent monoclonal samples by panel and region. Panel labels: 38-SNP Broad barcode (BR38) and High-diversity microhaplotype panel (MHAP-3_10). Regional labels: Africa (AF), East Southeast Asia (ESEA), Maritime Southeast Asia (MSEA), Oceania (OCE), South America (SAM), West Asia (WAS) and West Southeast Asia (WSEA).
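As a rough illustration of the two diversity measures named in the Figure 3 caption, the sketch below computes both from a list of microhaplotype alleles observed at one marker. The formulas are assumptions, since the caption does not state them: heterozygosity is taken as the sample-size-corrected estimator n/(n-1) × (1 − Σp²), and effective cardinality as 1/Σp². The allele strings are invented.

```python
from collections import Counter


def allele_frequencies(haplotypes):
    """Relative frequency of each distinct microhaplotype allele."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {allele: count / total for allele, count in counts.items()}


def expected_heterozygosity(haplotypes):
    """Assumed estimator: n / (n - 1) * (1 - sum of squared allele frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two samples")
    homozygosity = sum(p * p for p in allele_frequencies(haplotypes).values())
    return n / (n - 1) * (1 - homozygosity)


def effective_cardinality(haplotypes):
    """Assumed definition: the number of equally frequent alleles that would
    give the same homozygosity, i.e. 1 / sum of squared allele frequencies."""
    return 1 / sum(p * p for p in allele_frequencies(haplotypes).values())


# Hypothetical alleles observed at one marker in four monoclonal samples
observed = ["ATG", "ATG", "CTG", "CCA"]
print(round(expected_heterozygosity(observed), 3))
print(round(effective_cardinality(observed), 3))
```

Under these definitions a biallelic SNP cannot exceed an effective cardinality of 2, whereas a multi-SNP microhaplotype can, which is the contrast the figure draws between the two panels.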
Figure 4. Genome-wide FWS distribution by microhaplotype-based complexity of infection (COI).
Data from n=922 high-quality biologically independent samples from Africa (AF), East Southeast Asia (ESEA), Maritime Southeast Asia (MSEA), Oceania (OCE), South America (SAM), West Asia (WAS) and West Southeast Asia (WSEA). Panel a) provides boxplots illustrating the distribution of genome-wide FWS scores in each of the monoclonal and polyclonal infection subsets as determined by THEREALMcCOIL analysis of the SNPs in the 100 microhaplotypes using the proportional function. In all geographic regions, the median genome-wide FWS scores are closer to 1 (little to no within-host diversity) in the infections defined as monoclonal. Panel b) illustrates the correlation between genome-wide FWS and microhaplotype-based COI estimates; a trend of decreasing COI is observed with increasing FWS (i.e., decreasing within-host diversity).
Figure 5. Spatial trends in P. vivax connectivity using microhaplotypes.
Panel a) presents a PCoA plot of PC1 (46.4%) against PC2 (25.3%) in n=615 high-quality biologically independent monoclonal isolates from Pv4.0 at the High-diversity SNP microhaplotype set. The combination of PC1 and PC2 provides marked separation of all 7 regional groups. Panels b) and c) present whole genome sequencing (WGS) and microhaplotype-based IBD infection networks in Malaysia. The network plots were generated at a set of 224,612 SNPs (WGS) and the High-diversity SNP microhaplotype panel in single clone Malaysian infections (n=57) at a connectivity threshold of minimum IBD 0.5 (siblings or greater relatedness). Infections are colour-coded according to sub-structure definitions based on previously described ADMIXTURE analysis with genomic data. Infections defined as “New” (purple) were not available in the previous analysis. The clustering patterns of the WGS and microhaplotype-based data are highly consistent; both data sets capture high connectivity amongst the K2 outbreak strains, a distinct K3 sub-population, and divergent K4 infections. The new infections appear to derive from across the different sub-populations.
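The network construction in panels b) and c) can be sketched as thresholding pairwise IBD and reading off connected groups. The code below is an illustration under stated assumptions, not the authors' method: the function name ibd_clusters, the sample identifiers and the IBD values are hypothetical, and only the 0.5 connectivity threshold is taken from the caption.

```python
from collections import defaultdict


def ibd_clusters(pairwise_ibd, threshold=0.5):
    """Link two infections when their IBD estimate is at or above the threshold,
    then return the connected components of the resulting network."""
    adjacency = defaultdict(set)
    samples = set()
    for (a, b), ibd in pairwise_ibd.items():
        samples.update((a, b))
        if ibd >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(samples):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical pairwise IBD estimates between four infections
ibd = {
    ("s1", "s2"): 0.90,
    ("s2", "s3"): 0.55,
    ("s1", "s3"): 0.30,
    ("s1", "s4"): 0.00,
    ("s2", "s4"): 0.05,
    ("s3", "s4"): 0.10,
}
print(ibd_clusters(ibd))
```

Note that s1 and s3 fall in the same cluster through s2 even though their own IBD is below the threshold; connected components capture this kind of indirect linkage, which is how outbreak-like groups appear in a network plot.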
Figure 6. Comparison of country prediction performance between SNP panels.
Comparisons were undertaken between the 494 SNPs in the High-diversity microhaplotype panel (MHAP-3_10), the 38-SNP Broad barcode (BR38), and the 33-, 50- and 55-SNP GEO panels (GEO33, GEO50 and GEO55 respectively). The boxplots present the Matthews correlation coefficient (MCC) scores from 500 repeats with stratified 10-fold cross validation for each SNP set using the Bi-Allele Likelihood classifier. Country labels are provided on the y-axis. Each bar presents the median, interquartile range and minimum and maximum MCC for the given country and model. The analyses were based on n=799 biologically independent samples from 21 countries (each with n ≥4 samples).
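For readers unfamiliar with the MCC, the sketch below shows how a per-country score could be computed from true and predicted labels. It assumes a one-vs-rest treatment of each country, which the caption does not confirm, and it does not reproduce the Bi-Allele Likelihood classifier or the cross-validation scheme. The labels and the function name are hypothetical.

```python
import math


def mcc_one_vs_rest(true_labels, predicted_labels, country):
    """Matthews correlation coefficient for one country against all others.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    returned as 0.0 when the denominator is zero.
    """
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == country and predicted == country:
            tp += 1
        elif truth != country and predicted == country:
            fp += 1
        elif truth == country and predicted != country:
            fn += 1
        else:
            tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical country labels for six samples in one cross-validation fold
truth = ["A", "A", "B", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "B", "C"]
for country in ("A", "B", "C"):
    print(country, round(mcc_one_vs_rest(truth, predicted, country), 3))
```

The MCC ranges from -1 to 1 and uses all four cells of the confusion table, so it is less flattering than raw accuracy when a country contributes only a handful of samples.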

