Nat Commun. 2024 Mar 20;15(1):2499.
doi: 10.1038/s41467-024-46659-0.

Strong positive selection biases identity-by-descent-based inferences of recent demography and population structure in Plasmodium falciparum



Bing Guo et al. Nat Commun.

Abstract

Malaria genomic surveillance often estimates parasite genetic relatedness using metrics such as Identity-By-Descent (IBD), yet strong positive selection stemming from antimalarial drug resistance or other interventions may bias IBD-based estimates. In this study, we use simulations, a true IBD inference algorithm, and empirical data sets from different malaria transmission settings to investigate the extent of this bias and explore potential correction strategies. We analyze whole-genome sequence data generated from 640 new and 3089 publicly available Plasmodium falciparum clinical isolates. We demonstrate that positive selection distorts IBD distributions, leading to underestimated effective population size and blurred population structure. Additionally, we find that removing IBD peak regions partially restores the accuracy of IBD-based inferences, with the size of this effect depending on the population's background genetic relatedness and degree of inbreeding. Consequently, we advocate selection correction for parasite populations undergoing strong, recent positive selection, particularly in high malaria transmission settings.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Summary of Plasmodium falciparum (Pf) parasite isolates and whole-genome sequencing (WGS) data from Southeast Asia (SEA).
a Distribution of sampling location and collection year for the 2055 analyzable samples. The text and color in each block indicate the number of isolates sampled in a given year at a given location (see also the colorbar). b Distribution, across all analyzable parasite genomes from SEA, of the fraction of the genome covered by at least 5, 10, and 25 sequence reads. c Distribution of Fws in sequenced isolates that passed genotype missingness filtering. Note that, to obtain a more accurate distribution of Fws, polyclonal isolates without a predominant clone were included in this analysis. d Distribution of the proportion of the predominant haploid genome (clone) in analyzable SEA isolates. The predominant clone of a polyclonal infection was determined by dEploid. Source data are provided as a Source data file.
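Panel c reports Fws, a within-host diversity statistic usually defined as Fws = 1 - Hw/Hs, where Hw is the heterozygosity within an isolate and Hs the heterozygosity of the population. The caption does not say how Fws was computed here, so the sketch below is a simplified, site-summed version that assumes biallelic SNPs, per-isolate read counts, and known population allele frequencies; the function and argument names are hypothetical. Some published tools instead estimate Fws from the slope of a regression of Hw on Hs across allele-frequency bins, so treat this only as an illustration of the ratio.

```python
def fws(ref_counts, alt_counts, pop_alt_freqs):
    """Simplified Fws = 1 - Hw/Hs for a single isolate (illustrative sketch).

    ref_counts, alt_counts: per-SNP read counts for this isolate (hypothetical layout).
    pop_alt_freqs: per-SNP alternative-allele frequency in the population.
    """
    hw_sum = 0.0  # summed within-isolate heterozygosity
    hs_sum = 0.0  # summed population heterozygosity at the same sites
    for ref, alt, p in zip(ref_counts, alt_counts, pop_alt_freqs):
        depth = ref + alt
        if depth == 0:
            continue  # skip sites with no reads for this isolate
        q = alt / depth               # within-isolate alt-allele fraction
        hw_sum += 2 * q * (1 - q)     # 1 - (q^2 + (1 - q)^2)
        hs_sum += 2 * p * (1 - p)     # 1 - (p^2 + (1 - p)^2)
    if hs_sum == 0:
        raise ValueError("no informative sites")
    return 1 - hw_sum / hs_sum
```

A monoclonal isolate shows a single allele at every site, so Hw is 0 and Fws is 1; an even two-clone mixture at sites with population frequency 0.5 gives Hw equal to Hs and Fws of 0.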
Fig. 2. Effects of positive selection on IBD distribution and Ne inference.
a–c Positive selection affects several aspects of the IBD distribution, including IBD segment length (a), total IBD shared by a pair of isolates (b), and IBD location along the chromosome (c). Note: (1) the x-axis in (a) uses a custom scale for IBD length L (bottom) so that the estimated TMRCA (50/L, top) is on a linear scale; (2) for the IBD segment length distribution analysis, shorter IBD segments (0.2–2 centimorgans, cM) were included to cover the more distant past (>25 generations ago). Lines of transparent colors in (c) represent IBD coverage for different chromosomes for the same genome set; lines of solid colors show the average across chromosomes. The representative results were generated using a selection coefficient, s, of 0.3, a selection starting time 80 generations ago, and a single origin of the favored allele introduced at the position of 33.3 cM of each chromosome. d Strong positive selection causes underestimation of Ne compared to neutral simulation. The difference between the selection (s = 0.3, red solid line) and neutral (blue solid line) scenarios can be partially mitigated by removing IBD segments located within IBD peak regions (red dotted line). The true population size parameter (black dotted line) is plotted for reference. Error bands indicate 95% confidence intervals as determined by IBDNe. Abbreviations: Neutral, neutral simulation; Selection (Orig), positive selection with IBD peak regions not removed; Selection Rmpeaks, positive selection with IBD peak regions removed. Source data are provided as a Source data file. For results for different selection parameter values, see Supplementary Fig. 3.
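The top axis of panel a uses the relation TMRCA ≈ 50/L generations for an IBD segment of length L cM. This is the usual heuristic that two genomes whose common ancestor lived g generations ago are separated by about 2g meioses, each breaking the shared segment at roughly one crossover per 100 cM, so the expected segment length is about 100/(2g) = 50/g cM. The minimal sketch below applies that conversion and also totals IBD per isolate pair as in panel b; the function names and the (id1, id2, length_cm) segment layout are hypothetical.

```python
def tmrca_generations(length_cm):
    """Approximate TMRCA (generations) of an IBD segment of length_cm, via 50 / L."""
    if length_cm <= 0:
        raise ValueError("IBD length must be positive")
    return 50.0 / length_cm


def length_for_tmrca(generations):
    """Inverse mapping: IBD length (cM) whose approximate TMRCA is `generations`."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 50.0 / generations


def total_ibd_per_pair(segments):
    """Sum IBD length (cM) per unordered isolate pair.

    segments: iterable of (id1, id2, length_cm) tuples (hypothetical layout).
    """
    totals = {}
    for a, b, length in segments:
        key = tuple(sorted((a, b)))  # treat (A, B) and (B, A) as the same pair
        totals[key] = totals.get(key, 0.0) + length
    return totals
```

For example, a 2 cM segment maps to about 25 generations, which is why adding 0.2–2 cM segments extends the length-distribution analysis to times earlier than 25 generations ago.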
Fig. 3. Effects of positive selection on the IBD-based population structure inference.
a Schematic of the one-dimensional stepping-stone model with spreading selective sweeps. Five subpopulations (p1 to p5) were split from an ancestral population. There is symmetrical migration between adjacent subpopulations. A favored allele was introduced into the deme at one end of the chain and spread toward the other end. b Frequency trajectory of favored alleles (averaged over chromosomes) in different subpopulations. c Heatmap of pairwise genome-wide total IBD under neutral simulation, under selection (s = 0.3, Selection Orig), and under selection with IBD peaks removed (Selection Rmpeaks). Rows and columns are ordered by true population labels. d Normalized inter-population IBD sharing between nearby subpopulations. e Modularity of IBD networks with respect to the true population labels before and after removing IBD peaks. f InfoMap community detection on the IBD network before (left and middle) and after (right) removing IBD peaks. For each subplot, rows are true subpopulations labeled p1–p5 (assigned in simulation), and columns represent the 5 largest detected communities labeled C0–C4 (with columns re-ordered to facilitate comparison of true and inferred labels). The color of each block represents the number of genomes with the given true label and detected community label, with darker colors indicating more genomes. Source data are provided as a Source data file. For results for different selection coefficients and repeated simulations, see Supplementary Fig. 6 and Supplementary Table 1.
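Panel e scores how well a partition (here, the true subpopulation labels) matches the IBD network using modularity. The sketch below computes weighted Newman modularity, Q = sum over communities c of [w_in(c)/m - (deg(c)/2m)^2], where m is the total edge weight, w_in(c) the weight of edges inside c, and deg(c) the summed weighted degree of c. It assumes isolates are nodes and pairwise total IBD is the edge weight; the caption does not specify the weighting, so that choice and the function name are assumptions. If networkx is available, `networkx.algorithms.community.modularity(G, communities, weight="weight")` computes the same quantity.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Weighted Newman modularity of a node partition.

    edges: iterable of (u, v, weight) for an undirected graph, each edge listed once.
    labels: dict mapping node -> community label (e.g., true subpopulation p1..p5).
    """
    m = 0.0                   # total edge weight
    w_in = defaultdict(float)  # weight of edges with both ends in the same community
    deg = defaultdict(float)   # summed weighted degree per community
    for u, v, w in edges:
        m += w
        deg[labels[u]] += w
        deg[labels[v]] += w
        if labels[u] == labels[v]:
            w_in[labels[u]] += w
    if m == 0:
        raise ValueError("graph has no edge weight")
    return sum(w_in[c] / m - (deg[c] / (2 * m)) ** 2 for c in deg)
```

Two disconnected triangles, each labeled as its own community, give Q = 0.5; putting every node in one community gives Q = 0. Under this reading, strong selection that adds IBD sharing across subpopulations would lower Q for the true labels.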
Fig. 4. IBD coverage profile of all and unrelated Pf isolates in SEA.
a IBD coverage/proportions of all parasite genomes (n = 2055), including highly related and unrelated genomes, in SEA. Labels at the top mark the centers of known or putative drug-resistance genes or of genes involved in sexual commitment (*). b IBD coverage/proportions of unrelated genomes in SEA (n = 701). Annotations in (a) also apply to (b); regions with red shading indicate validated peaks (defined in “Methods”). Note: (1) different y-axis scales (IBD coverage on the left y-axis; IBD proportions on the right y-axis) were used in (a) and (b) to better reveal the peaks; (2) the peaks around pph in (b) and ap2-g in (a) and (b) are IBD peak candidates that did not pass the peak validation step (see “Methods”). Source data are provided as a Source data file.
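IBD coverage can be read as the number of IBD segments, pooled over isolate pairs, that overlap each position along a chromosome. The sketch below computes that count on a grid of positions and then flags unusually high values. The paper's validated-peak definition is in its Methods and is not reproduced here: the flagging rule below (Tukey's fence, Q3 + 1.5 × IQR, on the coverage values) is a hypothetical stand-in, as are the function names and the (start_cm, end_cm) segment layout.

```python
import statistics


def ibd_coverage(segments, positions):
    """Count IBD segments covering each position (half-open [start, end) in cM).

    Simple O(len(segments) * len(positions)) loop, fine for a sketch.
    """
    return [sum(1 for s, e in segments if s <= x < e) for x in positions]


def flag_peak_positions(coverage, k=1.5):
    """Hypothetical rule: flag positions whose coverage exceeds Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(coverage, n=4)
    threshold = q3 + k * (q3 - q1)
    return [c > threshold for c in coverage]
```

With one background segment spanning a whole 100 cM chromosome and ten extra segments stacked on 40–45 cM, only positions 40 through 44 are flagged. Contiguous flagged positions would then form candidate peak regions, which the paper further validates.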
Fig. 5. Ne and population structure inference in an empirical data set from SEA.
a Ne estimates for SEA before and after removing IBD peaks. Error bands indicate 95% confidence intervals as determined by IBDNe. b IBD network analysis of SEA data (before removing IBD peaks), including the community-level IBD sharing matrix (heatmap), community size (blue circles below), and a dendrogram showing hierarchical clustering of the community-level IBD matrix (left). Only the 5 largest communities, labeled C0 to C4, are plotted. The rows and columns of the heatmap, each representing one of the 5 communities, are re-ordered so that the heatmap and the hierarchical clustering share the detected community labels (y-axis tick labels). c Frequency of drug resistance mutations in different IBD communities. d Consistency of InfoMap assignment of unrelated isolates before (x-axis) and after (y-axis) removing the peaks. Source data are provided as a Source data file.
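The before/after comparisons in panels a and d require a version of the IBD segment set with peak regions removed. The figure captions describe this only as removing IBD segments located within peak regions, so the sketch below makes two assumptions: segments overlapping a peak are trimmed (the overlapping part is cut out and the flanking pieces kept), and pieces shorter than a minimum length are dropped. The 2 cM default for that cutoff, the function name, and the (start_cm, end_cm) layout are hypothetical; an alternative design would discard any segment that touches a peak.

```python
def remove_peak_regions(segments, peaks, min_length_cm=2.0):
    """Trim IBD segments so that no retained piece overlaps a peak region.

    segments: list of (start_cm, end_cm); peaks: list of (start_cm, end_cm).
    Pieces shorter than min_length_cm (hypothetical cutoff) are dropped.
    """
    kept = []
    for start, end in segments:
        pieces = [(start, end)]
        for p_start, p_end in peaks:
            next_pieces = []
            for s, e in pieces:
                if e <= p_start or s >= p_end:
                    next_pieces.append((s, e))  # no overlap with this peak
                    continue
                if s < p_start:
                    next_pieces.append((s, p_start))  # keep left flank
                if e > p_end:
                    next_pieces.append((p_end, e))  # keep right flank
            pieces = next_pieces
        kept.extend((s, e) for s, e in pieces if e - s >= min_length_cm)
    return kept
```

The trimmed segment set would then be passed to the same downstream analyses (IBDNe, IBD network construction) as the original set.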
Fig. 6. Removing IBD peaks changes the inference of Ne and population structure in the West African (WAF) data set.
a Ne estimates of Pf in the WAF data set before (blue) and after (red, dotted) removing IBD peaks. Error bands indicate 95% confidence intervals as determined by IBDNe. b, c Distribution of the sizes of communities detected in IBD networks built from the WAF data set before (b) and after (c) removing IBD peaks. Only communities with at least 20 isolates are shown. The y-axis indicates the number of isolates assigned to each detected community; the x-axis tick labels name the detected communities C0, …, C(n−1). In (b), the leftmost red bar, C0, is the dominant community in the original IBD network, with 1222 isolates. Removing IBD peaks reassigns isolates from this dominant community to smaller, distinct communities, labeled C0–C14 in (c). To show how community assignments shift, each bar in (c) is split into red and gray portions: red for isolates that belonged to the original dominant community (C0 in b), and gray for isolates that did not. Source data are provided as a Source data file.
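The red/gray split in panel c is a cross-tabulation of community labels before and after peak removal. The sketch below counts, for each community detected after peak removal, how many isolates came from the original dominant community (red) and how many did not (gray), and applies the caption's minimum community size of 20. The function name, the dict-based inputs, and the default dominant label "C0" are assumptions for illustration.

```python
from collections import Counter


def split_by_original_community(before, after, dominant="C0", min_size=20):
    """Return {new_label: (n_red, n_gray)} for communities with >= min_size isolates.

    before, after: dicts mapping isolate ID -> community label (hypothetical inputs).
    n_red counts isolates that were in `dominant` before peak removal; n_gray the rest.
    """
    red = Counter()
    gray = Counter()
    for isolate, new_label in after.items():
        if before.get(isolate) == dominant:
            red[new_label] += 1
        else:
            gray[new_label] += 1
    result = {}
    for label in set(red) | set(gray):
        if red[label] + gray[label] >= min_size:  # caption: communities with >= 20 isolates
            result[label] = (red[label], gray[label])
    return result
```

Each (n_red, n_gray) pair corresponds to the two stacked segments of one bar in panel c.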
