[Preprint]. 2023 Jul 15:2023.07.14.549114.
doi: 10.1101/2023.07.14.549114.

Strong Positive Selection Biases Identity-By-Descent-Based Inferences of Recent Demography and Population Structure in Plasmodium falciparum

Bing Guo et al. bioRxiv.

Abstract

Malaria genomic surveillance often estimates parasite genetic relatedness using metrics such as Identity-By-Descent (IBD). Yet, strong positive selection stemming from antimalarial drug resistance or other interventions may bias IBD-based estimates. In this study, we utilized simulations, a true IBD inference algorithm, and empirical datasets from different malaria transmission settings to investigate the extent of such bias and explore potential correction strategies. We analyzed whole genome sequence data generated from 640 new and 4,026 publicly available Plasmodium falciparum clinical isolates. Our findings demonstrated that positive selection distorts IBD distributions, leading to underestimated effective population size and blurred population structure. Additionally, we discovered that the removal of IBD peak regions partially restored the accuracy of IBD-based inferences, with this effect contingent on the population's background genetic relatedness. Consequently, we advocate for selection correction for parasite populations undergoing strong, recent positive selection, particularly in high malaria transmission settings.

Keywords: Effective population size; Genetic relatedness; Identity-By-Descent; Malaria; Population structure; Positive selection.

Figures

Figure 1. Summary of Pf parasite isolates and WGS data for Pf from SEA.
a, Distribution of sampling location and collection year for the 2,055 analyzable samples. b, Distribution of genome fractions covered by at least 5, 10, and 25 sequence reads of all parasite genomes from SEA. c, Distribution of Fws in sequenced isolates that passed quality control (genotype missingness filtering). d, Distribution of ratios of predominant genomes in sequenced isolates that passed quality control (determined by dEploid).
Figure 2. Effects of positive selection on IBD distribution and Ne inference.
a-c, Positive selection affects various aspects of the IBD distribution, including IBD segment length (a), total IBD shared by a pair of isolates (b), and IBD location along the chromosome (c). Note that the x-axis in a uses a custom scale for IBD length l (bottom) so that the estimated TMRCA (50/l, top) is on a linear scale. Shorter IBD segments (0.2–2 cM) were included to cover the more distant past (>25 generations ago). Lines of transparent colors in c represent IBD coverage for different chromosomes for the same genome set; lines of solid colors show the average across chromosomes. The representative results were generated using a selection coefficient, s, of 0.3, a selection starting time 80 generations ago, and a single origin of the favored allele introduced at the position of 33.3 cM of each chromosome. Abbreviations: Neu, neutral; Sel Orig, positive selection (IBD peak regions not removed); Sel Rmpeaks, positive selection with IBD peak regions removed. d, Strong positive selection causes underestimation of Ne compared to the neutral simulation. The difference between selection (s = 0.3) and neutral scenarios can be partially mitigated by removing IBD segments located within IBD peak regions. For results for different selection parameter values, see Fig. S3.
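The 50/l relation used for the top axis of panel a can be illustrated with a small sketch. It assumes l is the IBD segment length in cM and that the result is an approximate TMRCA in generations, as the caption implies; the function name is hypothetical and not taken from the authors' code.

```python
def tmrca_generations(length_cm: float) -> float:
    """Approximate TMRCA (in generations) of an IBD segment of length_cm cM.

    Uses the 50/l rule quoted in the Figure 2 caption; the function name
    is illustrative only.
    """
    if length_cm <= 0:
        raise ValueError("IBD segment length must be positive")
    return 50.0 / length_cm


# Shorter segments map to older common ancestors.
for length in (0.2, 2.0, 10.0):
    print(f"{length:4.1f} cM -> about {tmrca_generations(length):.0f} generations")
```

Under this rule, the 0.2–2 cM segments mentioned in the caption correspond to common ancestors at least 25 generations back, which matches the ">25 generations ago" statement.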
Figure 3. Effects of positive selection on the IBD-based population structure inference.
a, Schematic of the one-dimensional stepping-stone model. Five subpopulations were split from an ancestral population. There is symmetrical migration between adjacent subpopulations. A favored allele was introduced into the deme from one side of the chain and spread to the other side. b, Average frequency trajectory of favored alleles (on each chromosome) in different subpopulations (p1 - p5). c, Heatmap of pairwise genome-wide total IBD under neutral, selection (s = 0.3), and selection with peaks removed. Rows and columns are ordered by true population labels. d, Normalized inter-population IBD sharing between nearby demes. e, Modularity of IBD networks with respect to the true population labels under different IBD processing conditions (before and after removing IBD peaks). f, IBD network InfoMap community detection before (left and middle) and after (right) removing IBD peaks. For each subplot, columns are detected community labels and rows are true population labels. The color of each block represents the number of genomes with the given true and detected communities. For results for different selection coefficients, see Fig. S6.
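As a rough illustration of the modularity measure in panel e, the sketch below computes standard Newman modularity for a weighted, undirected network given fixed community labels. It assumes nodes stand for genomes, edge weights for pairwise total IBD, and labels for true populations; the function name and the toy network are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2 m))^2 ].

    edges:  iterable of (u, v, weight); each undirected pair listed once,
            no self-loops.
    labels: dict mapping node -> community label.
    """
    total = 0.0                      # m: total edge weight
    within = defaultdict(float)      # L_c: edge weight inside community c
    strength = defaultdict(float)    # d_c: summed node strength of community c
    for u, v, w in edges:
        total += w
        strength[labels[u]] += w
        strength[labels[v]] += w
        if labels[u] == labels[v]:
            within[labels[u]] += w
    if total == 0:
        raise ValueError("network has no edge weight")
    return sum(
        within[c] / total - (strength[c] / (2.0 * total)) ** 2
        for c in list(strength)
    )


# Toy network: two triangles joined by a single bridge edge.
toy_edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
toy_labels = {"a": "p1", "b": "p1", "c": "p1", "d": "p2", "e": "p2", "f": "p2"}
print(round(modularity(toy_edges, toy_labels), 4))
```

Higher values mean the network's IBD sharing is concentrated within the labeled populations; a labeling that puts every genome in one group gives zero.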
Figure 4. IBD coverage profile of all and unrelated Pf isolates in SEA.
a, IBD coverage of all parasite genomes in SEA. Labels on the top indicate the center of known or putative drug resistance genes or genes that are under selection for sexual commitment (*). b, IBD coverage of unrelated genomes in SEA. Annotations in a are shared with b. Note that different scales for y axes (IBD proportions) were used to better reveal the peaks.
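A minimal sketch of how an IBD coverage profile and a peak-removal step could be computed is given below. The peak rule (coverage above median plus a multiple of the standard deviation) and all function names are assumptions made for illustration; the page does not state the threshold or procedure the authors actually used.

```python
from statistics import median, pstdev


def ibd_coverage(segments, positions):
    """Count the IBD segments (start_cm, end_cm) covering each position."""
    return [sum(1 for s, e in segments if s <= p <= e) for p in positions]


def peak_positions(coverage, positions, n_sd=2.0):
    """Flag positions with coverage above median + n_sd * SD.

    The threshold rule is a hypothetical choice for this sketch.
    """
    cutoff = median(coverage) + n_sd * pstdev(coverage)
    return [p for p, c in zip(positions, coverage) if c > cutoff]


def remove_peak_segments(segments, peaks):
    """Drop every segment that covers at least one flagged position."""
    return [(s, e) for s, e in segments if not any(s <= p <= e for p in peaks)]


# Toy chromosome sampled every 10 cM, with a pile-up of segments near 30 cM.
positions = list(range(0, 100, 10))
segments = [(28, 32)] * 8 + [(8, 12), (48, 52), (68, 72)]

coverage = ibd_coverage(segments, positions)
peaks = peak_positions(coverage, positions)
kept = remove_peak_segments(segments, peaks)
print(coverage, peaks, len(kept))
```

In this toy case only the position at 30 cM is flagged, and the eight segments piled up there are discarded while the three background segments are kept.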
Figure 5. Ne and population structure inference in an empirical dataset from SEA.
a, Ne estimates for SEA before and after removing IBD peaks. b, IBD network analysis of SEA data (before removing IBD peaks), including community-level IBD sharing matrix (heatmap), community size (blue circles below), and dendrogram showing hierarchical clustering of community-level IBD matrix (left). Only the largest 5 communities are plotted. c, Frequency of drug resistance mutations in different IBD communities. d, Consistency of InfoMap assignment of unrelated isolates before (x-axis) and after (y-axis) removing the peaks.
Figure 6. Removing IBD peaks changes the inference of Ne and population structure in the West African dataset.
a, Ne estimates of the Pf population in a WAF dataset before (blue) and after (red, dotted) removing peaks. b-c, IBD network community detection using the WAF dataset before (b) and after (c) removing IBD peaks. Only communities with at least 20 isolates are shown. Removing IBD peaks allows samples of the dominant community in the original inference (b, the leftmost red bar) to be split into smaller communities in (c, red portions of many bars).

