Genetics. 2019 Aug;212(4):1337-1351.
doi: 10.1534/genetics.119.302120. Epub 2019 Jun 17.

Estimating Relatedness Between Malaria Parasites

Aimee R Taylor et al. Genetics. 2019 Aug.

Abstract

Understanding the relatedness of individuals within or between populations is a common goal in biology. Increasingly, relatedness features in genetic epidemiology studies of pathogens. These studies are relatively new compared to those in humans and other organisms, but are important for designing interventions and understanding pathogen transmission. Only recently have researchers begun to routinely apply relatedness to apicomplexan eukaryotic malaria parasites, and to date have used a range of different approaches on an ad hoc basis. Therefore, it remains unclear how to compare different studies and which measures to use. Here, we systematically compare measures based on identity-by-state (IBS) and identity-by-descent (IBD) using a globally diverse data set of malaria parasites, Plasmodium falciparum and P. vivax, and provide marker requirements for estimates based on IBD. We formally show that the informativeness of polyallelic markers for relatedness inference is maximized when alleles are equifrequent. Estimates based on IBS are sensitive to allele frequencies, which vary across populations and by experimental design. For portability across studies, we thus recommend estimates based on IBD. To generate estimates with errors below an arbitrary threshold of 0.1, we recommend ∼100 polyallelic or 200 biallelic markers. Marker requirements are immediately applicable to haploid malaria parasites and other haploid eukaryotes. C.I.s facilitate comparison when different marker sets are used. This is the first attempt to provide rigorous analysis of the reliability of, and requirements for, relatedness inference in malaria genetic epidemiology. We hope it will provide a basis for statistically informed prospective study design and surveillance strategies.
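The contrast between IBS-based and IBD-based measures can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes haploid samples, biallelic markers, no genotyping error, and an independence model in which each marker is IBD independently with probability r. The names `simulate_pair`, `ibs_fraction`, and `estimate_r`, the grid-search maximization, and the allele frequencies 0.5 and 0.05 are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_pair(freqs, r, rng):
    """Simulate genotypes for one haploid pair at biallelic markers.

    Assumed model: each marker is IBD independently with probability r,
    with no genotyping error. freqs[t] is the frequency of allele 1 at marker t.
    """
    freqs = np.asarray(freqs, dtype=float)
    m = freqs.size
    ibd = rng.random(m) < r                      # latent IBD state per marker
    yi = (rng.random(m) < freqs).astype(int)     # first sample
    yj_unrelated = (rng.random(m) < freqs).astype(int)
    yj = np.where(ibd, yi, yj_unrelated)         # copy allele where IBD
    return yi, yj


def ibs_fraction(yi, yj):
    """Fraction of markers at which the two samples carry the same allele."""
    return float(np.mean(yi == yj))


def estimate_r(yi, yj, freqs, grid_size=1001):
    """Grid-search maximum likelihood estimate of r under the assumed model."""
    freqs = np.asarray(freqs, dtype=float)
    fi = np.where(yi == 1, freqs, 1.0 - freqs)   # frequency of allele seen in i
    fj = np.where(yj == 1, freqs, 1.0 - freqs)   # frequency of allele seen in j
    p_ibd = np.where(yi == yj, fi, 0.0)          # P(data | IBD)
    p_not = fi * fj                              # P(data | not IBD)
    grid = np.linspace(0.0, 1.0, grid_size)
    lik = grid[:, None] * p_ibd + (1.0 - grid[:, None]) * p_not
    with np.errstate(divide="ignore"):           # log(0) = -inf is acceptable here
        loglik = np.log(lik).sum(axis=1)
    return float(grid[np.argmax(loglik)])


rng = np.random.default_rng(1)
m, r_true = 2000, 0.5
for label, allele_freq in [("common alleles", 0.5), ("rare alleles", 0.05)]:
    freqs = np.full(m, allele_freq)
    yi, yj = simulate_pair(freqs, r_true, rng)
    print(label,
          "IBS:", round(ibs_fraction(yi, yj), 3),
          "r-hat:", round(estimate_r(yi, yj, freqs), 3))
```

Under these assumptions, the IBS fraction shifts with the allele frequencies even though the simulated relatedness is the same, whereas the likelihood-based estimate targets r in both settings. This is the portability argument made in the abstract.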

Keywords: Plasmodium falciparum; Plasmodium vivax; genetic epidemiology; hidden Markov model; identity-by-descent; identity-by-state; independence model; malaria; relatedness.

Figures

Figure 1
Models relating genetic data to genetic relatedness. Input data are depicted by green circles: for $t = 1, \ldots, m$, genotype calls, $Y_t^{(i)}$ and $Y_t^{(j)}$, and allele frequencies, $(f_t(g))_{g \in G_t}$; and for $t = 2, \ldots, m$, distances, $d_t$. Parameters considered fixed (genotyping error, $\epsilon$, and constant, $\rho$) are depicted by red circles. Unobserved quantities are depicted by gray squares: IBD states, $\mathrm{IBD}_1, \ldots, \mathrm{IBD}_m$, and estimands $r$ and $k$. Solid arrows depict dependencies under both the independence model and the HMM. Dashed arrows depict dependencies under the HMM only. HMM, hidden Markov model; IBD, identity-by-descent; IBS, identity-by-state.
Figure 2
Minor allele frequency estimates from monoclonal P. falciparum data sets (Table 1). WGS, whole-genome sequencing.
Figure 3
Multiplicative increase in the precision of the maximum likelihood estimator with marker cardinality. The left plot shows the multiplicative increase for equifrequent alleles according to Equation 6. The right plot shows the multiplicative increase with $K_t$, where precision was calculated according to Equation 5 with either $f_t(g_i) = 1/K_t$ for $i = 1, \ldots, K_t$ (dots) or $f_t(g_1) = 1.75/K_t$ and $f_t(g_i) = (1 - f_t(g_1))/(K_t - 1)$ for $i = 2, \ldots, K_t$, such that $K'_t < K_t$ (triangles). FIM, Fisher information matrix.
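The effect of marker cardinality and allele frequencies on precision can be explored numerically. The sketch below is not Equation 5 or Equation 6 of the paper, which are not reproduced on this page. It is the per-marker Fisher information for r derived under a simplified, assumed model: haploid samples, one marker, no genotyping error, and IBD with probability r. The function name `fisher_information` and the example frequencies are hypothetical.

```python
import numpy as np


def fisher_information(freqs, r):
    """Per-marker Fisher information about r under the assumed model.

    Outcomes and their probabilities (no genotyping error, 0 < r < 1):
      both samples carry allele g:      r * f_g + (1 - r) * f_g**2
      samples carry alleles g != g':    (1 - r) * f_g * f_g'
    The information is the sum over outcomes of (dP/dr)**2 / P.
    """
    f = np.asarray(freqs, dtype=float)
    h = np.sum(f ** 2)                                    # P(match | not IBD)
    match_term = np.sum(f * (1.0 - f) ** 2 / (r + (1.0 - r) * f))
    mismatch_term = (1.0 - h) / (1.0 - r)
    return float(match_term + mismatch_term)


r = 0.5
baseline = fisher_information([0.5, 0.5], r)              # biallelic, equifrequent
for k in (2, 4, 8, 16):
    equifrequent = np.full(k, 1.0 / k)
    ratio = fisher_information(equifrequent, r) / baseline
    print(f"K = {k:2d}: information relative to K = 2 is {ratio:.2f}")

# A skewed four-allele marker for comparison with the equifrequent one.
skewed = [0.7, 0.1, 0.1, 0.1]
print("skewed K = 4:", round(fisher_information(skewed, r) / baseline, 2))
```

In this simplified setting, the information grows with the number of equifrequent alleles, and a skewed marker of the same cardinality carries less information than an equifrequent one. This is consistent with the abstract's statement that informativeness is maximized when alleles are equifrequent.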
Figure 4
Measures of relatedness: parasite pairs simulated with relatedness 0.5. Half-violin plots showing distributions of $\widehat{\mathrm{IBS}}_m$ (top) and $\hat{r}_m$ (bottom), each based on 1000 pairs simulated using $r = 0.5$ and allele frequency estimates based on P. falciparum data sets with ≥ 59 SNPs (Table 1). To single out the effect of frequencies, we fixed all other parameters across the data sets including positions, which were extracted from the Western Kenyan data set. Allele frequencies were sampled uniformly at random from the full set of allele frequency estimates based on each data set. For each set of 59-SNP allele frequencies, the $\bar{h}_m$ values were 0.86, 0.85, 0.73, 0.67, and 0.58 (top to bottom row of each plot, respectively). Black vertical bars denote $\bar{h}_m + (1 - \bar{h}_m) r$ (top), and triangles denote the mean $\widehat{\mathrm{IBS}}_m$ (top) and mean $\hat{r}_m$ (bottom). IBS, identity-by-state; MS, microsatellite; WGS, whole-genome sequencing.
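The quantity marked by the black vertical bars can be read as the expected IBS fraction. The short derivation below is a sketch under stated assumptions, not a reproduction of the paper's equations. It ignores genotyping error, and it takes $h_t$ to be the probability that two unrelated parasites carry the same allele at marker $t$, with $\bar{h}_m$ its average over the $m$ markers. That definition is an assumption here.

```latex
% Assumption: no genotyping error; marker t is IBD with probability r.
\begin{align*}
h_t &= \sum_{g \in G_t} f_t(g)^2
  && \text{(match probability when not IBD)} \\
P(\text{match at } t) &= r \cdot 1 + (1 - r)\, h_t = h_t + (1 - h_t)\, r \\
E\big[\widehat{\mathrm{IBS}}_m\big]
  &= \frac{1}{m} \sum_{t=1}^{m} \big( h_t + (1 - h_t)\, r \big)
   = \bar{h}_m + (1 - \bar{h}_m)\, r,
  && \bar{h}_m = \frac{1}{m} \sum_{t=1}^{m} h_t
\end{align*}
% Worked values for r = 0.5 with the extreme \bar{h}_m reported in the caption:
%   \bar{h}_m = 0.86:  0.86 + 0.14 \times 0.5 = 0.93
%   \bar{h}_m = 0.58:  0.58 + 0.42 \times 0.5 = 0.79
```

Under these assumptions, the same relatedness of 0.5 corresponds to expected IBS fractions between 0.79 and 0.93 across the allele frequency sets in the caption, which illustrates why IBS-based values are hard to compare across data sets.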
Figure 5
Measures of relatedness: parasite pairs with unknown relatedness. Half-violin plots showing distributions of $\widehat{\mathrm{IBS}}_m$ (top) and $\hat{r}_m$ (bottom), based on pairwise comparisons of Plasmodium monoclonal samples from six published P. falciparum biallelic SNP data sets (Table 1) and a single P. vivax MS data set (Thailand MS). Black vertical bars denote $\bar{h}_m^{\max}$ (top), and triangles denote the mean $\widehat{\mathrm{IBS}}_m$ (top) and mean $\hat{r}_m$ (bottom). IBS, identity-by-state; MS, microsatellite; WGS, whole-genome sequencing.
Figure 6
$\hat{r}_m$ with 95% C.I.s for 100 select pairwise comparisons per data set of monoclonal Plasmodium samples from P. falciparum data sets (Table 1) and a single P. vivax data set, Thai MS.
Figure 7
Coverage (panels A and B) and RMSE (panels C and D) under the HMM and the independence model. Coverage is equal to the proportion of 500 $\hat{r}_m$ whose 95% parametric bootstrap C.I.s contain the value of $r$ used to simulate the data. Data were simulated under the HMM with $\epsilon = 0.001$, $K_t = 2$ for all $t$, $k = 8$ for various $r$ (panels A and C), and $r = 0.5$ for various $k$ (panels B and D). HMM, hidden Markov model; MS, microsatellite; RMSE, root mean squared error.
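Coverage and RMSE as defined in this caption can be computed with a short simulation. The sketch below is a simplified illustration, not the study's procedure. It simulates and fits under an assumed independence model only (no HMM, no genotyping error, haploid biallelic markers), uses a grid-search estimator, and builds percentile intervals from the parametric bootstrap. The function names, the percentile method, and the replicate counts are hypothetical choices kept small so the example runs quickly.

```python
import numpy as np


def simulate_pair(freqs, r, rng):
    """Simulate one haploid pair; each marker is IBD with probability r (assumed)."""
    m = freqs.size
    ibd = rng.random(m) < r
    yi = (rng.random(m) < freqs).astype(int)
    yj_unrelated = (rng.random(m) < freqs).astype(int)
    return yi, np.where(ibd, yi, yj_unrelated)


def estimate_r(yi, yj, freqs, grid_size=201):
    """Grid-search maximum likelihood estimate of r under the assumed model."""
    fi = np.where(yi == 1, freqs, 1.0 - freqs)
    fj = np.where(yj == 1, freqs, 1.0 - freqs)
    p_ibd = np.where(yi == yj, fi, 0.0)
    p_not = fi * fj
    grid = np.linspace(0.0, 1.0, grid_size)
    lik = grid[:, None] * p_ibd + (1.0 - grid[:, None]) * p_not
    with np.errstate(divide="ignore"):
        loglik = np.log(lik).sum(axis=1)
    return float(grid[np.argmax(loglik)])


def bootstrap_ci(yi, yj, freqs, rng, n_boot=100, level=0.95):
    """Parametric bootstrap: resimulate at r-hat, re-estimate, take percentiles."""
    r_hat = estimate_r(yi, yj, freqs)
    boot = np.array([
        estimate_r(*simulate_pair(freqs, r_hat, rng), freqs)
        for _ in range(n_boot)
    ])
    alpha = 1.0 - level
    lo, hi = np.quantile(boot, [alpha / 2.0, 1.0 - alpha / 2.0])
    return r_hat, float(lo), float(hi)


def coverage_and_rmse(freqs, r_true, rng, n_rep=40, n_boot=100):
    """Proportion of intervals containing r_true, and RMSE of r-hat around it."""
    hits, sq_err = 0, 0.0
    for _ in range(n_rep):
        yi, yj = simulate_pair(freqs, r_true, rng)
        r_hat, lo, hi = bootstrap_ci(yi, yj, freqs, rng, n_boot=n_boot)
        hits += int(lo <= r_true <= hi)
        sq_err += (r_hat - r_true) ** 2
    return hits / n_rep, float(np.sqrt(sq_err / n_rep))


rng = np.random.default_rng(0)
freqs = np.full(200, 0.5)            # 200 equifrequent biallelic markers (assumed)
coverage, rmse = coverage_and_rmse(freqs, 0.5, rng)
print("coverage:", coverage, "RMSE:", round(rmse, 3))
```

With so few replicates the coverage estimate is noisy. The caption's figure of 500 estimates per setting gives a much tighter estimate at a higher computational cost.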
Figure 8
RMSE of $\hat{r}_m$ generated under the HMM. Data were simulated under the HMM using various $r$ (see legend); allele frequencies drawn from the WGS data set with probability proportional to their MAFs ($\bar{h}_m \approx 0.69$ and $\bar{K}_m \approx 1.53$, left plot) and uniformly at random ($\bar{h}_m \approx 0.89$ and $\bar{K}_m \approx 1.17$, right plot) (values are approximate due to some variation across $m$). HMM, hidden Markov model; MAF, minor allele frequency; RMSE, root mean squared error.
Figure 9
Root mean squared error of $\hat{r}_m$ around the data-generating $r = 0.5$ as a function of $m$ for various $\bar{K}_m^{\mathrm{cum}}$ (panel A) and as a function of $m \times \bar{K}_m^{\mathrm{cum}}$ (panel B).
