Elife. 2021 Jul 6:10:e69698.
doi: 10.7554/eLife.69698.

Improving statistical power in severe malaria genetic association studies by augmenting phenotypic precision

James A Watson et al. Elife. 2021.

Abstract

Severe falciparum malaria has substantially affected human evolution. Genetic association studies of patients with clinically defined severe malaria and matched population controls have helped characterise human genetic susceptibility to severe malaria, but phenotypic imprecision compromises discovered associations. In areas of high malaria transmission, the diagnosis of severe malaria in young children and, in particular, the distinction from bacterial sepsis are imprecise. We developed a probabilistic diagnostic model of severe malaria using platelet and white count data. Under this model, we re-analysed clinical and genetic data from 2220 Kenyan children with clinically defined severe malaria and 3940 population controls, adjusting for phenotype mis-labelling. Our model, validated by the distribution of sickle trait, estimated that approximately one-third of cases did not have severe malaria. We propose a data-tilting approach for case-control studies with phenotype mis-labelling and show that this reduces false discovery rates and improves statistical power in genome-wide association studies.

Keywords: GWAS; complete blood count; diagnosis; epidemiology; genetics; genomics; global health; human; severe malaria.

Plain language summary

In areas of sub-Saharan Africa where malaria is common, most people are frequently exposed to the bites of mosquitoes carrying malaria parasites, so they often have malaria parasites in their blood. Young children, who have not yet built up strong immunity against malaria, often fall ill with severe malaria, a life-threatening disease. It is unclear why some children develop severe malaria and die, while other children with high numbers of parasites in their blood do not develop any apparent symptoms.
Genetic susceptibility studies are designed to uncover why such differences exist by comparing individuals with severe malaria (referred to as ‘cases’) with individuals drawn from the general population (known as ‘controls’). But severe malaria can be a challenge to diagnose. Since high numbers of malaria parasites can be found in healthy children, it is sometimes difficult to determine whether the parasites are making a child ill, or whether they are a coincidental finding. Consequently, some of the ‘cases’ recruited into these studies may actually have a different disease, such as bacterial sepsis. This ultimately affects how the studies are interpreted, and introduces error and inaccuracy into the data.
Watson, Ndila et al. investigated whether measuring blood biomarkers in patients (derived from the complete blood count, including platelet counts and white blood cell counts) could improve the accuracy with which malaria is diagnosed. They developed a new mathematical model that incorporates platelet and white blood cell counts. This model estimates that in a large cohort of 2,220 Kenyan children diagnosed with severe malaria, around one third of enrolled children did not actually have this disease. Further analysis suggests that patients with severe malaria are highly unlikely to have platelet counts higher than 200,000 per microlitre. This defines a cut-off that researchers can use to avoid recruiting patients who do not have severe malaria in future studies.
Additionally, the ability to diagnose severe malaria more accurately can make it easier to detect and treat other diseases with similar symptoms in children with high numbers of malaria parasites in their blood. Watson, Ndila et al.’s findings support the recommendation that all children with suspected malaria be given broad spectrum antibiotics, as many misdiagnosed children will likely have bacterial sepsis. They also suggest that using complete blood counts, which are cheap to obtain and increasingly available in low-resource settings, could improve diagnostic accuracy in future clinical studies of severe malaria. This could ultimately improve the ability of these studies to find new treatments for this life-threatening disease.

Conflict of interest statement

JW, CN, SU, AM, GN, SM, CN, NM, NP, BT, KR, SL, HK, EG, KM, ND, AD, PB, TW, CH, NW: No competing interests declared.

Figures

Figure 1. Platelet counts and white blood cell counts as diagnostic predictors of severe falciparum malaria.
Panel (A) shows the bivariate marginal distribution for the reference data (thought to be highly specific to severe malaria, green triangles, n = 1704, summarised in Table 1) and for the Kenyan case data (pink squares, n = 2220; black diamonds: HbAS). The dashed ellipses show the 50% and 95% bivariate normal probability contours approximating each dataset (dark green: reference data; purple: Kenyan data). Panel (B) shows the relationship between platelet counts and plasma PfHRP2 in adults with severe malaria from Bangladesh (green circles, n = 172, the dashed green line shows a linear fit) and in children enrolled in the FEAST trial (n = 567, not specific to severe malaria, Maitland et al., 2011). Undetectable plasma PfHRP2 concentrations were set to 1 ng/mL ± random jitter. Orange squares: malaria-positive blood slide; black triangles: malaria-negative blood slide. The brown line shows a spline fit to the FEAST data (smooth.spline function in R with default parameters) including the data points where PfHRP2 was below the lower limit of detection.
Figure 2. Theoretical causal pathways that lead to the clinical diagnosis of severe malaria under the current WHO definition (World Health Organisation, 2014).
Pathways (a) and (b) represent the two ways patients can be mis-classified as severe malaria. For both pathways (a) and (b), we expect a higher prevalence of HbAS relative to the population with true severe malaria as a consequence of the protective bottlenecks. In this causal model, we assume that HbAS does not protect against asymptomatic parasitaemia, although this assumption is not strictly necessary. Adapted with permission from Small et al., 2017.
Figure 3. Model estimates of P(Severe malaria | Data) in 2220 Kenyan children clinically diagnosed with severe malaria.
Panel (A) shows the distribution of posterior probabilities of severe malaria being the correct diagnosis. Panel (B) shows these same probabilities plotted as a function of the platelet and white counts on which they are based (dark red: probability close to 0; dark blue: probability close to 1). The black diamonds show the HbAS individuals. Panels (C–E) show the relationship between the estimated probabilities of severe malaria and HbAS, in-hospital mortality and admission parasite density, respectively. The black lines (shaded areas) show the mean estimated values (95% confidence intervals) from a generalised additive logistic regression model with a smooth spline term for the likelihood (R package mgcv). The horizontal lines in panels (C–E) show the mean values in the data.
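To make the quantity P(Severe malaria | Data) in Figure 3 concrete, the following is a minimal Python sketch of Bayes' rule for a two-component mixture on log10 platelet and white counts. It is not the authors' fitted model. The bivariate-normal form of each component, every mean, standard deviation and correlation, and the function names are illustrative assumptions. Only the prior of roughly two-thirds is taken from the abstract's estimate that about one-third of cases did not have severe malaria.

```python
import math


def bivariate_normal_pdf(x, y, mean, sd, rho):
    """Density of a bivariate normal with the given means, SDs and correlation."""
    zx = (x - mean[0]) / sd[0]
    zy = (y - mean[1]) / sd[1]
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sd[0] * sd[1] * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm


def posterior_severe_malaria(platelets, white_count, prior, sm, not_sm):
    """Bayes' rule for a two-component mixture; counts are in 10^3 per microlitre."""
    x, y = math.log10(platelets), math.log10(white_count)
    like_sm = bivariate_normal_pdf(x, y, **sm)
    like_not = bivariate_normal_pdf(x, y, **not_sm)
    return prior * like_sm / (prior * like_sm + (1.0 - prior) * like_not)


# Hypothetical component parameters on the log10 scale (NOT estimates from the paper).
SEVERE = {"mean": (1.9, 0.95), "sd": (0.35, 0.20), "rho": 0.0}
NOT_SEVERE = {"mean": (2.4, 1.20), "sd": (0.25, 0.25), "rho": 0.0}
PRIOR = 2.0 / 3.0  # abstract: about one-third of cases did not have severe malaria

p_low_platelets = posterior_severe_malaria(50, 9, PRIOR, SEVERE, NOT_SEVERE)
p_high_platelets = posterior_severe_malaria(400, 20, PRIOR, SEVERE, NOT_SEVERE)
```

Under these made-up parameters, a child with a low platelet count and a normal white count receives a high posterior probability of severe malaria, while a child with a high platelet count and a raised white count receives a low one. This is the qualitative pattern described for panel (B).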
Figure 4. The number of significant hits as a function of the FDR for the genome-wide association study across 9.6 million biallelic variants.
This analysis is based on the subset of Kenyan cases with whole-genome data available and passing quality checks (n = 1297) and on n = 1614 controls. Dashed line: weighted model; thick line: non-weighted model.
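The caption does not say how the FDR was controlled. As one common way to obtain a curve of "hits versus FDR", the sketch below counts rejections under the Benjamini-Hochberg step-up procedure. The choice of that procedure is an assumption, and the p-values are invented toy numbers rather than GWAS results.

```python
def count_bh_hits(p_values, fdr):
    """Number of rejections from the Benjamini-Hochberg step-up procedure at level `fdr`."""
    m = len(p_values)
    hits = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * fdr / m:
            hits = rank  # keep the largest rank whose p-value passes its threshold
    return hits


# Toy p-values, purely illustrative.
toy_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]

# Number of hits at two FDR levels, as one point each on a Figure 4 style curve.
hits_by_fdr = {fdr: count_bh_hits(toy_p_values, fdr) for fdr in (0.05, 0.10)}
```

Running the same function on p-values from a weighted and a non-weighted model, over a grid of FDR levels, would give two curves of the kind the figure compares.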
Figure 5. The three regions in the human genome with the greatest evidence for protection against severe malaria in East Africa (HBB, ABO and FREM3; Band et al., 2019).
The Manhattan plots (left panels) compare p-values from the weighted model (blue) and the non-weighted model (orange). Each Manhattan plot is centred around the known causal position shown by the vertical dashed line (0.5 Mb region). The horizontal dashed line shows p = 10⁻⁷ (threshold often used for defining genome-wide significance). The 10 positions with the greatest –log10 p-values under the non-weighted model are shown as large diamonds. The scatter plots on the right compare absolute effect size estimates under both models with the same top 10 hits shown by the larger purple diamonds. Increases of 30, 9 and 5% are seen for the 10 top hits for HBB, ABO and FREM3, respectively.
Figure 6. Exploring differential effects in 120 directly typed polymorphisms across 70 candidate malaria-protecting genes.
(A) Case-control effect sizes estimated for the ‘severe malaria’ sub-population versus the ‘not severe malaria’ sub-population (n = 3940 controls and n = 2220 cases, with approximately 1279 in the ‘severe malaria’ sub-population and 941 in the ‘not severe malaria’ sub-population). The vertical and horizontal grey lines show the 95% credible intervals. (B) The log10 p-values testing the hypothesis that the effects are the same for the two sub-populations relative to controls. The top dashed line shows the Bonferroni corrected α=0.05 significance threshold (assuming 70 independent tests). The bottom dashed line shows the nominal α=0.05 significance threshold. In both panels, red circles denote p<0.05 (nominal significance level), and red squares denote p<0.05/70. (C) Analysis of the rs1050828 SNP (encoding G6PD +202T) under a non-additive model (hemi/homozygotes and heterozygotes are distinct categories). This shows that heterozygotes are clearly under-represented in the ‘severe malaria’ sub-population and hemi/homozygotes are clearly over-represented in the ‘not severe malaria’ sub-population. (D) Evidence of differential effects for the O blood group (rs8176719, recessive model) and FREM3 (additive model).
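As a quick check on the two dashed lines in panel (B), the arithmetic below computes the nominal and Bonferroni-corrected thresholds for 70 tests. It assumes the panel's axis is on the –log10 scale. The variable names are only illustrative.

```python
import math

alpha = 0.05   # nominal significance level from the caption
n_tests = 70   # number of independent tests assumed in the caption

bonferroni_threshold = alpha / n_tests               # about 7.1e-4
nominal_line = -math.log10(alpha)                    # about 1.30
bonferroni_line = -math.log10(bonferroni_threshold)  # about 3.15
```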
Appendix 1—figure 1. Comparison of the marginal distributions of white blood cell counts between Asian adults and children with severe malaria and African children with severe malaria.
FEAST: 121 severely ill Ugandan children with PfHRP2 >1000 ng/mL (Maitland et al., 2011). Vietnamese adults: 930 adults from two large randomised trials in severe malaria (Phu et al., 2010; Hien et al., 1996). Bangladesh/Thailand: 653 adults and children from observational studies of severe malaria (Leopold et al., 2019).
Appendix 1—figure 2. Comparison of the marginal distributions of platelet counts between Asian adults and children with severe malaria and African children with severe malaria.
FEAST: 121 severely ill Ugandan children with PfHRP2 >1000 ng/mL (Maitland et al., 2011). Vietnamese adults: 930 adults from two large randomised trials in severe malaria (Phu et al., 2010; Hien et al., 1996). Bangladesh/Thailand: 653 adults and children from observational studies of severe malaria (Leopold et al., 2019). The bottom-left qqplot compares the white counts from the children in the FEAST study with the combined dataset from Vietnam and Bangladesh/Thailand.
Appendix 2—figure 1. The relationship between platelet counts and plasma PfHRP2 in severely ill African children.
The black line (shaded area) shows the estimated probability (95% confidence interval) that the plasma PfHRP2 is >1000 ng/mL as a function of log10 platelet count. This fit is derived from a generalised additive logistic regression model (p < 10⁻¹⁶ for the spline term), fit using the R package mgcv. The generalised additive model was fit to data from 566 African children enrolled in the FEAST trial (Maitland et al., 2011) (all the children who had both platelet counts and PfHRP2 data available). Plasma PfHRP2 >1000 ng/mL is highly discriminatory for severe malaria (Hendriksen et al., 2012).
Appendix 3—figure 1. Effect of permuting the weights in the re-weighted (data-tilting) GWAS.
Here we show the results of 20 random permutations of the weights, applied to the Kenyan case-control GWAS using only chromosomes 4, 9 and 11 (where the top hits are – we limit it to these three chromosomes for computational reasons). The random permutations (grey) decrease the number of significant hits compared to the non-weighted (thick black) and the non-permuted re-weighted model (dashed purple).
Appendix 4—figure 1. Comparison of the non-weighted and weighted models of association for directly typed polymorphisms previously reported as associated with severe malaria (MalariaGEN Consortium et al., 2018).
(A) Estimated effect sizes under the non-weighted model versus the difference in effect sizes between the weighted and non-weighted models (absolute effects on the log-odds scale). Differences > 0 imply that the absolute effect size is estimated to be larger under the weighted model. (B) –log10 p-values under the non-weighted model versus the differences in –log10 p-values under the weighted and non-weighted models, again differences > 0 represent larger –log10 p-values for the weighted model. Each point is represented by the gene name. In each case, we use the model that best fit the data in the original analysis (MalariaGEN Consortium et al., 2018). For the X-linked polymorphisms (G6PD, CD40LG), multiple models were reported and so the association model is also shown. H: heterozygote; A: additive; M: males only; F: females only; M/F: all.
Appendix 5—figure 1. Case-only analysis of five key polymorphisms affecting red cells, reported in Ndila et al., 2020 under additive, recessive or heterozygote models.
The horizontal dashed lines show the estimated frequency in the controls (for additive models, this is the frequency of the derived allele; for the heterozygote or recessive models, this is the frequency of the genotype thought to confer protection). The line (shaded area) shows logistic regression fits with P(Severe malaria | Data) as the predictor (95% confidence interval of the fit). The p-value corresponds to the test that the predictor P(Severe malaria | Data) is not associated with the genotype in the cases only. OBG: O blood group.
Appendix 6—figure 1. Distribution of admission haemoglobin concentrations as a function of P(Severe malaria | Data).
Severe anaemia is generally defined as a haemoglobin less than 5 g/dL in African children diagnosed with severe malaria, shown by the horizontal dashed red line in the top panel and the vertical dashed red lines in the bottom panels. The vertical dashed red lines in the top panel show the top and bottom quintiles of the probability distribution (0.9 and 0.2, respectively). Patients in the bottom quintile of the probability distribution had a markedly bimodal distribution in haemoglobin concentrations with a substantial proportion meeting the severe anaemia criterion and a substantial proportion with relatively high haemoglobin concentrations (>10 g/dL), suggesting two patient subgroups. Patients in the top quintile had a unimodal distribution of haemoglobin.
Appendix 7—figure 1. Pattern of missing clinical data in the 930 Vietnamese adults.
These data pool the AQ Vietnam severe malaria study (Hien et al., 1996) and the AAV severe malaria study (Phu et al., 2010) (red: missing; yellow: recorded).
Appendix 7—figure 2. Missing clinical data in the 2220 Kenyan children diagnosed with severe malaria (red: missing; yellow: recorded).
Appendix 8—figure 1. Relationship between age and mean white count (modelled on the log10 scale).
This is estimated from 858 children in the FEAST trial who had white counts available using an additive linear model (p = 10⁻⁸ for the smooth spline term). We used this model to adjust observed log10 white counts in all children less than 5 years of age in the reference and Kenyan datasets.
Appendix 9—figure 1. Normal-quantile plots for platelet counts and white blood cell counts in the reference data.
Both were standardised to have mean 0 and standard deviation of 1 on the log10 scale. The diagonal lines show the identity line.
Appendix 10—figure 1. Collider bias in the diagnostic model of severe malaria based on complete blood count data.
HBB in its homozygous S form (HbSS, <1% prevalence in this Kenyan population) is a rare example of how this can occur. Children with HbSS have white counts about 2–3 times higher than the normal population and slightly lower platelet counts (Sadarangani et al., 2009). Under the probabilistic model, all 11 children with HbSS were classified as having a low probability of severe malaria, based on their high white counts (mean 40,000 per μL). These probabilities cannot be taken at face value, and it remains an unanswered question whether children with HbSS are more or less susceptible than their wild-type counterparts (Williams and Obaro, 2011).
Appendix 10—figure 2. The relationship between HbSS and the estimated probabilities of severe malaria under the diagnostic model.
There were 11 children with HbSS and they all had low probabilities of severe malaria, but this is biased as these children have chronic inflammation with white counts 2–3 times higher than the general population (Sadarangani et al., 2009) (see Appendix 10—figure 1 above for the causal diagram showing collider bias).
Appendix 11—figure 1. Scatter plots of platelet counts versus white blood cell counts for the Kenyan cohort, showing the 13 individuals with the double mutation HbAS and homozygous α+-thalassaemia as large black diamonds (HZ-alpha-thal).
The red-yellow-blue colour scheme is proportional to the P(Severe malaria | Data) as given by the legend in the top-left corner.
Appendix 12—figure 1. Simulation study demonstrating how likelihood re-weighting can improve estimation accuracy in case-control studies.
Panels (A) and (B) show histograms of the case probability weights used in the simulations for the scenarios when 50% of cases are true cases and when 100% of cases are true cases, respectively. Panel (C) shows the estimated effect sizes as a function of the proportion of mis-classified cases. Panel (D) shows the standard errors of effect estimates as a function of the proportion of mis-classified cases.
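The idea of likelihood re-weighting can be illustrated with a small weighted logistic regression, in which each case contributes to the log-likelihood in proportion to its estimated probability of being a true case. The sketch below is an assumption-laden toy, not the authors' simulation. The genotype counts, the weights of 0.9 and 0.2, and the function name `fit_weighted_logistic` are all hypothetical, and the fit uses a plain Newton-Raphson loop on one binary predictor.

```python
import math


def fit_weighted_logistic(x, y, w, n_iter=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x with observation weights w."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)        # weighted score contribution
            v = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Grouped toy data: (carries protective allele, is case, probability weight, count).
# The second block of cases is mis-labelled and has the control allele frequency.
rows = [
    (0, 0, 1.0, 800), (1, 0, 1.0, 200),  # population controls
    (0, 1, 0.9, 450), (1, 1, 0.9, 50),   # cases that probably are true cases
    (0, 1, 0.2, 240), (1, 1, 0.2, 60),   # cases that probably are mis-labelled
]
x = [r[0] for r in rows]
y = [r[1] for r in rows]

# Non-weighted model: every recruited case counts fully.
_, slope_unweighted = fit_weighted_logistic(x, y, [r[3] for r in rows])
# Weighted model: cases are down-weighted by their probability of mis-labelling.
_, slope_weighted = fit_weighted_logistic(x, y, [r[2] * r[3] for r in rows])
```

In this toy example the non-weighted log odds ratio is diluted towards zero by the mis-labelled cases, and the weighted fit recovers a larger absolute effect. This mirrors the direction of the effect-size changes reported for the weighted model, although the magnitude here depends entirely on the invented counts and weights.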
Appendix 12—figure 2. Effect of case re-weighting on power (1 − type II error rate).
The thick red line shows the estimated power for the re-weighted approach; the dashed black line shows the estimated power for the non-weighted approach.
Appendix 13—figure 1. Principal components analysis of 1666 Kenyan cases and 1606 population controls.
The colours show the main self-reported ethnicities (black: Chonyi; red: Giriama; green: Kauma; blue: other). The first five principal components were used to stratify for population structure in the GWAS analyses.

References

    1. Anstey NM, Price RN. Improving case definitions for severe malaria. PLOS Medicine. 2007;4:e267. doi: 10.1371/journal.pmed.0040267.
    2. Band G, Le QS, Clarke GM, Kivinen K, Hubbart C, Jeffreys AE, Rowlands K, Leffler EM, Jallow M, Conway DJ, Sisay-Joof F, Sirugo G, d’Alessandro U, Toure OB, Thera MA, Konate S, Sissoko S, Mangano VD, Bougouma EC, Sirima SB, Amenga-Etego LN, Ghansah AK, Hodgson AVO, Wilson MD, Enimil A, Ansong D, Evans J, Ademola SA, Apinjoh TO, Ndila CM, Manjurano A, Drakeley C, Reyburn H, Phu NH, Quyen NTN, Thai CQ, Hien TT, Teo YY, Manning L, Laman M, Michon P, Karunajeewa H, Siba P, Allen S, Allen A, Bahlo M, Davis TME, Simpson V, Shelton J, Spencer CCA, Busby GBJ, Kerasidou A, Drury E, Stalker J, Dilthey A, Mentzer AJ, McVean G, Bojang KA, Doumbo O, Modiano D, Koram KA, Agbenyega T, Amodu OK, Achidi E, Williams TN, Marsh K, Riley EM, Molyneux M, Taylor T, Dunstan SJ, Farrar J, Mueller I, Rockett KA, Kwiatkowski DP, Malaria Genomic Epidemiology Network. Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia and Oceania. Nature Communications. 2019;10:5732. doi: 10.1038/s41467-019-13480-z.
    3. Bejon P, Berkley JA, Mwangi T, Ogada E, Mwangi I, Maitland K, Williams T, Scott JA, English M, Lowe BS, Peshu N, Newton CR, Marsh K. Defining childhood severe falciparum malaria for intervention studies. PLOS Medicine. 2007;4:e251. doi: 10.1371/journal.pmed.0040251.
    4. Carter R, Mendis KN. Evolutionary and historical aspects of the burden of malaria. Clinical Microbiology Reviews. 2002;15:564–594. doi: 10.1128/CMR.15.4.564-594.2002.
    5. Dondorp A, Nosten F, Stepniewska K, Day N, White N. Artesunate versus quinine for treatment of severe falciparum malaria: a randomised trial. Lancet. 2005;366:717–725. doi: 10.1016/S0140-6736(05)67176-0.