Review
Nat Rev Genet. 2021 Aug;22(8):502-517.
doi: 10.1038/s41576-021-00349-5. Epub 2021 Apr 8.

Advances and opportunities in malaria population genomics

Daniel E Neafsey et al. Nat Rev Genet. 2021 Aug.

Abstract

Almost 20 years have passed since the first reference genome assemblies were published for Plasmodium falciparum, the deadliest malaria parasite, and Anopheles gambiae, the most important mosquito vector of malaria in sub-Saharan Africa. Reference genomes now exist for all human malaria parasites and nearly half of the ~40 important vectors around the world. As a foundation for genetic diversity studies, these reference genomes have helped advance our understanding of basic disease biology and drug and insecticide resistance, and have informed vaccine development efforts. Population genomic data are increasingly being used to guide our understanding of malaria epidemiology, for example by assessing connectivity between populations and the efficacy of parasite and vector interventions. The potential value of these applications to malaria control strategies, together with the increasing diversity of genomic data types and contexts in which data are being generated, raises both opportunities and challenges in the field. This Review discusses advances in malaria genomics and explores how population genomic data could be harnessed to further support global disease control efforts.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Malaria population genomics publications 1995–2020.
Timelines of Plasmodium and Anopheles genomics publications and highlighted publications. a | Plasmodium genomics publications. An increasing fraction of Plasmodium genomics papers over time also include terms signifying population diversity in the title and/or abstract. Highlighted milestones include the publication of the first Plasmodium falciparum reference genome assembly in 2002 (ref.), co-publications reporting the characterization of genome-wide diversity in 2007 (refs) and the release of the Pf6K data set in 2019 (ref.). ‘All genomics’ data represent articles in the NCBI PubMed database published from 1 January 1995 to the present day that include the search terms “Plasmodium” and “genom*” in the title or abstract. ‘Population genomics’ data represent ‘all genomics’ articles that also include the search terms “population”, “epidemio*”, “polymorph*” or “diversity” in the title or abstract. b | Anopheles genomics publications. An increasing fraction of Anopheles genomics papers over time also include terms signifying population diversity. Highlighted milestones include the publication of the first Anopheles gambiae reference genome assembly in 2002 (ref.), the first genome-wide single nucleotide polymorphism (SNP) genotyping array for A. gambiae in 2005 (ref.) and the publication of the AG1000G project in 2017 (ref.). Search queries as above with the replacement of “Plasmodium” with “Anopheles”.
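The search strategy described in this caption can be sketched as a query builder. This is only an illustration: the caption lists the search terms but not the literal query strings, so the `[Title/Abstract]` field tag, the open-ended date range and the function name `build_query` are assumptions rather than the authors' exact syntax.

```python
def build_query(genus, population_only=False):
    """Build a PubMed-style query string from the terms listed in the caption.

    Assumption: terms are restricted with the [Title/Abstract] field tag and
    the date filter runs from 1 January 1995 with an open-ended upper bound.
    """
    field = "[Title/Abstract]"
    # 'All genomics': genus name plus the truncated term genom*
    query = f"{genus}{field} AND genom*{field}"
    if population_only:
        # 'Population genomics': additionally require any one diversity term
        diversity_terms = ["population", "epidemio*", "polymorph*", "diversity"]
        query += " AND (" + " OR ".join(t + field for t in diversity_terms) + ")"
    date_range = '("1995/01/01"[Date - Publication] : "3000"[Date - Publication])'
    return f"{query} AND {date_range}"


if __name__ == "__main__":
    print(build_query("Plasmodium"))
    print(build_query("Anopheles", population_only=True))
```

Swapping the `genus` argument reproduces the replacement of “Plasmodium” with “Anopheles” mentioned for panel b.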
Fig. 2. Genome assemblies for Plasmodium parasites and Anopheles mosquitoes.
Malaria is a disease caused by various parasite species hailing from diverse subgenera and groups of the ancient and polyphyletic Plasmodium genus (order Haemosporida). Plasmodium parasites are transmitted by different anopheline mosquito vectors, which last shared a common ancestor approximately 100 million years ago (mya). Numbers in parentheses following subgenus or group labels indicate the approximate number of formally described parasite or mosquito species in each lineage. Reference genome assemblies exist for all species except where indicated. a | A phylogenetic tree of Plasmodium parasites, with topology informed by Galen et al. and Otto et al. The common ancestor of mammal-infecting members of Plasmodium is estimated to have existed less than 60 mya, as bats are hypothesized to be the ancestral mammalian hosts and this is the estimated date of the Chiroptera radiation. Species that infect humans are in bold. b | A phylogenetic tree of Anopheles mosquitoes with reference to available genomic resources. Topology informed by Foster et al. and Neafsey et al. Reference genome assemblies are lacking for Kerteszia and other subgenera from the Americas, limiting comparative genomic opportunities to understand vectorial capacity and genomic investigations of insecticide resistance in many vector lineages. PNG, Papua New Guinea; ss, sensu stricto.
Fig. 3. Population diversity estimates and evidence of immune-mediated balancing selection in Plasmodium falciparum vaccine targets.
Many monovalent protein subunit vaccine development efforts began before the extensive diversity of many blood-stage and liver-stage antigens was appreciated. Each dot represents a P. falciparum protein; semi-transparent grey dots indicate proteins not targeted by vaccines, opaque orange dots indicate protein targets of vaccines registered for clinical trials at ClinicalTrials.gov since 2000 (ref.). The horizontal axis indicates estimates of mean amino acid nonsynonymous pairwise diversity, π. The vertical axis depicts estimates of Tajima’s D, a neutrality statistic based on the variant site frequency spectrum, where negative values represent an excess of low-frequency variants and positive values represent an excess of intermediate-frequency variants, which can indicate balancing selection. Estimates were computed using a collection of sequenced P. falciparum parasites from Malawi in the Pf3k resource. Compared with non-vaccine candidates, vaccine candidates generally exhibit a higher level of diversity (Wilcoxon rank sum test, W = 8322.5, P = 0.0001) and show evidence of balancing selection (Wilcoxon rank sum test, W = 8967, P = 0.0002). The newer, post-genomic blood-stage vaccine target RH5 exhibits considerably less diversity, as do several targets of transmission-blocking vaccines (P230, P25 and P48/45).
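The two statistics on the axes of this figure can be illustrated with a minimal sketch. It assumes a small, gap-free alignment of equal-length haploid sequences and uses the standard Tajima (1989) formulation on all variable sites; it is not the pipeline used for the figure, which restricts π to nonsynonymous amino acid changes. The function names and toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_diversity(seqs):
    """Mean number of differences between all pairs of sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Tajima's D; returns None when the statistic is undefined."""
    n = len(seqs)
    # Number of segregating (variable) alignment columns
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if n < 4 or s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    # Difference between pi and Watterson's estimate, scaled by its std. dev.
    return (pairwise_diversity(seqs) - s / a1) / sqrt(variance)


if __name__ == "__main__":
    # Toy data: every variant is a singleton (excess of low-frequency variants)
    rare = ["TAAA", "ATAA", "AATA", "AAAT", "AAAA", "AAAA"]
    # Toy data: every variant is at 50% frequency (intermediate-frequency excess)
    balanced = ["TTTT"] * 3 + ["AAAA"] * 3
    print(tajimas_d(rare), tajimas_d(balanced))
```

In the toy data, the singleton-only alignment gives a negative D and the evenly split alignment gives a positive D, matching the interpretation in the caption.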
Fig. 4. Complex infections, parasite mating and identity by descent.
Complex infections result from multiple mosquito bites (superinfection) and/or from a single mosquito bite (co-transmission). Recombination between genetically distinct malaria parasites can occur after a mosquito feeds on a host with a complex infection; otherwise, reproduction occurs by selfing. Genomic data may be used to observe loci that share the same allele (identical by state (IBS)) and to infer loci that are descended from a recent common ancestor (identical by descent (IBD)). a | High transmission, showing three mosquitoes, each infected with parasites from a single lineage, founding interrelated complex infections in two individuals through superinfection. Gametocytes that develop within these complex infections are imbibed by mosquitoes in which they subsequently recombine. Two illustrative scenarios are shown: one in which both selfing and outcrossing occur, another in which only outcrossing occurs. These mosquitoes founded intra- and interrelated complex infections in two subsequent individuals, this time through co-transmission. b | Low transmission, showing the clonal propagation of two distinct but related parasite genomic sequences following selfing in the mosquito. Each circle represents one or more parasites that share the same genomic sequence. Different circles represent distinct genomic sequences; different colours represent different genetic lineages.
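The distinction between observing IBS and inferring IBD can be made concrete with a small sketch of the observable part. It assumes two haploid genotypes coded as equal-length lists of alleles, with `None` marking a missing call; the function name `ibs_fraction` is hypothetical. IBD is not computed here, because it has to be inferred with a statistical model rather than read directly from the data.

```python
def ibs_fraction(genotype_a, genotype_b):
    """Fraction of jointly called loci at which two haploid genotypes match.

    Assumption: alleles are any comparable values and None marks missing data.
    """
    called = [
        (a, b)
        for a, b in zip(genotype_a, genotype_b)
        if a is not None and b is not None
    ]
    if not called:
        raise ValueError("no loci called in both genotypes")
    matches = sum(a == b for a, b in called)
    return matches / len(called)


if __name__ == "__main__":
    parasite_1 = [0, 1, 1, 0, None]
    parasite_2 = [0, 1, 0, 0, 1]
    print(ibs_fraction(parasite_1, parasite_2))
```

Two unrelated parasites can still match at many loci by chance when alleles are common, which is why IBS alone overstates recent shared ancestry.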
Fig. 5. The decline of identity by descent over time in outbred and inbred populations.
An illustrative example of how recombination over successive generations breaks down chromosomal segments that are identical by descent (IBD segments) and decreases relatedness (probability of IBD). IBD segments, used in various applications including selection detection, require whole-genome sequencing (WGS) data to estimate, whereas relatedness values, used in applications including population connectivity characterization, do not require WGS data to estimate. The top three panels show the breakdown of IBD segments over generations when a mosquito or parasite from one population migrates into mosquitoes or parasites from another population. Vertical bars represent chromosomal genomic sequence of an individual (mosquito or parasite) sampled in the receiving population. IBD between the original migrant and the sampled individual is represented by coloured fill. Light grey represents sequence that is not IBD. Generation zero represents a comparison between the original migrant and itself. Thereafter, we show IBD between the migrant and its closest relative. Segments of IBD break down over successive generations; breakdown depends on the opportunity for outcrossing, which is limited in inbred populations, especially parasite populations, because parasites intermittently self. For each of the situations described in the top three panels, the bottom panel shows the corresponding decrease in relatedness.
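The qualitative trend in the bottom panel can be sketched with a toy model. It rests on strong assumptions that are not taken from the figure: in each generation the migrant's lineage either selfs with probability `selfing_rate`, leaving relatedness to the migrant unchanged, or outcrosses with an unrelated partner, halving it. The function name and parameter are hypothetical.

```python
def expected_relatedness(generations, selfing_rate=0.0):
    """Expected relatedness to the original migrant, generation by generation.

    Toy model: each generation multiplies relatedness by 1 (selfing) or by
    0.5 (outcrossing with an unrelated partner), so the expected per-generation
    factor is selfing_rate + (1 - selfing_rate) * 0.5.
    """
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie between 0 and 1")
    factor = selfing_rate + (1.0 - selfing_rate) * 0.5
    # Generation zero compares the migrant with itself, so relatedness is 1
    return [factor**g for g in range(generations + 1)]


if __name__ == "__main__":
    print("outbred:", expected_relatedness(4))
    print("inbred: ", expected_relatedness(4, selfing_rate=0.8))
```

Under these assumptions relatedness halves every generation in a fully outbred population and decays more slowly as the selfing rate rises, which is the contrast the figure draws between outbred and inbred populations.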
Fig. 6. Measuring population connectivity using relatedness versus allele frequency variation.
An illustrative example of how relatedness, a measure of the probability of identity by descent (IBD), may reveal connectivity between populations of mosquitoes on a timescale more relevant for epidemiological applications than allele frequency variation, using genetic differentiation, the fixation index (FST), as an example measure of allele frequency variation. Rows represent distinct mosquito populations. Mosquitoes represent individuals sampled, coloured by their dominant lineage, and edge connections indicate high relatedness and therefore high probability of IBD between pairs of mosquitoes across populations. Populations with a higher degree of connectivity are liable to share more highly related edges. Variability in relatedness between populations may occur with or without appreciable inter-population variation in the allele frequency of genotyped markers. Allele frequencies are indicated by histograms, with vertical axes depicting allele frequencies and horizontal axes depicting markers. FST describes variation in allele frequency between populations, with poorly connected populations expected to exhibit more vicariant allele frequencies. Genetic drift, migration and/or selection generally take much more time to shift minor allele frequencies among isolated populations relative to recombination, which is expected to rapidly deplete relatedness provided that the rate of outcrossing is sufficiently high.
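As a rough illustration of FST as a measure of allele frequency variation, the sketch below computes a simple heterozygosity-based estimate for two populations from per-marker allele frequencies. It assumes biallelic markers, equal population weights and no sample-size correction, so it is one basic estimator among several and not necessarily the one behind the figure; the function name `fst_two_populations` is hypothetical.

```python
def fst_two_populations(freqs_pop1, freqs_pop2):
    """FST = (HT - HS) / HT, summed over biallelic markers (ratio of sums).

    freqs_pop1 and freqs_pop2 hold the frequency of the same allele at each
    marker in the two populations.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_mean = (p1 + p2) / 2
        h_total = 2 * p_mean * (1 - p_mean)                    # pooled heterozygosity
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
        numerator += h_total - h_within
        denominator += h_total
    # Markers fixed for the same allele in both populations carry no signal
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    print(fst_two_populations([0.5, 0.2], [0.5, 0.2]))  # identical frequencies
    print(fst_two_populations([0.9, 0.8], [0.1, 0.3]))  # divergent frequencies
```

Identical allele frequencies give an FST of zero even if no individuals are closely related across populations, which is the limitation the caption contrasts with relatedness-based measures.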

References

    1. Baton LA, Ranford-Cartwright LC. Spreading the seeds of million-murdering death: metamorphoses of malaria in the mosquito. Trends Parasitol. 2005;21:573–580. doi: 10.1016/j.pt.2005.09.012.
    2. Loy DE, et al. Out of Africa: origins and evolution of the human malaria parasites Plasmodium falciparum and Plasmodium vivax. Int. J. Parasitol. 2017;47:87–97. doi: 10.1016/j.ijpara.2016.05.008.
    3. Otto TD, et al. Genomes of all known members of a Plasmodium subgenus reveal paths to virulent human malaria. Nat. Microbiol. 2018;3:687–697. doi: 10.1038/s41564-018-0162-2.
    4. Kwiatkowski DP. How malaria has affected the human genome and what human genetics can teach us about malaria. Am. J. Hum. Genet. 2005;77:171–192. doi: 10.1086/432519.
    5. World Health Organization. World Malaria Report 2020 (WHO, 2020).