Wellcome Open Res. 2021 Jul 13:6:42.
doi: 10.12688/wellcomeopenres.16168.2. eCollection 2021.

An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples

MalariaGEN; Ambroise Ahouidi 1; Mozam Ali 2; Jacob Almagro-Garcia 2,3; Alfred Amambua-Ngwa 2,4; Chanaki Amaratunga 5; Roberto Amato 2,3; Lucas Amenga-Etego 6,7; Ben Andagalu 8; Tim J C Anderson 9; Voahangy Andrianaranjaka 10; Tobias Apinjoh 11; Cristina Ariani 2; Elizabeth A Ashley 12; Sarah Auburn 13,14; Gordon A Awandare 7,15; Hampate Ba 16; Vito Baraka 17,18; Alyssa E Barry 19,20,21; Philip Bejon 22; Gwladys I Bertin 23; Maciej F Boni 14,24; Steffen Borrmann 25; Teun Bousema 26,27; Oralee Branch 28; Peter C Bull 22,29; George B J Busby 3; Thanat Chookajorn 30; Kesinee Chotivanich 30; Antoine Claessens 4,31; David Conway 26; Alister Craig 32,33; Umberto D'Alessandro 4; Souleymane Dama 34; Nicholas Pj Day 12; Brigitte Denis 33; Mahamadou Diakite 34; Abdoulaye Djimdé 34; Christiane Dolecek 14; Arjen M Dondorp 12; Chris Drakeley 26; Eleanor Drury 2; Patrick Duffy 5; Diego F Echeverry 35,36; Thomas G Egwang 37; Berhanu Erko 38; Rick M Fairhurst 39; Abdul Faiz 40; Caterina A Fanello 12; Mark M Fukuda 41; Dionicia Gamboa 42; Anita Ghansah 43; Lemu Golassa 38; Sonia Goncalves 2; William L Hamilton 2,44; G L Abby Harrison 21; Lee Hart 3; Christa Henrichs 3; Tran Tinh Hien 24,45; Catherine A Hill 46; Abraham Hodgson 47; Christina Hubbart 48; Mallika Imwong 30; Deus S Ishengoma 17,49; Scott A Jackson 50; Chris G Jacob 2; Ben Jeffery 3; Anna E Jeffreys 48; Kimberly J Johnson 3; Dushyanth Jyothi 2; Claire Kamaliddin 23; Edwin Kamau 51; Mihir Kekre 2; Krzysztof Kluczynski 3; Theerarat Kochakarn 2,30; Abibatou Konaté 52; Dominic P Kwiatkowski 2,3,48; Myat Phone Kyaw 53,54; Pharath Lim 5,55; Chanthap Lon 41; Kovana M Loua 56; Oumou Maïga-Ascofaré 34,57,58; Cinzia Malangone 2; Magnus Manske 2; Jutta Marfurt 13; Kevin Marsh 14,59; Mayfong Mayxay 60,61; Alistair Miles 2,3; Olivo Miotto 2,3,12; Victor Mobegi 62; Olugbenga A Mokuolu 63; Jacqui Montgomery 64; Ivo Mueller 21,65; Paul N Newton 66; Thuy Nguyen 2; Thuy-Nhien Nguyen 24; Harald Noedl 67; Francois Nosten 14,68; Rintis Noviyanti 69; Alexis Nzila 70; Lynette I Ochola-Oyier 22; Harold Ocholla 71,72; Abraham Oduro 6; Irene Omedo 22; Marie A Onyamboko 73; Jean-Bosco Ouedraogo 74; Kolapo Oyebola 75,76; Richard D Pearson 2,3; Norbert Peshu 22; Aung Pyae Phyo 12,68; Chris V Plowe 77; Ric N Price 12,13,45; Sasithon Pukrittayakamee 30; Milijaona Randrianarivelojosia 78,79; Julian C Rayner 2; Pascal Ringwald 80; Kirk A Rockett 2,48; Katherine Rowlands 48; Lastenia Ruiz 81; David Saunders 41; Alex Shayo 82; Peter Siba 83; Victoria J Simpson 3; Jim Stalker 2; Xin-Zhuan Su 5; Colin Sutherland 26; Shannon Takala-Harrison 84; Livingstone Tavul 83; Vandana Thathy 22,85; Antoinette Tshefu 86; Federica Verra 87; Joseph Vinetz 42,88; Thomas E Wellems 5; Jason Wendler 48; Nicholas J White 12; Ian Wright 3; William Yavo 52,89; Htut Ye 90

MalariaGEN et al. Wellcome Open Res.

Abstract

MalariaGEN is a data-sharing network that enables groups around the world to work together on the genomic epidemiology of malaria. Here we describe a new release of curated genome variation data on 7,000 Plasmodium falciparum samples from MalariaGEN partner studies in 28 malaria-endemic countries. High-quality genotype calls on 3 million single nucleotide polymorphisms (SNPs) and short indels were produced using a standardised analysis pipeline. Copy number variants associated with drug resistance and structural variants that cause failure of rapid diagnostic tests were also analysed. Almost all samples showed genetic evidence of resistance to at least one antimalarial drug, and some samples from Southeast Asia carried markers of resistance to six commonly-used drugs. Genes expressed during the mosquito stage of the parasite life-cycle are prominent among loci that show strong geographic differentiation. By continuing to enlarge this open data resource we aim to facilitate research into the evolutionary processes affecting malaria control and to accelerate development of the surveillance toolkit required for malaria elimination.

Keywords: data resource; drug resistance; evolution; genomic epidemiology; genomics; malaria; Plasmodium falciparum; population genetics; rapid diagnostic test failure.


Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. Population structure.
(A) Genome-wide unrooted neighbour-joining tree showing population structure across all sites, with sample branches coloured according to country groupings (Table 1): South America (green, n=37); West Africa (red, n=2231); Central Africa (orange, n=344); East Africa (yellow, n=739); South Asia (purple, n=77); West Southeast Asia (light blue, n=1079); East Southeast Asia (dark blue, n=1262); Oceania (magenta, n=201). The circular inset shows a magnified view of the part of the tree where the majority of samples from Africa coalesce, showing that the three African sub-regions are genetically close but distinct. (B, C) First three components of a genome-wide principal coordinate analysis. The first axis (PC1) captures the separation of African and South American from Asian samples. The following two axes (PC2 and PC3) capture finer levels of population structure due to geographical separation and selective forces. Each point represents a sample and the colour legend is the same as above.
Figure 2. Characteristics of the eight regional parasite populations.
(A) Distribution of within-host diversity, as measured by F_WS, showing that genetically mixed infections were considerably more common in Africa than in other regions, consistent with the high intensity of malaria transmission in Africa. (B) Distribution of per-site nucleotide diversity calculated in non-overlapping 25 kbp genomic windows. We only considered coding biallelic SNPs to reduce the ascertainment bias caused by poor accessibility of non-coding regions. In panels A and B, thick lines represent median values, boxes show the interquartile range, and whiskers represent the bulk of the distribution, discounting outliers. (C) Genome-wide median LD (y-axis, measured by r^2) between pairs of SNPs as a function of their physical distance (x-axis, in bp), showing a rapid decay in all regional parasite populations. The inset panel shows a magnified view of the decay, showing that in all populations r^2 decayed below 0.1 (dashed horizontal line) within 500 bp. All panels use the same palette, with colours denoting each geographic region.
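Panel C reports LD as r^2 between pairs of SNPs. The caption does not say how r^2 was computed, so the following is only a minimal sketch under stated assumptions: haploid 0/1 allele calls at two biallelic SNPs, no missing data, and the standard definition r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The function name `r_squared` and the toy inputs are hypothetical.

```python
def r_squared(a, b):
    """Squared allelic correlation (r^2) between two biallelic SNPs.

    a, b: equal-length lists of 0/1 allele calls, one entry per sample.
    Assumption (illustrative): haploid calls with no missing data.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    p_a = sum(a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")               # undefined if either SNP is monomorphic
    return d * d / denom


if __name__ == "__main__":
    # Toy data: the two SNPs always co-occur, so they are in complete LD.
    print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
```

A decay curve like the one in panel C would then be obtained by grouping SNP pairs by physical distance and taking the median r^2 per group; that step is omitted here.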
Figure 3. Geographic patterns of population differentiation and gene flow.
Each point represents one pairwise comparison between two regional parasite populations. The x-axis reports the geographic separation between the two populations, measured as the great-circle distance between the centres of mass of the two populations, without taking natural barriers into account. The y-axis reports the genetic differentiation between the two populations, measured as average genome-wide F_ST. Points are coloured according to the regional populations they compare: between African populations (red); between Asian populations (blue); between Southeast Asia (as a whole) and Oceania, Africa or South America (purple); all other comparisons (orange).
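The x-axis of Figure 3 is a great-circle distance. The caption does not state which formula was used, so the sketch below assumes the haversine formula on a spherical Earth with a mean radius of 6371 km; the function name `great_circle_km` and the example coordinates are hypothetical and are not population centres from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points.

    Coordinates are in decimal degrees. Natural barriers are ignored,
    as in the caption: this is distance along the sphere's surface only.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


if __name__ == "__main__":
    # Two arbitrary points on the equator, 90 degrees of longitude apart.
    print(round(great_circle_km(0.0, 0.0, 0.0, 90.0), 1))
```

Each point in the figure would pair one such distance with the average genome-wide F_ST for the same two populations.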
Figure 4. SNPs geographic differentiation.
Coloured lines show the proportions of SNPs in ten F_ST bins, stratified by genomic regions: non-synonymous (red), synonymous (yellow), intronic (green) and intergenic (blue). F_ST is calculated between all eight regional parasite populations and the number of SNPs in each bin is indicated in the background histogram. The y-axis on the right-hand side refers to the histogram and is on a log scale.
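The binning described for Figure 4 can be illustrated as follows. This is a sketch under assumptions not confirmed by the caption: the ten bins are taken to be equal-width intervals on [0, 1], and proportions are computed within each bin (the share of each annotation category among the SNPs falling in that bin). The function name `category_proportions_by_bin` and the toy records are hypothetical.

```python
from collections import Counter


def category_proportions_by_bin(snps, n_bins=10):
    """Per-bin proportion of each annotation category.

    snps: iterable of (fst, category) pairs with 0 <= fst <= 1.
    Returns a list of n_bins dicts mapping category -> proportion of the
    SNPs in that bin; an empty bin yields an empty dict.
    Assumption (illustrative): equal-width bins over [0, 1].
    """
    counts = [Counter() for _ in range(n_bins)]
    for fst, category in snps:
        if not 0.0 <= fst <= 1.0:
            raise ValueError(f"F_ST out of range: {fst}")
        # fst == 1.0 is placed in the last bin rather than overflowing.
        idx = min(int(fst * n_bins), n_bins - 1)
        counts[idx][category] += 1
    proportions = []
    for c in counts:
        total = sum(c.values())
        proportions.append({k: v / total for k, v in c.items()} if total else {})
    return proportions


if __name__ == "__main__":
    toy = [
        (0.05, "non-synonymous"),
        (0.05, "synonymous"),
        (0.95, "non-synonymous"),
        (1.0, "non-synonymous"),
    ]
    for i, p in enumerate(category_proportions_by_bin(toy)):
        print(i, p)
```

The per-bin totals computed along the way correspond to the background histogram in the figure.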

