Wellcome Open Res. 2023 Jan 16:8:22.
doi: 10.12688/wellcomeopenres.18681.1. eCollection 2023.

Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples

MalariaGEN; Muzamil Mahdi Abdel Hamid, Mohamed Hassan Abdelraheem, Desmond Omane Acheampong, Ambroise Ahouidi, Mozam Ali, Jacob Almagro-Garcia, Alfred Amambua-Ngwa, Chanaki Amaratunga, Lucas Amenga-Etego, Ben Andagalu, Tim Anderson, Voahangy Andrianaranjaka, Ifeyinwa Aniebo, Enoch Aninagyei, Felix Ansah, Patrick O Ansah, Tobias Apinjoh, Paulo Arnaldo, Elizabeth Ashley, Sarah Auburn, Gordon A Awandare, Hampate Ba, Vito Baraka, Alyssa Barry, Philip Bejon, Gwladys I Bertin, Maciej F Boni, Steffen Borrmann, Teun Bousema, Marielle Bouyou-Akotet, Oralee Branch, Peter C Bull, Huch Cheah, Keobouphaphone Chindavongsa, Thanat Chookajorn, Kesinee Chotivanich, Antoine Claessens, David J Conway, Vladimir Corredor, Erin Courtier, Alister Craig, Umberto D'Alessandro, Souleymane Dama, Nicholas Day, Brigitte Denis, Mehul Dhorda, Mahamadou Diakite, Abdoulaye Djimde, Christiane Dolecek, Arjen Dondorp, Seydou Doumbia, Chris Drakeley, Eleanor Drury, Patrick Duffy, Diego F Echeverry, Thomas G Egwang, Sonia Maria Mauricio Enosse, Berhanu Erko, Rick M Fairhurst, Abdul Faiz, Caterina A Fanello, Mark Fleharty, Matthew Forbes, Mark Fukuda, Dionicia Gamboa, Anita Ghansah, Lemu Golassa, Sonia Goncalves, G L Abby Harrison, Sara Anne Healy, Jason A Hendry, Anastasia Hernandez-Koutoucheva, Tran Tinh Hien, Catherine A Hill, Francis Hombhanje, Amanda Hott, Ye Htut, Mazza Hussein, Mallika Imwong, Deus Ishengoma, Scott A Jackson, Chris G Jacob, Julia Jeans, Kimberly J Johnson, Claire Kamaliddin, Edwin Kamau, Jon Keatley, Theerarat Kochakarn, Drissa S Konate, Abibatou Konaté, Aminatou Kone, Dominic P Kwiatkowski, Myat P Kyaw, Dennis Kyle, Mara Lawniczak, Samuel K Lee, Martha Lemnge, Pharath Lim, Chanthap Lon, Kovana M Loua, Celine I Mandara, Jutta Marfurt, Kevin Marsh, Richard James Maude, Mayfong Mayxay, Oumou Maïga-Ascofaré, Olivo Miotto, Toshihiro Mita, Victor Mobegi, Abdelrahim Osman Mohamed, Olugbenga A Mokuolu, Jaqui Montgomery, Collins Misita Morang'a, Ivo Mueller, Kathryn Murie, Paul N Newton, Thang Ngo Duc, Thuy Nguyen, Thuy-Nhien Nguyen, Tuyen Nguyen Thi Kim, Hong Nguyen Van, Harald Noedl, Francois Nosten, Rintis Noviyanti, Vincent Ntui-Njock Ntui, Alexis Nzila, Lynette Isabella Ochola-Oyier, Harold Ocholla, Abraham Oduro, Irene Omedo, Marie A Onyamboko, Jean-Bosco Ouedraogo, Kolapo Oyebola, Wellington Aghoghovwia Oyibo, Richard Pearson, Norbert Peshu, Aung P Phyo, Christopher V Plowe, Ric N Price, Sasithon Pukrittayakamee, Huynh Hong Quang, Milijaona Randrianarivelojosia, Julian C Rayner, Pascal Ringwald, Anna Rosanas-Urgell, Eduard Rovira-Vallbona, Valentin Ruano-Rubio, Lastenia Ruiz, David Saunders, Alex Shayo, Peter Siba, Victoria J Simpson, Mahamadou S Sissoko, Christen Smith, Xin-Zhuan Su, Colin Sutherland, Shannon Takala-Harrison, Arthur Talman, Livingstone Tavul, Ngo Viet Thanh, Vandana Thathy, Aung Myint Thu, Mahamoudou Toure, Antoinette Tshefu, Federica Verra, Joseph Vinetz, Thomas E Wellems, Jason Wendler, Nicholas J White, Georgia Whitton, William Yavo, Rob W van der Pluijm

MalariaGEN et al. Wellcome Open Res. 2023.

Abstract

We describe the MalariaGEN Pf7 data resource, the seventh release of Plasmodium falciparum genome variation data from the MalariaGEN network. It comprises over 20,000 samples from 82 partner studies in 33 countries, including several malaria endemic regions that were previously underrepresented. For the first time we include dried blood spot samples that were sequenced after selective whole genome amplification, necessitating new methods to genotype copy number variations. We identify a large number of newly emerging crt mutations in parts of Southeast Asia, and show examples of heterogeneities in patterns of drug resistance within Africa and within the Indian subcontinent. We describe the profile of variations in the C-terminal of the csp gene and relate this to the sequence used in the RTS,S and R21 malaria vaccines. Pf7 provides high-quality data on genotype calls for 6 million SNPs and short indels, analysis of large deletions that cause failure of rapid diagnostic tests, and systematic characterisation of six major drug resistance loci, all of which can be freely downloaded from the MalariaGEN website.

Keywords: data resource; genomic epidemiology; genomics; malaria; Plasmodium falciparum.

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. Geographic distribution of sampling locations and population structure.
(A) Map shows the centres of the 97 first-level administrative divisions from where samples were collected. Points are coloured according to the major sub-population to which the location is assigned (Table 1). (B) First two components of a genome-wide principal coordinate analysis. The first axis (PC1, 17.6% of variance explained) captures the separation of African and South American from Asian and Oceanian samples. The second axis (PC2, 2.4% of variance explained) captures finer levels of population structure, particularly in the eastern SE Asia population. Each point represents a QC pass sample and the colour legend is the same as in (A). (C) First and fifth (0.7% of variance explained) components of a genome-wide principal coordinate analysis. Here there is an approximate mapping between the principal components and the geographic location (latitude and longitude).
Figure 2. Heterogeneity of chloroquine resistance in West Africa.
Inferred resistance levels to chloroquine between 2013 and 2016 in different administrative divisions within West Africa. We only include locations for which we have at least 25 samples with an unambiguous inferred chloroquine resistance phenotype. Note the very different chloroquine resistance profiles in nearby locations, e.g. Volta, Ghana vs Atlantique, Benin.
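The inclusion rule in this caption (at least 25 samples with an unambiguous inferred phenotype per location) can be sketched as a small filter. This is only an illustration under stated assumptions: the function name, the record layout of (location, phenotype) pairs, and the phenotype labels "Resistant", "Sensitive" and "Undetermined" are hypothetical and are not taken from the Pf7 release.

```python
from collections import defaultdict

MIN_SAMPLES = 25  # threshold stated in the Figure 2 caption


def resistance_frequency(records, min_samples=MIN_SAMPLES):
    """Return {location: fraction resistant} for locations with enough samples.

    `records` is an iterable of (location, phenotype) pairs. Only samples
    labelled "Resistant" or "Sensitive" (hypothetical labels) count as
    unambiguous; anything else is ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # location -> [resistant, unambiguous]
    for location, phenotype in records:
        if phenotype not in ("Resistant", "Sensitive"):
            continue  # skip ambiguous or missing phenotype calls
        counts[location][1] += 1
        if phenotype == "Resistant":
            counts[location][0] += 1
    # Keep only locations that meet the minimum unambiguous sample count.
    return {loc: r / n for loc, (r, n) in counts.items() if n >= min_samples}
```

Locations that fall below the threshold are dropped rather than reported with a noisy estimate, which matches the caption's stated inclusion rule.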
Figure 3. Newly emerging Dd2 background mutations in crt.
(a) Top panel - frequency of different haplotypes with a genetic background identical to the lab strain Dd2. Dd2 is derived from an isolate taken from a patient in Indochina in 1980. Middle panel - breakdown of samples by major sub-population for each haplotype. Lower panel - amino acid mutations in the haplotypes (with respect to 3D7 reference). Mutations found in the Dd2 haplotype are shown in grey, all other mutations are shown in black. (b) Bar plots showing changing frequency of newly emerging Dd2 background crt haplotypes in different locations in the eastern SE region. Newly emerging Dd2 background haplotypes are defined as all haplotypes that have all mutations seen in Dd2 plus additional mutations.
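The definition given in (b), a haplotype carrying every Dd2 mutation plus at least one additional mutation, is a strict-superset test on mutation sets. A minimal sketch, assuming each haplotype is represented as a set of mutation labels; the function name and the labels below are hypothetical placeholders, not the actual Dd2 crt mutations.

```python
def is_newly_emerging(haplotype: frozenset, background: frozenset) -> bool:
    """True if `haplotype` has all background mutations plus at least one more."""
    return background < haplotype  # strict (proper) subset test


# Hypothetical mutation labels, for illustration only (not Pf7 data).
DD2_BACKGROUND = frozenset({"mut_a", "mut_b", "mut_c"})

candidate = frozenset({"mut_a", "mut_b", "mut_c", "mut_d"})
print(is_newly_emerging(candidate, DD2_BACKGROUND))
```

The strict comparison excludes the Dd2 haplotype itself, and also excludes any haplotype that lacks one of the background mutations.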
Figure 4. Analysis of the C-terminal of csp.
(a) Upper panel - frequency of different haplotypes of the C-terminal of csp. Haplotypes found in some lab strains are named and highlighted in red. Haplotypes are ordered as per lower panel. Lower panel - global mean distance (number of amino acid differences) to all other haplotypes. (b) Histograms of number of amino acid differences between samples in each major sub-population and the 3D7 haplotype (upper plot) and Dd2 haplotype (lower plot).
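The distance used in this figure, the number of amino acid differences between two haplotypes, can be illustrated as a position-wise count over aligned sequences. The sketch below rests on assumptions: haplotypes are equal-length aligned amino acid strings, and the mean is taken unweighted over distinct haplotypes (the caption does not say whether haplotype frequency is used as a weight). Function names and example sequences are hypothetical.

```python
def aa_differences(a: str, b: str) -> int:
    """Count positions at which two aligned amino acid sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_distance_to_others(haplotypes: list[str]) -> list[float]:
    """For each haplotype, the unweighted mean distance to all other haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    return [
        sum(aa_differences(h, other) for j, other in enumerate(haplotypes) if j != i)
        / (n - 1)
        for i, h in enumerate(haplotypes)
    ]


# Toy sequences, for illustration only (not csp haplotypes).
print(mean_distance_to_others(["AAA", "AAB", "ABB"]))
```

The same `aa_differences` function covers panel (b) when applied between each sample's haplotype and a fixed reference haplotype such as 3D7 or Dd2.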
Figure 5. HRP deletion breakpoints.
We see five different breakpoints resulting in the deletion of hrp2. Four of these are within exon 2 of the gene whereas the fifth is found between hrp2 and the pseudogene PF3D7_0831750. For all five events we see evidence of telomeric healing from reads that contain part Pf3D7_08_v3 sequence and part telomeric repeat sequence (GGGTTCA/GGGTTTA). We see 16 different breakpoints resulting in the deletion of hrp3. For fourteen of these we see evidence of telomeric healing. Note that many of these events result in the deletion of other genes in addition to hrp3. For twenty samples from Cambodia and a single sample from Vietnam we see evidence of a recombination with chromosome 5 which results in a hybrid chromosome comprising mostly chromosome 13 sequence but a small inverted section of an internal portion of chromosome 5 containing the gene mdr1. We also see evidence of a recombination with chromosome 11 which results in a hybrid chromosome comprising mostly chromosome 13 sequence but also a section of the 3’ end of chromosome 11. This is the most common deletion type, being seen in 151 samples from 14 different countries. Because the recombination occurs between highly similar sequences of a set of three orthologous ribosomal RNA genes found on both chromosomes, it is not possible to identify the exact breakpoints.
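The evidence of telomeric healing described here comes from reads that are part chromosome sequence and part telomeric repeat (GGGTTCA/GGGTTTA). One way such reads could be flagged is sketched below. It is not the authors' method: the function name, the minimum of three consecutive repeat units, and the forward-strand-only search are all assumptions made for illustration.

```python
import re

# A run of at least three consecutive GGGTTCA or GGGTTTA units.
# The minimum of three is an arbitrary illustrative threshold.
TELOMERE_RUN = re.compile(r"(?:GGGTT[CT]A){3,}")


def split_at_telomere(read: str):
    """Split a read at the start of its first telomeric repeat run.

    Returns (upstream_part, remainder_from_repeat_start), or None if the
    read has no repeat run or starts with one (no upstream sequence).
    Only the forward strand is searched.
    """
    match = TELOMERE_RUN.search(read.upper())
    if match is None or match.start() == 0:
        return None
    return read[: match.start()], read[match.start():]


# Toy read, for illustration only (not Pf7 data).
print(split_at_telomere("ACGTACGTAC" + "GGGTTCAGGGTTTAGGGTTCA"))
```

In practice the upstream part would still need to be aligned to the reference to locate the breakpoint, and reverse-complement repeats would need handling; both are left out of this sketch.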
