Nat Commun. 2024 Feb 15;15(1):1413.
doi: 10.1038/s41467-024-45688-z.

Flexible and cost-effective genomic surveillance of P. falciparum malaria with targeted nanopore sequencing

Mariateresa de Cesare et al. Nat Commun.

Abstract

Genomic surveillance of Plasmodium falciparum malaria can provide policy-relevant information about antimalarial drug resistance, diagnostic test failure, and the evolution of vaccine targets. Yet the large, low-complexity genome of P. falciparum complicates the development of genomic methods, while resource constraints in malaria-endemic regions can limit their deployment. Here, we demonstrate an approach for targeted nanopore sequencing of P. falciparum from dried blood spots (DBS) that enables cost-effective genomic surveillance of malaria in low-resource settings. We release software that facilitates flexible design of amplicon sequencing panels and use this software to design two target panels for P. falciparum. The panels generate 3–4 kbp reads for eight and sixteen targets, respectively, covering key drug resistance-associated genes, diagnostic test antigens, polymorphic markers and the vaccine target csp. We validate our approach on mock and field samples, demonstrating robust sequencing coverage, accurate variant calls within coding sequences, and the ability to explore within-sample diversity of P. falciparum and to detect deletions underlying rapid diagnostic test failure.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Design of long-range multiplex PCRs for the low-complexity P. falciparum genome using multiply.
a Multiplex PCR primer design workflow by multiply. An optimal set of primers is selected from a large candidate pool using a cost function that minimises SNPs in primer binding sites, primer dimers, and off-target primer binding. b Schematic of a cost-effective protocol for targeted nanopore sequencing of P. falciparum malaria from dried blood spots (DBS) that takes three days and costs approximately USD 25 per sample. c, d Histograms of crt (c) and kelch13 (d) coverage stratified by read length. (A+T) percentage in 20 bp sliding windows (blue) and homopolymer run length (red) are shown, as well as a heatmap of nucleotide composition. For both genes, the entire coding sequence (CDS) is covered in the majority of reads. e Read length distributions for NOMADS8 (left, 28.8 kbp total) and NOMADS16 (right, 54.7 kbp total) amplicon panels. Grey triangle indicates coding sequence (CDS) length. Amplicons were designed to be 3–4 kbp. Marginal distribution for all amplicons displayed at top. Data for (c–e) are from a mock sample created from P. falciparum 3D7 and human DNA (Methods).
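The two sequence-complexity tracks named in this caption, (A+T) percentage in 20 bp sliding windows and homopolymer run length, can be computed along the following lines. This is a minimal sketch and not the authors' code: the function names are hypothetical, and the 1 bp step between windows is an assumption, since the caption gives only the window width.

```python
def at_percent_windows(seq: str, window: int = 20) -> list[float]:
    """A+T percentage of every `window`-bp window, sliding 1 bp at a time.

    The 1 bp step is an assumption; only the 20 bp width comes from the caption.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_at = [base in "AT" for base in seq]
    count = sum(is_at[:window])          # A/T count in the first window
    out = [100.0 * count / window]
    for i in range(window, len(seq)):
        # Slide right by one base: add the entering base, drop the leaving one.
        count += is_at[i] - is_at[i - window]
        out.append(100.0 * count / window)
    return out


def homopolymer_run_lengths(seq: str) -> list[int]:
    """For each position, the length of the homopolymer run that contains it."""
    seq = seq.upper()
    lengths: list[int] = []
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        lengths.extend([j - i] * (j - i))  # every base in the run gets its length
        i = j
    return lengths


# Toy sequences, not taken from the P. falciparum genome.
print(at_percent_windows("AATTGGCC", window=4))
print(homopolymer_run_lengths("AAATGG"))
```

Taking the maximum of `homopolymer_run_lengths` over a window gives the longest homopolymer in that window, which is the quantity used alongside A+T% to bin errors in Fig. 4e.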
Fig. 2. Sequencing throughput and coverage across samples and target genes for the NOMADS8 panel.
a Diagram of reads produced on a Flongle Flow Cell (FLO-FLG001) sequencing 24 mock samples composed of P. falciparum and human DNA. The leftmost bar represents all reads (n = 345,457; 100%) generated during the sequencing run, which are sequentially subdivided in the data analysis process down to the reads of interest, i.e., those mapped to target genes (n = 198,030; 57.4%). b Bar plot (left pane) displays the total number of reads generated for each sample stratified by mapping outcome: mapped to P. falciparum (P.f.) (blue), human (H.s.) (red), or failing to map (grey, too few to be visible). P.f. mapping percentages are indicated with text. Scatter plot (right pane) displays the number of reads overlapping each target gene (labelled by colours) after mapping for each sample. Note that the number of reads (x-axis) is displayed on a log scale. For most samples, all target genes have > 100x coverage. The number at right (e.g., 8x for 3D7) gives the fold-difference between the highest-coverage and lowest-coverage targets. c Same as (b) but for 28 field samples collected as DBS from Kaoma, Zambia.
Fig. 3. Effect of parasitemia on sequencing performance measures.
Scatter plots display the effect of parasitemia (x-axis) on sequencing performance for the NOMADS8 panel (left), the NOMADS16 panel (middle) and the NOMADS16 panel with hrp genes ignored (right). Three measures of sequencing performance are shown (y-axis): "Normalised Sample Throughput", which is the number of reads generated for a sample divided by the mean number of reads per sample for the sequencing run (top row); "P.f. Mapping Percentage", which is the percentage of all reads from a sample that mapped to P. falciparum (middle row); and "Amplicon Coverage Fold-difference", which, for a given sample, is the number of reads overlapping the highest-abundance amplicon divided by the number of reads overlapping the lowest-abundance amplicon (bottom row). Each point is either a mock sample (grey), or a field sample sequenced in Oxford (green) or Zambia (orange). Samples sequenced on an R9.4.1 Flongle Flow Cell (FLG001) are indicated with triangles; R9.4.1 MinION Flow Cell (MIN106D), with circles; R10.4.1 MinION Flow Cell (MIN114D), with squares. Median values are shown as horizontal lines and the Pearson correlation coefficient is given in the top left. Note that parasitemia data are missing for 16 field samples.
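The three performance measures defined in this caption are simple ratios of read counts. The sketch below restates them in code, assuming per-sample and per-amplicon read counts are already available as dictionaries; the function names, sample names and counts are illustrative and do not come from the paper.

```python
from statistics import mean


def normalised_sample_throughput(reads_per_sample: dict[str, int]) -> dict[str, float]:
    """Reads for each sample divided by the mean reads per sample in the run."""
    run_mean = mean(list(reads_per_sample.values()))
    return {sample: n / run_mean for sample, n in reads_per_sample.items()}


def pf_mapping_percentage(n_pf_mapped: int, n_total: int) -> float:
    """Percentage of all reads from a sample that mapped to P. falciparum."""
    if n_total == 0:
        return float("nan")  # undefined for a sample with no reads
    return 100.0 * n_pf_mapped / n_total


def amplicon_coverage_fold_difference(reads_per_amplicon: dict[str, int]) -> float:
    """Reads on the highest-abundance amplicon divided by reads on the lowest."""
    lowest = min(reads_per_amplicon.values())
    highest = max(reads_per_amplicon.values())
    # An amplicon with zero reads makes the ratio unbounded.
    return highest / lowest if lowest > 0 else float("inf")


# Made-up counts for illustration only.
print(normalised_sample_throughput({"sample_a": 100, "sample_b": 300}))
print(pf_mapping_percentage(90, 100))
print(amplicon_coverage_fold_difference({"crt": 800, "kelch13": 100}))
```

How samples with zero reads or a dropped-out amplicon are handled in the published analysis is not stated in the caption; returning NaN or infinity here is a choice made for the sketch.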
Fig. 4. SNP calling accuracy for a set of clonal mock samples.
a Genotyping results from Clair3 for seven clonal mock samples and across 41 antimalarial resistance-associated mutations. Samples were sequenced with an R10.4.1 Flow Cell on a MinION Mk1b device. b Mean F1-Score (harmonic mean of precision and recall) of SNP calling compared to PacBio data for Dd2 and HB3 mock samples randomly downsampled to different read depths. Each square gives the mean F1-Score across twenty in silico replicates (ten replicates for each of Dd2 and HB3) at the indicated read depth (columns) and across the indicated region (rows). In total, there are 200 in silico replicates across all depths. Top panel is limited to coding sequence ("CDS") and bottom panel covers the entire span of the amplicons ("Amplicon"). c Visualisation of the true positive and false positive rates of sites spanning the crt amplicon in chromosome 7. From top to bottom, panels show an exon diagram of crt; the true positive rate (green) and false positive rate (red) of each site across twenty replicates at a given read depth (indicated by circle size); A+T% in 20 bp sliding windows (blue shade) and homopolymer length (red line); and a heatmap of nucleotide composition. d Same as (c) but for the dhps amplicon. e Heatmaps showing measures of sequence complexity in 20 bp windows surrounding sites where errors were observed. Rows indicate A+T% of the 20 bp window, columns indicate the length of the longest homopolymer within the 20 bp window and colour gives the number of errors. Top panel shows errors that were corrected with additional read depth (i.e., exist at depth < 100); bottom panel shows errors that persist at a depth of 100 reads. Selected sequences are shown; asterisk (*) marks sequences that are an example from a bin with greater than one sequence.
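The F1-Score in panel (b) is the harmonic mean of precision and recall. A minimal sketch of that calculation follows, assuming called and truth SNPs are each represented as a set of (chromosome, position, alternate allele) tuples. That representation, the function name and the example variants are assumptions made for illustration; the paper's actual comparison against PacBio data is described in its Methods.

```python
def snp_f1_score(called: set[tuple[str, int, str]],
                 truth: set[tuple[str, int, str]]) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    true_pos = len(called & truth)    # called and present in the truth set
    false_pos = len(called - truth)   # called but absent from the truth set
    false_neg = len(truth - called)   # in the truth set but not called
    if true_pos == 0:
        return 0.0                    # avoids division by zero; F1 is 0 by convention
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical variants for illustration, not real P. falciparum calls.
truth = {("chr7", 100, "A"), ("chr7", 200, "T"), ("chr7", 300, "G")}
called = {("chr7", 100, "A"), ("chr7", 200, "T"), ("chr7", 400, "C")}
print(snp_f1_score(called, truth))  # precision and recall are both 2/3
```

In the figure, each square would then correspond to the mean of this score over the twenty downsampled replicates at one read depth.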
Fig. 5. Analysis of length polymorphism and nucleotide identity of msp2-derived reads.
a Read length distributions of msp2 alleles across 24 mock samples. Each dot represents the length of a single read that was trimmed to the extent of the msp2 coding sequence (CDS) after mapping. Individual reads are coloured by the laboratory strain to which they have the highest identity alignment. Multi-modal distributions are indicative of mixed infections. b Hierarchically clustered heatmaps of msp2-derived reads showing pairwise alignment scores. Each cell is coloured by the global pairwise alignment score between two msp2-derived reads, which have been hierarchically clustered along both rows and columns. Colours of rows and columns indicate the laboratory strain to which each read has the highest identity alignment, as in (a). Heatmaps are shown for three different mock samples: clonal 3D7 (top); mixture of 3D7 and Dd2 (middle); and mixture of 3D7, Dd2 and GB4 (bottom). Note how reads cluster based on allele type. GB4 reads are under-represented in the bottom heatmap, likely due to lower DNA quality. c, d Same as (a, b), but for 28 field samples from Zambia. In (c), read length distributions suggest the presence of both clonal and mixed infections. In (d), examples are shown of a likely clonal infection (top), a two-strain infection (middle) and a three-strain infection (bottom).
Fig. 6. Validation of hrp2/3 deletion detection using the NOMADS16 panel.
a Diagram showing the location of the hrp2, hrp2 upstream and hrp2 downstream amplicons in the NOMADS16 panel, within a 50 kbp window of chromosome 8. The chromosome is represented by a dark grey horizontal line, on which thicker segments demarcate genes (labelled above) and their exons. The genomic extent of documented hrp2 deletions is displayed above the chromosome for lab strains Dd2 and HB3, and for a selection of three field strains. Amplicon positions are shown below in orange. b Same as in (a) but for hrp3 upstream, hrp3 and hrp3 downstream amplicons, shown in purple. Note that Dd2 does not have a deletion within this window. c Heatmap displaying the normalised abundance of NOMADS16 panel amplicons (rows) across 48 mock samples (columns). The P. f. strain used in the mock sample (3D7, blue; Dd2, green; HB3, red; P. f.-negative, grey) and its parasitemia are indicated above the heatmap. The bottom six rows of the heatmap show amplicons designed for detection of hrp2 and hrp3 deletions. d Scatterplot showing the relationship between amplicon abundance (y-axis, in number of reads) and parasitemia (x-axis) for the hrp2 amplicon across all 48 mock samples. As in (c), the colour of points indicates the P. f. strain used in the mock sample. e Same as (d) but for the hrp3 amplicon. f Heatmap displaying the probability of deletion for each of the six amplicons designed to support hrp2/3 deletion detection (rows) across the 48 mock samples (columns) indicated in (c). Probabilities were calculated using a statistical model (Methods) that leverages all sixteen amplicons in NOMADS16 and estimates barcode misclassification/contamination rates from P. f.-negative samples. The expected deletions are detected with a very high degree of certainty (black squares). Uncertainty about P. f.-negative samples (deletion probability between 0.2 and 0.8) is expected as they have very few reads.
