Genome Res. 2016 Sep;26(9):1288-99.
doi: 10.1101/gr.203711.115. Epub 2016 Aug 16.

Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum

Alistair Miles et al. Genome Res. 2016 Sep.

Abstract

The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.
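The abstract uses Mendelian error rates as an indicator of genotypic accuracy. Below is a minimal sketch of how such a rate could be computed for a single variant. It assumes haploid genotype calls (blood-stage P. falciparum is haploid, so each progeny clone should carry one of the two parental alleles) encoded as integer allele indices, with `None` for a missing call. The function name and encoding are hypothetical and are not taken from the paper's pipeline.

```python
from typing import Optional, Sequence


def mendelian_error_rate(
    parent1: Optional[int],
    parent2: Optional[int],
    progeny: Sequence[Optional[int]],
) -> Optional[float]:
    """Fraction of called progeny whose allele matches neither parent.

    Hypothetical encoding: haploid calls as allele indices (0 = reference,
    1, 2, ... = alternates), None = missing call.
    Returns None when either parent is uncalled or no progeny are called.
    """
    if parent1 is None or parent2 is None:
        return None
    called = [g for g in progeny if g is not None]
    if not called:
        return None
    # A haploid progeny allele absent from both parents cannot be inherited.
    errors = sum(1 for g in called if g not in (parent1, parent2))
    return errors / len(called)
```

For example, with parental alleles 0 and 1, progeny calls `[0, 1, 1, 2, None, 0]` contain one error (the `2`) among five called progeny, giving a rate of 0.2. A variant whose rate is high relative to others in the same cross is a candidate for a genotyping artifact.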

Figures

Figure 1.
Properties of indels. (A) Indel size distribution (size > 0: insertions; size < 0: deletions). Solid black bars represent the frequency of indels that are expansions or contractions of short tandem repeats (STR); solid white bars represent the frequency of non-STR indels. Most coding indels have sizes that are multiples of 3, preserving the reading frame. Most noncoding indels have sizes that are multiples of 2, reflecting the abundance of poly(AT) repeats in noncoding regions. (B) Amino acids inserted and deleted (relative to the 3D7 reference genome). (C) Indel diversity in intergenic regions relative to the position of core promoters predicted by Brick et al. (2008). Each point represents the mean indel diversity in a 50-bp window at a given distance from the center of a core promoter. Vertical bars represent the 95% confidence interval from 1000 bootstraps. The dashed line is at the mean intergenic diversity for the given indel class (STR/non-STR).
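Figure 1A separates indels by size, by whether the size is a multiple of 3 or of 2, and by whether the indel expands or contracts a short tandem repeat. The sketch below shows one simple way to assign those labels to a single indel. The STR test (the indel must consist of whole copies of a unit of at most `max_unit` bases that also appears immediately next to it in the reference) and the 6-bp default are illustrative assumptions, not the criteria used in the paper; all names are hypothetical.

```python
from dataclasses import dataclass


def minimal_unit(seq: str) -> str:
    """Return the shortest unit u such that seq == u * k for some k >= 1."""
    n = len(seq)
    for k in range(1, n + 1):
        if n % k == 0 and seq[:k] * (n // k) == seq:
            return seq[:k]
    return seq  # only reached for the empty string


@dataclass
class IndelClass:
    size: int               # > 0 for insertions, < 0 for deletions (as in Figure 1A)
    frame_preserving: bool  # size is a multiple of 3; only meaningful in coding sequence
    even: bool              # size is a multiple of 2, e.g. whole (AT) unit changes
    is_str: bool            # heuristic: indel adds/removes copies of an adjacent short repeat


def classify_indel(indel_seq: str, is_insertion: bool,
                   left_flank: str, right_flank: str,
                   max_unit: int = 6) -> IndelClass:
    """Classify one indel from its inserted/deleted bases and reference flanks.

    max_unit (longest repeat unit counted as a short tandem repeat) is a
    hypothetical threshold chosen for illustration.
    """
    seq = indel_seq.upper()
    if not seq:
        raise ValueError("empty indel sequence")
    size = len(seq) if is_insertion else -len(seq)
    unit = minimal_unit(seq)
    # STR change: the indel is whole copies of a short unit that also occurs
    # immediately next to the indel in the reference sequence.
    is_str = len(unit) <= max_unit and (
        right_flank.upper().startswith(unit) or left_flank.upper().endswith(unit)
    )
    return IndelClass(size=size, frame_preserving=size % 3 == 0,
                      even=size % 2 == 0, is_str=is_str)
```

With this heuristic, deleting `ATAT` immediately upstream of reference sequence `ATATATG` is labeled size -4, even, not a multiple of 3, and STR, matching the poly(AT) pattern the caption describes for noncoding indels.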
Figure 2.
Variation in nucleotide diversity over the core genome. Nucleotide diversity is shown for each cross in 500-bp half-overlapping windows across the core genome (which excludes hypervariable regions containing var, rif, or stevor genes) using SNPs combined from both variant calling methods and passing all quality filters. The peak of nucleotide diversity on Chromosome 10 is expanded to show four distinct peaks due to genes encoding merozoite surface antigens MSP3, MSP6, DBLMSP, and DBLMSP2. All labeled loci (with the exception of AMA1) are sites of complex variation where assembly of sequence reads is required to determine the nonreference alleles.
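The windowing in Figure 2 (500-bp windows overlapping by half, i.e. a 250-bp step) can be sketched as follows. The caption does not state the diversity estimator, so this sketch assumes the standard per-site unbiased heterozygosity n/(n-1) · 2p(1-p) at biallelic SNPs, summed within each window and divided by the window length. The input arrays and function names are hypothetical and stand in for SNPs already restricted to the core genome and filtered as described.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence, Tuple


def site_diversity(alt_count: int, n_called: int) -> float:
    """Unbiased expected heterozygosity at one biallelic site: n/(n-1) * 2p(1-p)."""
    if n_called < 2:
        return 0.0
    p = alt_count / n_called
    return n_called / (n_called - 1) * 2 * p * (1 - p)


def windowed_diversity(positions: Sequence[int], alt_counts: Sequence[int],
                       n_called: Sequence[int], chrom_length: int,
                       size: int = 500, step: int = 250) -> List[Tuple[int, int, float]]:
    """Per-base diversity in half-overlapping windows (step = size / 2).

    positions must be sorted and 1-based. Returns (start, end, pi) with
    inclusive 1-based window bounds; a trailing partial window is skipped.
    """
    per_site = [site_diversity(a, n) for a, n in zip(alt_counts, n_called)]
    cumulative = [0.0, *accumulate(per_site)]  # prefix sums give O(1) window totals
    windows = []
    for start in range(1, chrom_length - size + 2, step):
        end = start + size - 1
        i = bisect_left(positions, start)    # first SNP at or after start
        j = bisect_left(positions, end + 1)  # first SNP past end
        windows.append((start, end, (cumulative[j] - cumulative[i]) / size))
    return windows
```

Because each window shares half its length with its neighbors, a single high-diversity locus (such as the MSP3/MSP6 region on Chromosome 10) contributes to two adjacent windows, which smooths the profile without hiding narrow peaks.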
Figure 3.
Crossover (CO) and non-crossover (NCO) recombination parameters. (A) Genetic map length by cross. For each cross, the red line shows the median map length across progeny; boxes extend from lower to upper quartiles. (B) Map length by chromosome. Each point shows the mean map length for a single chromosome averaged over progeny, with an error bar showing the 95% confidence interval from 1000 bootstraps. The line shows a fitted linear regression model with shading showing the 95% bootstrap confidence interval. (C) CO recombination rate relative to centromere position as given by the genome annotation. Error bars show the 95% confidence interval from 1000 bootstraps. (D) NCO tract length distribution. The dashed line shows the distribution of minimal tract lengths that would be observed with the available markers if NCO tract lengths follow a geometric distribution with parameter φ = 0.9993. (E) Quantile-quantile plot of actual NCO minimal tract lengths versus the expected distribution of minimal tract lengths that would be observed with the given markers if NCO tract length is modeled as a geometric distribution with parameter φ = 0.9993. The data fit the model well except for an excess of tracts with minimal length greater than ∼3 kb. (F) NCO frequency by chromosome, adjusted for incomplete discovery of NCO events. Error bars and linear regression as in B.
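Panels D and E compare observed NCO minimal tract lengths with those expected under a geometric model with φ = 0.9993. Below is a minimal simulation sketch. It assumes φ is the per-base probability that a tract continues for one more base, so that P(L = l) = (1 - φ)φ^(l-1) and the mean tract length is 1/(1 - φ) ≈ 1,429 bp. It also assumes tracts start uniformly along a region and that the observable minimal length is the span from the first to the last marker inside the tract. Both the parameterization and the placement rule are assumptions, and all function names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Optional, Sequence, Tuple


def sample_tract_length(phi: float, rng: random.Random) -> int:
    """Draw L >= 1 with P(L > k) = phi**k, i.e. P(L = l) = (1 - phi) * phi**(l - 1)."""
    u = 1.0 - rng.random()  # u in (0, 1], so log(u) is defined
    return math.floor(math.log(u) / math.log(phi)) + 1


def minimal_tract_length(start: int, length: int,
                         markers: Sequence[int]) -> Optional[int]:
    """Span between the first and last marker inside [start, start + length - 1].

    markers must be sorted. Returns None when no marker falls inside the
    tract, i.e. the tract would be undetectable with these markers.
    """
    i = bisect.bisect_left(markers, start)
    j = bisect.bisect_right(markers, start + length - 1)
    if i == j:
        return None
    return markers[j - 1] - markers[i] + 1


def simulate_minimal_lengths(markers: Sequence[int], region_length: int,
                             phi: float = 0.9993, n: int = 10_000,
                             seed: int = 0) -> Tuple[List[int], float]:
    """Simulate observable minimal tract lengths and the fraction of tracts detected."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        length = sample_tract_length(phi, rng)
        start = rng.randint(1, region_length)  # uniform placement: an assumption
        m = minimal_tract_length(start, length, markers)
        if m is not None:
            observed.append(m)
    return observed, len(observed) / n
```

Sorting the observed and simulated minimal lengths and plotting matched quantiles against each other gives a Q-Q comparison in the spirit of panel E. The returned detection fraction is one rough way to gauge how many tracts a given marker set would miss, the kind of incomplete discovery that panel F adjusts for.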
Figure 4.
Copy number variation and recombination spanning the anti-folate resistance gene gch1 on Chromosome 12. (A) CNVs in the 3D7 and HB3(1) parental clones; α labels the segment amplified in HB3, β labels the segment amplified in 3D7. (B) CNV and recombination in clone C06, progeny of 3D7 × HB3. AB = fraction of aligned reads containing the first parent's allele. (C) CNV and recombination in clone C05, progeny of 3D7 × HB3. AB = fraction of aligned reads containing the first parent's allele. (D) CNVs in the HB3(2) and Dd2 parental clones; γ labels the segment amplified in Dd2. Note that the HB3(2) clone sequenced here appears to be a mixture, with a minor proportion of parasites carrying the amplification visible in HB3(1). (E) CNV and recombination in clone CH3_61, progeny of HB3 × Dd2. AB = fraction of aligned reads containing the first parent's allele. (F) CNVs in the 7G8 and GB4 parental clones. CN = copy number; markers show normalized read counts within 300-bp nonoverlapping windows, excluding windows where GC content was below 20%; solid black line is the copy number predicted by fitting a Gaussian hidden Markov model to the coverage data (Supplemental Information). DP = depth of coverage (number of aligned reads), FA = reads aligned facing away from each other (expected at boundaries of a tandem array), SS = reads aligned in the same orientation (expected at boundaries of a tandem inversion).
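The copy-number track in Figure 4 comes from a Gaussian hidden Markov model fitted to normalized read counts in 300-bp windows, with windows below 20% GC excluded; the model details are in the Supplemental Information and are not reproduced here. The sketch below shows the general idea with fixed, hypothetical parameters (median normalization, emission standard deviation `sigma`, self-transition probability `p_stay`, copy-number states 1 to `max_cn`) and Viterbi decoding in place of any parameter fitting.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def normalize_windows(counts: Sequence[float], gc_fractions: Sequence[float],
                      min_gc: float = 0.20) -> Tuple[List[int], List[float]]:
    """Drop windows with GC fraction below min_gc, then scale counts so the median is 1.

    Median scaling assumes most of the haploid genome is single-copy.
    Returns the indices of the kept windows and their normalized coverage.
    """
    kept = [(i, c) for i, (c, gc) in enumerate(zip(counts, gc_fractions)) if gc >= min_gc]
    median = statistics.median(c for _, c in kept)
    if median <= 0:
        raise ValueError("median read count must be positive")
    return [i for i, _ in kept], [c / median for _, c in kept]


def viterbi_copy_number(coverage: Sequence[float], max_cn: int = 5,
                        sigma: float = 0.3, p_stay: float = 0.99) -> List[int]:
    """Most likely copy-number path under a Gaussian HMM (states 1..max_cn, max_cn >= 2)."""
    if not coverage:
        return []
    states = list(range(1, max_cn + 1))
    k = len(states)
    log_stay = math.log(p_stay)
    log_switch = math.log((1 - p_stay) / (k - 1))  # leftover mass spread evenly

    def log_emit(x: float, cn: int) -> float:
        # Normal(mean = copy number, sd = sigma) on median-normalized coverage
        return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - cn) ** 2 / (2 * sigma ** 2)

    def log_trans(i: int, j: int) -> float:
        return log_stay if i == j else log_switch

    score = [-math.log(k) + log_emit(coverage[0], s) for s in states]  # uniform start
    backpointers = []
    for x in coverage[1:]:
        prev = score
        # Best previous state for each current state j.
        pointers = [max(range(k), key=lambda i: prev[i] + log_trans(i, j)) for j in range(k)]
        score = [prev[pointers[j]] + log_trans(pointers[j], j) + log_emit(x, states[j])
                 for j in range(k)]
        backpointers.append(pointers)
    best = max(range(k), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):  # trace the best path backwards
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [states[i] for i in path]
```

A run of windows at roughly twice the median depth, flanked by single-copy windows, decodes as a copy-number-2 segment, the kind of step the caption's solid black line traces across an amplified segment. The transition penalty is what keeps isolated noisy windows from being called as separate CNVs.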
