Wellcome Open Res. 2018 May 3:3:52.
doi: 10.12688/wellcomeopenres.14571.1. eCollection 2018.

Long read assemblies of geographically dispersed Plasmodium falciparum isolates reveal highly structured subtelomeres

Thomas D Otto et al. Wellcome Open Res.

Abstract

Background: Although thousands of clinical isolates of Plasmodium falciparum are being sequenced and analysed by short read technology, the data do not resolve the highly variable subtelomeric regions of the genomes that contain polymorphic gene families involved in immune evasion and pathogenesis. There is also no current standard definition of the boundaries of these variable subtelomeric regions.

Methods: Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated the genomes of 15 P. falciparum isolates, ten of which are newly cultured clinical isolates. We performed comparative analysis of the entire genome with particular emphasis on the subtelomeric regions and the internal var gene clusters.

Results: The nearly complete sequences of these 15 isolates have enabled us to define a highly conserved core genome, to delineate the boundaries of the subtelomeric regions, and to compare these across isolates. We found highly structured variable regions in the genome. Some exported gene families purportedly involved in release of merozoites show copy number variation. As an example of ongoing genome evolution, we found a novel CLAG gene in six isolates. We also found a novel gene that was relatively enriched in the South East Asian isolates compared to those from Africa.

Conclusions: These 15 manually curated new reference genome sequences, with their nearly complete subtelomeric regions and fully assembled genes, are an important new resource for the malaria research community. We report the overall conserved structure and pattern of important gene families and the more clearly defined subtelomeric regions.

Keywords: Long read Assembly; Plasmodium falciparum; complete genomes; definition of core genome.

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. Frameshifts in the Rh2b gene that could not be corrected in the Pf3D7 PacBio assembly.
Most of the Rh2b sequence is identical to Rh2a. Read correction approaches such as quiver or iCORN2 fail, resulting in seven apparent frameshifts (arrows).
Figure 2. Summary of evidence for Plasmodium core definition.
From outside to inside, the figure represents: heterochromatin protein 1 binding sites in Pf3D7; the coverage of mapped reads from PacBio genomes used to define the core region; our new definition of core regions (green) and the subtelomeres or internal var gene clusters (orange); one-to-one orthologues with P. falciparum (black lines) or P. vivax (grey lines). The dots in this latter track are var genes (≥2.5 kb), coloured blue if on the forward strand, red if on the reverse strand, and black if a pseudogene. Chromosome numbers are shown together with orange bars, the height of which indicates the proportion of subtelomeres in the PacBio assemblies that were complete. The innermost track shows differences between the 3D7 reference genome and the PacBio assemblies: green depicts sites where all isolates are different to the reference, orange shows insertions, black shows 1 bp differences and blue depicts insertions after masking homopolymer tracts and TA repeats >14 bp.
Figure 3. Definition of the core genome.
(A) 15 assemblies (GN01, CD01, DD2, KE01, KH01, KH02, 7G8, GA01, GB4, IT, SD01, HB3, SN01, ML01, TG01) mapped to the left-hand side of chromosome 11 (chr11) of P. falciparum 3D7. A general core boundary is defined as the point closest to a telomere at which 8/15 genotypes cease to align (50th percentile). (B) Cumulative distributions of termination positions of mapping between each isolate and the 3D7 reference are shown for each chromosome end (L = left, R = right). Where mapping terminated without clipping, the isolate was excluded.
Figure 4. Example of core/subtelomere definitions for chromosomes 12 and 14.
The grey areas represent the boundaries of newly defined core regions. The order and orientation of genes and repeats on the left- and right-hand sides of chromosomes 12 and 14 are shown in P. falciparum 3D7, 7G8, HB3, KH01 (Cambodia), KE01 (Kenya), GA01 (Gabon), GN01 (Guinea), CD01 (Congo), SN01 (Senegal), IT, Dd2, GB4, KH02 (Cambodia), and SD01 (Sudan). Genes are represented as coloured boxes. The subtelomere/core areas either end or start with the first gene that has orthologs in more than one Plasmodium species outside of P. falciparum. These are the following gene IDs in P. falciparum 3D7: PF3D7_1201500, PF3D7_1252300, PF3D7_1401700, PF3D7_1475900.
Figure 5. Overview of subtelomeres in new assemblies.
The figure shows the left and right subtelomeres of 13 assemblies (GN01, CD01, DD2, KE01, KH01, KH02, 7G8, GA01, GB4, IT, SD01, HB3, SN01). The proportion of chromosomes with telomeric repeats (Tel) and REP20 repeats (REP20), the distribution of the number of PIR- and var-like sequences (including both genes and pseudogenes) and the presence of cassettes (MC/EPF) consisting of EPF1 (exported protein family 1), MC-2TM (Maurer's cleft two transmembrane protein), EPF3/Hyp4 (exported protein family 3) and EPF4/HYP5 (exported protein family 4) are shown. PIR includes rif and stevor genes.
Figure 6. Similarity graph of exported proteins EPF1, EPF3, EPF4 and PfMC-2TM.
Each node represents a protein. Proteins are connected where they share ≥ 94% global identity.
Figure 7. Phylogenetic tree of Cytoadherence Linked Asexual Gene (CLAG) genes including the novel CLAG (new).
The new CLAGs are PfDC01_00009400, PfIT_060036000, PfHB3_100043500, Pf7G8_100043600, PfSN01_140007100, PfGA01_040029900 and Pf7G8_070006300. The tree was built with the LG4XF model and all nodes have a bootstrap value of 100. The branch marked with * was shortened by a factor of 5, from 0.322.
Figure 8. Overview of internal var gene arrays in 13 Plasmodium falciparum genomes.
The order and orientation of genes on 7 internal var arrays in P. falciparum 3D7, 7G8, HB3, KH01 (Cambodia), KE01 (Kenya), GA01 (Gabon), GN01 (Guinea), CD01 (Congo), SN01 (Senegal), IT, Dd2, GB4, KH02 (Cambodia), and SD01 (Sudan) are shown. Genes are represented as coloured boxes.
Figure 9. KAHRP promoter copy number variation.
ACT (Artemis Comparison Tool) comparison of chromosome 2 of Plasmodium falciparum 3D7, CD01 (Congo), 7G8, SN01 (Senegal), KH01 (Cambodia) and IT shows copy number variation in the KAHRP (knob-associated histidine-rich protein) promoter. Grey bars represent the forward and reverse DNA strands. The red blocks between sequences represent sequence similarity (tBlastx). KAHRP is shown in blue. The number of copies is as follows: 1 copy (IT, DD2, GB4), 2 copies (GN01, KH01, KH02, SD01, HB3, 3D7), 3 copies (CD01, GA01), 4 copies (SN01) and 6 copies (7G8).
Figure 10. Comparisons of multiple isolates reveal a novel hypothetical gene on chromosome 11.
ACT (Artemis Comparison Tool) comparison of chromosome 11 of these isolates shows the position of this gene of unknown function where present (blue). Grey bars represent the forward and reverse DNA strands. The red blocks between sequences represent sequence similarity (tBlastx). The gene is present in Dd2 (PfDd2_110027900), HB3 (PfHB3_110028200), 7G8 (Pf7G8_110028600), KH01 (PfKH01_110029000), KH02 (PfKH02_110030000), IT (PfIT_110029200), but in none of the African isolates.
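
The core boundary rule in the Figure 3 caption (the point at which 8 of 15 isolates cease to align, i.e. the 50th percentile of termination positions) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `core_boundary`, the input format (one termination coordinate per retained isolate), the handling of even-sized inputs and the example coordinates are all assumptions made for illustration.

```python
import math


def core_boundary(termination_positions, chromosome_end):
    """Return the coordinate at which half of the isolates have ceased to align.

    termination_positions: reference coordinates where each isolate's mapping
        stops (isolates excluded upstream are simply left out of the list).
    chromosome_end: "L" for the left end, "R" for the right end.

    Assumption: isolates are counted from the core outwards, and the boundary
    is the position of the k-th isolate to drop out, with k = ceil(n / 2).
    For 15 isolates this gives k = 8, matching the 8/15 rule in the caption.
    """
    if not termination_positions:
        raise ValueError("no termination positions supplied")
    if chromosome_end not in ("L", "R"):
        raise ValueError("chromosome_end must be 'L' or 'R'")

    k = math.ceil(len(termination_positions) / 2)
    # Left end: the telomere is at low coordinates, so walking outwards from
    # the core means visiting positions in descending order. Right end: ascending.
    ordered = sorted(termination_positions, reverse=(chromosome_end == "L"))
    return ordered[k - 1]


# Hypothetical termination coordinates (in kb) for 15 isolates at a left end.
example_left = [70, 72, 75, 80, 81, 85, 90, 95, 100, 101, 110, 120, 130, 140, 150]
print(core_boundary(example_left, "L"))
```

With an odd number of isolates the result is the ordinary median, so the direction of counting only matters when isolates have been excluded and an even number remains.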
