Wellcome Open Res. 2019 May 28;4:58.
doi: 10.12688/wellcomeopenres.15194.2. eCollection 2019.

Progression of the canonical reference malaria parasite genome from 2002-2019

Ulrike Böhme et al. Wellcome Open Res. 2019.

Abstract

Here we describe the ways in which the sequence and annotation of the Plasmodium falciparum reference genome have changed since its publication in 2002. P. falciparum is the malaria species responsible for the most deaths worldwide, so the richness of its annotation and the accuracy of its sequence are important resources for the P. falciparum research community and form the basis for interpreting the genomes of subsequently sequenced species. At the time of publication in 2002, over 60% of predicted genes had unknown functions. As of March 2019, this proportion has decreased to 33%. The reduction is due to the inclusion of genes that were subsequently characterised experimentally and genes with significant similarity to others with known functions. In addition, the structural annotation of genes has been significantly refined; 27% of gene structures have been changed since 2002, comprising changes in exon-intron boundaries, addition or deletion of exons and the addition or deletion of genes. The sequence has also undergone significant improvements. In addition to the correction of a large number of single-base and insertion or deletion errors, a major misassembly between the subtelomeres of chromosomes 7 and 8 has been corrected. As the number of sequenced isolates continues to grow rapidly, a single reference genome will not be an adequate basis for interpreting intra-species sequence diversity. We therefore describe in this publication a population reference genome of P. falciparum, called Pfref1. This reference will enable the community to map to regions that are not present in the current assembly. P. falciparum 3D7 will continue to be maintained, with ongoing curation ensuring continual improvements in annotation quality.

Keywords: Plasmodium; annotation; curation; falciparum; genome; reference.

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. Distribution of genes with structural changes and new genes on chromosomes 1 to 14 of P. falciparum 3D7.
The positions of new genes (shown in red), genes that have undergone structural changes (shown in blue) and genes that stayed the same since 2002 (shown in grey) are shown on the 14 chromosomes. The values along the right of each chromosome indicate the total sequence length in base pairs. Genes above the chromosome lines are located on the forward strand, genes below the chromosome lines are on the reverse strand.
Figure 2. Gene structure changes.
Artemis Comparison Tool (ACT) screenshot showing a comparison between 2002 and 2019. Coloured boxes represent genes. The grey blocks between sequences represent sequence similarity (TBLASTX). (A) A 2-exon gene has been changed into a 22-exon gene (PF3D7_1462500). (B) Two genes that have been merged (PF3D7_0624900). (C) A gene that has been split into two genes (PF3D7_0906800, PF3D7_0906700). (D) Two genes shown in red have been added (PF3D7_1144100, mitochondrial large subunit ribosomal protein; PF3D7_1144300, 60S ribosomal protein L41). (E) A hypothetical gene (PFI0905w) has been deleted and a ncRNA (PF3D7_0918500, telomerase RNA) has been added. In (E), the six reading frames are shown with tick marks indicating stop codons.
Figure 3. Diagram showing gene structure changes.
The numbers of genes that have been added, deleted or changed are shown over four time frames: between October 2002 (genome version 1) and 2005 (genome version 2), between 2005 (version 2) and September 2007 (version 2.1.4), between September 2007 (genome version 2.1.4) and February 2010 (version 2.1.4) and between February 2010 and March 2019 (version 3.2). The number of changed genes includes gene models that have been merged, split or had a deletion/addition of exons or change of exon boundaries.
Figure 4. P. falciparum 3D7 annotation changes between October 2002 and March 2019.
The numbers of genes in October 2002 and March 2019 are compared. The total number of genes includes pseudogenes. Shown are the number of genes with unknown function (blue), genes with experimental evidence (red), genes with putative function (yellow) and the total number of genes (light blue). Genes with unknown function have one of the following product descriptions: conserved Plasmodium protein, unknown function; conserved protein, unknown function; conserved Plasmodium membrane protein, unknown function; Plasmodium exported protein, unknown function; probable protein, unknown function; hypothetical protein.
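The classification rule in the Figure 4 caption (a gene counts as "unknown function" when its product description is one of six listed strings) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, exact string matching is assumed, and nothing here reflects how the authors actually produced their counts.

```python
# Product descriptions that the Figure 4 caption lists as "unknown function".
UNKNOWN_FUNCTION_DESCRIPTIONS = frozenset({
    "conserved Plasmodium protein, unknown function",
    "conserved protein, unknown function",
    "conserved Plasmodium membrane protein, unknown function",
    "Plasmodium exported protein, unknown function",
    "probable protein, unknown function",
    "hypothetical protein",
})


def has_unknown_function(product):
    """Return True if a product description is in the caption's list.

    Assumption: descriptions match exactly after trimming whitespace.
    """
    return product.strip() in UNKNOWN_FUNCTION_DESCRIPTIONS


def fraction_unknown(products):
    """Fraction of product descriptions classed as unknown function."""
    products = list(products)
    if not products:
        return 0.0
    unknown = sum(has_unknown_function(p) for p in products)
    return unknown / len(products)


# Toy input: two unknown-function descriptions and two product names
# taken from the Figure 2 caption.
example_products = [
    "hypothetical protein",
    "conserved protein, unknown function",
    "60S ribosomal protein L41",
    "telomerase RNA",
]
example_fraction = fraction_unknown(example_products)
```

With the toy list above, two of the four descriptions match, so `example_fraction` is 0.5. The figures of over 60% (2002) and 33% (March 2019) quoted in the abstract would come from applying a rule like this to the full annotation, which is not reproduced here.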
Figure 5. Diagram showing different types of patches created for the P. falciparum 3D7 population reference (PfRef1).
Type-1 patches are sequence differences between the current P. falciparum 3D7 assembly version 3.2 and a new Pf3D7 PacBio assembly. 500 bp are provided on each side as an anchor (shown in blue). Type-2 patches are new genes that are either anchored on both sides (type 2.1) or not anchored (type 2.2). Type-3 patches are dimorphic genes that are either anchored on both sides (type 3.1) or anchored on one side (type 3.2).
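To make the idea of an anchored patch concrete, the following Python sketch cuts a region out of a sequence together with 500 bp of flanking sequence on each side, as the Figure 5 caption describes for type-1 patches. The `Patch` class, the `make_anchored_patch` function, the 0-based half-open coordinates and the truncation of anchors at sequence ends are all assumptions made for illustration; the caption does not say how the Pfref1 patches were actually generated.

```python
from dataclasses import dataclass

ANCHOR_LENGTH = 500  # bp of flanking sequence per side, as stated in the caption


@dataclass
class Patch:
    """A patch body with its flanking anchors (hypothetical structure)."""
    left_anchor: str
    body: str
    right_anchor: str

    @property
    def sequence(self):
        """Full patch sequence: left anchor, body, right anchor."""
        return self.left_anchor + self.body + self.right_anchor


def make_anchored_patch(assembly, start, end, anchor_length=ANCHOR_LENGTH):
    """Extract assembly[start:end] with up to anchor_length bp on each side.

    Coordinates are 0-based and half-open. Anchors are shortened when the
    region lies close to a sequence end (an assumption of this sketch).
    """
    if not 0 <= start <= end <= len(assembly):
        raise ValueError("start and end must satisfy 0 <= start <= end <= len(assembly)")
    left_start = max(0, start - anchor_length)
    right_end = min(len(assembly), end + anchor_length)
    return Patch(
        left_anchor=assembly[left_start:start],
        body=assembly[start:end],
        right_anchor=assembly[end:right_end],
    )


# Toy sequence: a 3 bp difference flanked by 600 bp on each side.
toy_assembly = "A" * 600 + "CGT" + "T" * 600
toy_patch = make_anchored_patch(toy_assembly, 600, 603)
```

Here `toy_patch.body` is `"CGT"` and each anchor is 500 bp long. A patch anchored on one side only (as in type 3.2) could be represented in the same structure with one anchor left empty.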
Figure 6. Differences between P. falciparum 3D7 genome version 3.2, a PacBio assembly of P. falciparum 3D7 and two lab strains P. falciparum IT and DD2.
ACT comparisons between regions of the above genomes. Coloured boxes represent genes. The red blocks between sequences represent sequence similarity (BLASTn). (A) In the current P. falciparum 3D7 genome assembly v3.2, a hypothetical protein on chromosome 13 is missing. This gene is present in a Pf3D7 PacBio assembly (shown in green). (B) Comparison between P. falciparum IT chromosome 11 and P. falciparum v3.2 chromosome 11. P. falciparum 3D7 is missing a hypothetical gene on chr11. This gene is present in P. falciparum IT (PfIT_110029300) (shown in green). (C) Comparison between Pf3D7 v3.2 chromosome 7 and P. falciparum DD2 chromosome 7. The comparison shows the dimorphic gene EBA175 erythrocyte binding antigen-175 (PF3D7_0731500). (D) ClustalX alignment of EBA175 from PfDD2 and Pf3D7. The area shown is the dimorphic part of the two genes.
