Parasitology. 2018 Jan;145(1):71-84.
doi: 10.1017/S0031182017001329. Epub 2017 Jul 19.

PacBio assembly of a Plasmodium knowlesi genome sequence with Hi-C correction and manual annotation of the SICAvar gene family

S A Lapp et al. Parasitology. 2018 Jan.

Abstract

Plasmodium knowlesi has risen in importance as a zoonotic parasite that has been causing regular episodes of malaria throughout South East Asia. The P. knowlesi genome sequence generated in 2008 highlighted and confirmed many similarities and differences in Plasmodium species, including a global view of several multigene families, such as the large SICAvar multigene family encoding the variant antigens known as the schizont-infected cell agglutination proteins. However, repetitive DNA sequences are the bane of any genome project, and this and other Plasmodium genome projects have not been immune to the gaps, rearrangements and other pitfalls created by these genomic features. Today, long-read PacBio and chromatin conformation technologies are overcoming such obstacles. Here, based on the use of these technologies, we present a highly refined de novo P. knowlesi genome sequence of the Pk1(A+) clone. This sequence and annotation, referred to as the 'MaHPIC Pk genome sequence', includes manual annotation of the SICAvar gene family with 136 full-length members categorized as type I or II. This sequence provides a framework that will permit a better understanding of the SICAvar repertoire, selective pressures acting on this gene family and mechanisms of antigenic variation in this species and other pathogens.

Keywords: Plasmodium knowlesi; SICAvar; Hi-C; MaHPIC; PacBio; annotation; antigenic variation; genome; sequence.

Figures

Fig. 1.
Hi-C assisted scaffolding of PacBio contigs. (A) Alignment of Hi-C data to the initial set of 35 high-coverage contigs by PacBio assembly showed that one of the contigs includes DNA from three different chromosomes as evidenced by the tri-partite structure of intracontig contact map of this contig (right). Other contigs did not exhibit similar contact patterns (representative example – left) suggesting they are contiguous pieces from a single chromosome. (B) Intercontig Hi-C contact maps of the unordered set of contigs (left) that were named according to their similarity with chromosomes in the PKNH assembly show striking off-diagonal contact enrichment suggesting that pairs of contigs that belong to the same chromosome are not ordered consecutively. Similar intercontig maps when contigs are clustered into scaffolds according to their Hi-C contact counts (mid) show minimal off-diagonal enrichment. Interchromosomal/scaffold contact map generated by aligning Hi-C reads to the new, chromosome level assembly (right) exhibits contact patterns that are expected of and observed in Plasmodium and yeast species (Ay et al. ; Duan et al. 2010). This assembly was generated by breaking down the problematic contig, clustering contigs into chromosomal groups, and ordering and reorienting contigs within each group to maximize Hi-C contacts between adjacent and correctly oriented contigs to create scaffolds representative of each chromosome. (C) Intrascaffold Hi-C contact maps (normalized counts, 10 kb resolution) from two representative scaffolds in the new assembly. Scaffold 6 (left) and scaffold 14 were constructed by joining two and four PacBio contigs, respectively. The rows/columns marked by white represent unmappable or poorly mappable regions with Hi-C reads (Illumina 76 × 2 bp, paired-end sequencing).
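
The ordering step described in panel (B) can be illustrated with a minimal sketch. It assumes the contigs have already been clustered into one chromosomal group and that a table of intercontig Hi-C contact counts is available. The function name `best_order`, the input layout and the toy counts are hypothetical and are not taken from the paper. The sketch uses brute-force search rather than whatever optimization the authors applied, and it ignores contig orientation.

```python
from itertools import permutations


def best_order(contigs, contacts):
    """Return the contig order that maximizes contacts between neighbours.

    contacts maps frozenset({a, b}) -> Hi-C contact count (hypothetical input).
    Brute force, so only practical for a handful of contigs per group.
    """
    def score(order):
        # Sum the contact counts of each adjacent pair in the proposed order
        return sum(contacts.get(frozenset(pair), 0)
                   for pair in zip(order, order[1:]))

    # An order and its reverse score the same, so keep one of each pair
    candidates = (p for p in permutations(contigs) if p[0] <= p[-1])
    return max(candidates, key=score)


# Toy counts invented for illustration, not data from the paper
toy_contacts = {
    frozenset({"c1", "c2"}): 120,
    frozenset({"c2", "c3"}): 95,
    frozenset({"c3", "c4"}): 80,
    frozenset({"c1", "c3"}): 10,
    frozenset({"c2", "c4"}): 12,
    frozenset({"c1", "c4"}): 5,
}
order = best_order(["c3", "c1", "c4", "c2"], toy_contacts)
print(order)
```

With these toy counts the strongest contacts link c1 to c2, c2 to c3 and c3 to c4, so the search places those contigs consecutively. Resolving orientation as well would need contact counts resolved to contig ends, which this sketch does not model.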
Fig. 2.
Chromosomal synteny between PKNH and the MaHPIC PKNOH genome sequences. (A) SyMAP circular DNA comparison of the MaHPIC Pk genome sequence scaffolds to the PKNH 2015 consensus sequence. (B) SyMAP circular DNA comparison of the MaHPIC Pk genome sequence scaffolds to the Plasmodium coatneyi HACKERI genome sequence that was assembled using PacBio technologies (Chien et al. 2016). (C) SyMAP circular DNA comparison of the PKNH 2015 consensus sequence and P. coatneyi genome sequence.
Fig. 3.
Hi-C contact maps for the join regions present on scaffolds 8 and 9. Hi-C contact maps of two scaffolds from the PKNOH-PacBio-Hi-C assembly that contain contigs previously assigned to two different chromosomes in the PKNH assembly. These contact maps are zoomed in to the join regions and are at the single MboI restriction fragment level (~1 kb in resolution). Each heatmap is rotated 45 degrees compared with previous intracontig/scaffold heatmaps for visualization purposes. (A) The 200 kb region of scaffold 8 (scf8:500 000–700 000) that surrounds the join (at scf8:593 400) between two contigs previously assigned to chr13 and chr4 (left) compared with a matched 200 kb region from scaffold 12, which consists of a single contiguous PacBio contig (right). (B) Similar case vs control figure for scaffold 9 compared with matched coordinates in scaffold 5. The dashed blue lines correspond to the location of the join (or matching coordinates on the right), and the sum and average number (excluding zeros) of interactions between the left and right (rectangular area) of a join are reported for each case.
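
The join statistic reported in this figure (sum and zero-excluded average of interactions across a join) can be sketched as follows. This is one plausible reading of the caption, not the authors' code. The function name `join_contact_stats`, the matrix layout and the toy values are assumptions made for illustration.

```python
def join_contact_stats(matrix, join_index):
    """Sum and mean (zeros excluded) of contacts spanning a join.

    matrix is a square list of lists of contact counts between bins
    (for example restriction fragments). Bins [0, join_index) lie left
    of the join and bins [join_index, n) lie right of it (assumed layout).
    """
    # Collect the rectangular block pairing left-side bins with right-side bins
    spanning = [matrix[i][j]
                for i in range(join_index)
                for j in range(join_index, len(matrix))]
    nonzero = [count for count in spanning if count > 0]
    total = sum(spanning)
    mean_nonzero = total / len(nonzero) if nonzero else 0.0
    return total, mean_nonzero


# Toy symmetric contact matrix invented for illustration
toy = [
    [0, 5, 2, 0],
    [5, 0, 4, 1],
    [2, 4, 0, 6],
    [0, 1, 6, 0],
]
total, mean_nonzero = join_contact_stats(toy, 2)
print(total, mean_nonzero)
```

Computing the same two numbers for a matched region of a scaffold built from a single contig gives the control value against which a join is judged, as in the case vs control comparison above.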
Fig. 4.
SICAvar distribution and gene models. (A) Shown are representative examples of types I and II SICAvar genes with exons noted in blue, and their directionality indicated with arrow heads placed at the end of the 3-prime exons. Type I SICAvar genes are characterized by multiple exons (5–16), often with extremely large introns, particularly between exons 2 and 3. Type II SICAvar genes have three or four exons and are more compact with smaller introns. In five of the six examples shown, the initial two exons shown are typical. (B) Distribution of full SICAvar genes (types I and II) along the PKNOH scaffolds. (C) Distribution of partial SICAvar gene segments (types I and II) along the PKNOH scaffolds.

References

    1. Ahmed M. A. and Cox-Singh J. (2015). Plasmodium knowlesi – an emerging pathogen. ISBT Science Series 10, 134–140.
    2. al-Khedery B., Barnwell J. W. and Galinski M. R. (1999). Antigenic variation in malaria: a 3′ genomic alteration associated with the expression of a P. knowlesi variant antigen. Molecular Cell 3, 131–141.
    3. Assefa S., Lim C., Preston M. D., Duffy C. W., Nair M. B., Adroub S. A., Kadir K. A., Goldberg J. M., Neafsey D. E., Divis P., Clark T. G., Duraisingh M. T., Conway D. J., Pain A. and Singh B. (2015). Population genomic structure and adaptation in the zoonotic malaria parasite Plasmodium knowlesi. Proceedings of the National Academy of Sciences of the USA 112, 13027–13032.
    4. Aurrecoechea C., Barreto A., Basenko E. Y., Brestelli J., Brunk B. P., Cade S., Crouch K., Doherty R., Falke D., Fischer S., Gajria B., Harb O. S., Heiges M., Hertz-Fowler C., Hu S., Iodice J., Kissinger J. C., Lawrence C., Li W., Pinney D. F., Pulman J. A., Roos D. S., Shanmugasundram A., Silva-Franco F., Steinbiss S., Stoeckert C. J. Jr., Spruill D., Wang H., Warrenfeltz S. and Zheng J. (2017). EuPathDB: the eukaryotic pathogen genomics database resource. Nucleic Acids Research 45(D1), D581–D591.
    5. Ay F., Bailey T. L. and Noble W. S. (2014a). Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res 24(6), 999–1011. doi: 10.1101/gr.160374.113.