Genome Res. 2021 May;31(5):834-851.
doi: 10.1101/gr.262816.120. Epub 2021 Apr 27.

Third-generation sequencing revises the molecular karyotype for Toxoplasma gondii and identifies emerging copy number variants in sexual recombinants

Jing Xia et al. Genome Res. 2021 May.

Abstract

Toxoplasma gondii is a useful model for intracellular parasitism given its ease of culture in the laboratory and its genomic resources. However, as for many other eukaryotes, the T. gondii genome contains hundreds of sequence gaps owing to repetitive and/or unclonable sequences that disrupt the assembly process. Here, we use the Oxford Nanopore MinION platform to generate near-complete de novo genome assemblies for multiple strains of T. gondii and its near relative, N. caninum. We significantly improved T. gondii genome contiguity (average N50 of ∼6.6 Mb) and added ∼2 Mb of newly assembled sequence. For all of the T. gondii strains that we sequenced (RH, ME49, CTG, and the II×III progeny clones CL13, S27, S21, S26, and D3X1), the largest contig ranged between 11.9 and 12.1 Mb in size, which is larger than any previously reported T. gondii chromosome; we found this to be due to a consistent fusion of Chromosomes VIIb and VIII. These data were validated by mapping existing T. gondii ME49 Hi-C data to our assembly, providing parallel lines of evidence that the T. gondii karyotype consists of 13, rather than 14, chromosomes. By using this technology, we also resolved hundreds of tandem repeats of varying lengths, including in well-known host-targeting effector loci like rhoptry protein 5 (ROP5) and ROP38. Finally, when we compared T. gondii with N. caninum, we found that although the 13-chromosome karyotype was conserved, extensive, previously unappreciated chromosome-scale rearrangements had occurred in T. gondii and N. caninum since their most recent common ancestry.
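
The abstract reports contiguity as an average N50. As a reminder of what that statistic measures, here is a minimal sketch assuming the usual definition (the contig length at which the running total of contigs, sorted longest first, reaches half of the assembly). The function name and the contig lengths are hypothetical and are not data from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    summed assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths in bp, for illustration only.
example_lengths = [12_000_000, 7_500_000, 6_600_000, 5_000_000, 3_000_000, 1_000_000]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it says nothing about whether contigs are joined correctly, which is why the abstract pairs it with independent Hi-C validation.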

Figures

Figure 1.
Primary de novo assembly of TgRH88 genome using Nanopore reads revises T. gondii karyotype. (A) Bivariate plot showing a comparison of the aligned read length with the sequenced read length. (B) Bivariate plot showing a comparison of the aligned–corrected read length (log10-transformed) with the percentage identity. In this case, corrected reads refer to the method deployed by Canu using read overlap. (C) Histogram showing comparison of chromosome size between the ToxoDB-48_TgGT1 genome and TgRH88 initial long-read assembly. (D) Inter-chromosomal Hi-C contact-count heat map plotted using the TgRH88 initial long-read assembly sequence showing 13 chromosomes in the assembly. (E) Intra-chromosomal Hi-C contact-count heat map plotted using the sequence of TgRH88_tig00000001 in TgRH88 initial long-read assembly showing no aberrant signal along the contig.
Figure 2.
Comparisons between the current Nanopore assembly of T. gondii strain RH88 and existing long-read (using PacBio RS and Illumina technology) and short-read (Illumina only) assemblies. (A) Assembly statistics for each. (B) Circos plot of NUCmer pairwise alignments across all three assemblies for T. gondii Chromosome IV. All alignments >10,000 bp and >90% identity are shown. (C) Pairwise alignments for Chromosomes IV (top) and VI (bottom) along with locations of select tandem repeats identified either de novo (orange or blue bars on the chromosome scaffolds) or known from prior studies (above or below chromosome scaffolds). For T. gondii Chromosome IV, comparisons to both the PacBio and Illumina assemblies are shown, whereas only the long-read comparison is shown for T. gondii Chromosome VI.
Figure 3.
Long-read assembly identifies 13 chromosomes in the T. gondii genome from multiple strains. (A) Dot plot showing the comparison of the TgME49 long-read assembly and the ToxoDB-48_TgME49 genome. Red box shows that the Chromosomes VIIb and VIII in the ToxoDB-48_TgME49 genome are fused in a single contig, TgME49_tig00000001_ChrVIII, in the TgME49 long-read assembly. (B) Coverage of the “breakpoint” (TgME49_tig00000001_ChrVIII: 5,090,422 bp, indicated by a vertical red line) of Chromosomes VIIb and VIII with 37 Nanopore reads in the TgME49 long-read assembly. (C) Coverage of the edges (indicated by a vertical red line) of Chromosomes VIIb and VIII with 105 Nanopore reads mapped to the ToxoDB-48_TgME49 genome. (D) Nanopore reads mapping to the end of Chromosomes IX and X in the ToxoDB-48_TgME49 genome assembly, showing that Nanopore reads only map to the end of each chromosome and do not span the junction between these chromosomes (indicated by a vertical red line). (E) Inter-chromosomal Hi-C contact-count heat map plotted using the TgME49 initial long-read assembly sequence showing 13 chromosomes in the assembly. (F) Intra-chromosomal Hi-C contact-count heat map plotted using the sequence of TgME49_tig00000001 in the TgME49 initial long-read assembly showing no aberrant signal along the contig.
Figure 4.
Long-read assembly reveals previously unknown inversions and the centromere location on Chr IV in T. gondii. (A) Inversion in the RH88 long-read assembly on Chromosome III relative to the ToxoDB-48_TgGT1 assembly. (B) Inversion in the ME49 long-read assembly on Chromosome XII relative to the ToxoDB-44_TgME49 genome. (C) Dot plot comparison of the TgRH88 long-read assembly and the ToxoDB-48_TgGT1 genome showing a 429.3-kb inversion at 2,096,529–2,525,795 bp on Chr IV. (D) Intra-chromosomal Hi-C contact-count heat map plotted using the sequence of tig00000014 in TgRH88 long-read assembly showing a clear centromere signal at position 2.2–2.3 Mb. (E) ChIP-on-chip signal of centromeric histone 3 variant (CenH3) (Brooks et al. 2011) plotted using the TgRH88 long-read assembly as coordinate.
Figure 5.
Long-read sequence assemblies precisely resolve canonical repeat sequences and identify additional expansions at gene-harboring loci. (A–D) Estimated copy number for Nanopore assemblies and existing genome assemblies on ToxoDB (“v48”) for T. gondii strain types I, II, and III and II×III F1 progeny. In all cases, Nanopore assemblies identified higher numbers of each repeat locus. In the F1 progeny, B1 gene copy number tracked directly with the genotype (type II or III) at that locus (A), whereas these same F1 progeny harbored unique numbers of 529-bp repeat copies, all of which were not only distinct from their respective genotypes of origin but distinct from one another (B). The TgIRE and SAT350 repeats also were better resolved in our Nanopore assemblies (C,D), although determining genotype of the corresponding region is not possible because these repeats are found at multiple locations throughout the genome. (E) Whole-chromosome alignment focused on the 529-bp repeat region for the v48 ToxoDB assembly (bottom) and our Nanopore-based assembly (top). Expansion of the known genome sequence at this locus in the Nanopore sequence compared with the ToxoDB assembly is clear, as well as consistent with our identification of approximately 140 previously unknown 529-bp repeats in the ME49 genome. (F) Alignment and annotation of repeat sequences of ME49 v48 ToxoDB Chromosome IV and that from our polished Nanopore assembly. Gray bars with red borders indicate mapping regions ≥10,000 bp determined using NUCmer, whereas orange boxes with blue borders indicate tandem repeats with period sizes ≥500 bp and at least two copies. Bars that appear orange are larger than those that are only blue. Incorrect inversion on the right arm of Chromosome IV in the ToxoDB assembly is evident, as is the more accurately resolved 529-bp repeat locus, which was likely a cause for the inversion in standard assemblies from multiple strains.
Figure 6.
Long-read assembly resolves duplicated locus structure in T. gondii genome. (A) Two unresolved scaffold gaps on Chr Ia in the ToxoDB-48_TgME49 genome span a 17.5-kb tandem repeat containing multiple copies of ROP4 and ROP7. The ROP4/7 gaps are closed by the TgME49 long-read assembly (TgME49_tig00000028), revealing a tandem array of five copies of this gene in the order shown. (B) BLASTN alignment of the ROP4/ROP7 coding sequence in the ToxoDB-48_TgME49 genome (upper panel) and the TgME49 long-read assembly (lower panel). (C) Copy number determination at six canonical tandem gene arrays across eight T. gondii strains and one N. caninum strain. Data from CL13, S27, S21, and S26 show that copy number can change during sexual recombination because the copy number in these F1 progeny clones does not match copy number in either parent. (D–F) Whole-chromosome alignments between ME49 ToxoDB-48 and our Nanopore assemblies at loci harboring tandem gene arrays. Gray boxes with red borders indicate one-to-one mapping regions ≥10,000 bp determined by NUCmer, and orange/blue boxes are as described in Figure 5. Black bars indicate size of select tandem repeats in the ToxoDB and Nanopore assemblies.
Figure 7.
Error polishing with Pilon is effective for most single-copy genes, but resolution of tandem gene expansion errors requires supplemental correction. (A) We identified 1121 single-exon genes with predicted proteins that mapped with 100% identity and 100% coverage using TBLASTN against the ToxoDB-48 ME49 genome. Then, we mapped these against the raw assembly generated by Canu, the polished assembly generated by four rounds of Pilon, and after four rounds of supplemental error correction. Pilon error correction was sufficient for perfect mapping of 94% of the query single-exon genes (compared with only 0.4% for the raw Canu assembly), and supplemental error correction only increased this mapping percentage slightly. (B) Plots representing TBLASTN analysis of protein sequences from two single-copy genes showing the improved mapping achieved by Pilon-based error correction. Mapping identity is indicated by the color of the box representing the alignment. (C,D) Plots representing protein-coding sequences from the ROP5 (C) or ROP38 (D) gene mapped using TBLASTN against the raw Canu-only assembly, the Pilon-corrected assembly, and the region corrected using our supplemental approach tailored to tandem gene arrays. Both loci have multiple pseudogenes in the Canu-only and Canu-plus Pilon assemblies, but many of these errors are removed upon supplemental correction. The presence of a pseudogene in the ME49 ROP5 locus has been predicted before based on direct sequencing, suggesting that this may represent the most accurate version of the ME49 ROP5 locus sequenced to date.
Figure 8.
Long-read assembly reveals N. caninum karyotype and its synteny with T. gondii. (A) Circos plot showing high synteny between the TgRH88 long-read assembly and the ToxoDB-48_TgGT1 genome. (B) Circos plot showing the chromosomal translocations and inversions in NcLiv long-read assembly compared with the ENA_NcLiv genome. (C) Circos plot showing the syntenic relationship between TgRH88 and NcLiv long-read assembly.
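
Two numeric conventions in the captions above can be made concrete with a short sketch: the display cutoff quoted for Figure 2B (alignments >10,000 bp and >90% identity) and the size of the Chr IV inversion quoted for Figure 4C. The `Alignment` record, the function names, and the sample alignments are hypothetical; they do not reproduce NUCmer's output format or the authors' scripts, and treating the quoted coordinates as 1-based inclusive is an assumption.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one pairwise alignment (not NUCmer's own format)."""
    ref_start: int            # assumed 1-based start on the reference
    ref_end: int              # assumed 1-based end; may be < start if reversed
    percent_identity: float


def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based inclusive interval, in either orientation."""
    return abs(end - start) + 1


def keep_alignment(aln: Alignment, min_length: int = 10_000,
                   min_identity: float = 90.0) -> bool:
    """Apply the '>10,000 bp and >90% identity' cutoff from the Figure 2B caption."""
    return (span_length(aln.ref_start, aln.ref_end) > min_length
            and aln.percent_identity > min_identity)


# Made-up alignments for illustration only.
alignments = [
    Alignment(1, 25_000, 98.7),
    Alignment(40_000, 45_000, 99.9),   # rejected: too short
    Alignment(90_000, 60_000, 85.0),   # rejected: identity too low
]
kept = [a for a in alignments if keep_alignment(a)]

# Coordinates of the Chr IV inversion quoted in the Figure 4C caption.
inversion_kb = span_length(2_096_529, 2_525_795) / 1_000
print(len(kept), round(inversion_kb, 1))
```

Under the inclusive-coordinate assumption the interval spans 429,267 bp, which agrees with the 429.3-kb size given in the caption.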

