Microb Genom. 2018 May;4(5):e000177.
doi: 10.1099/mgen.0.000177. Epub 2018 Apr 30.

Expanding an expanded genome: long-read sequencing of Trypanosoma cruzi

Luisa Berná et al. Microb Genom. 2018 May.

Abstract

Although the genome of Trypanosoma cruzi, the causative agent of Chagas disease, was first made available in 2005, with additional strains reported later, the intrinsic genome complexity of this parasite (the abundance of repetitive sequences and genes organized in tandem) has traditionally hindered high-quality genome assembly and annotation. This also limits the types of analysis that require a high degree of precision. Long reads generated by third-generation sequencing technologies are particularly suitable for addressing the challenges associated with T. cruzi's genome, since they permit direct determination of the full sequence of large clusters of repetitive sequences without collapsing them. This, in turn, not only allows accurate estimation of gene copy numbers but also circumvents assembly fragmentation. Here, we present the analysis of the genome sequences of two T. cruzi clones: the hybrid TCC (TcVI) and the non-hybrid Dm28c (TcI), determined by PacBio Single Molecule Real-Time (SMRT) technology. The improved assemblies obtained here allowed us to accurately estimate gene copy numbers and the abundance and distribution of repetitive sequences (including satellites and retroelements). We found that the genome of T. cruzi is composed of a 'core compartment' and a 'disruptive compartment', which exhibit opposite GC content and gene composition. Novel tandem and dispersed repetitive sequences were identified, including some located inside coding sequences. Additionally, homologous chromosomes were separately assembled, allowing us to retrieve haplotypes as separate contigs instead of a unique mosaic sequence. Finally, manual annotation of surface multigene families, mucins and trans-sialidases now provides a better overview of these complex groups of genes.

Keywords: Chagas disease; PacBio; Trypanosoma cruzi; whole genome sequencing.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Chromosomal assembly improvements. (a) ACT alignment of homologous chromosomes from three strains: TCC (contig TCC_10), Dm28c (contig Dm28c_6) and CL Brener (chromosome TcChr30-P). Previously undetermined sequences filled by Ns in CL Brener are marked in green. (b) Magnification of a fragment of (a) (boxed and shadowed in grey). The six frames and the DNA G+C content of each chromosome are plotted. Previously collapsed repetitive sequences (boxed in orange) are disaggregated in the new assembly. (c) Visualization of the alignment of the same homologous chromosome showing additional details in TCC and Dm28c. The colour patterns in the annotation bars (bottom and top-most horizontal striped bars) correspond to the annotation as it appears in the web interface (DGF1 in red, GP63 in orange, RHS in brown, conserved genes in green). The six reading frames are also shown. (1) Terminal DGF-1 gene cluster present only in TCC. (2) Non-homologous region present only in Dm28c. (3) Repetitive region present in both strains. (4) Expansion of a GP63 cluster in TCC (four copies versus two copies in Dm28c). (5) Strain-specific amplifications of two different genes. There are seven GP63 copies (orange strips on the top annotation bar) in TCC but only one in Dm28c; moreover, Dm28c contains four RHS copies in the same region. (6) Repetitive element present in both genomes, with fewer copies in TCC (20 copies in TCC and 44 copies in Dm28c). The segment is followed by another strain-specific amplification consisting of a cluster of 14 GP63 genes in TCC and only one copy in Dm28c.
Fig. 2.
Haplotype resolution and recombination. (a) Circos graph representation of homologous contigs (right). The Artemis view of the indicated fragments is shown on the left [for contig TCC_133 from 88 to 112 kb (top), and for contig TCC_64 from 50 to 77 kb (bottom)]. The six frames are shown and the annotated genes are represented in turquoise. (b) Alignment visualization (IGV) of the Esmeraldo Illumina reads (SRA833800) on the same homologous regions considered in (a) (TCC_133 on the top, TCC_64 on the bottom). (c) Alignment visualization (IGV) of PacBio TCC reads on the same region as in (b). The enlargement of the boxed region, where Esmeraldo Illumina reads go from mapping to TCC_133 to mapping to TCC_64, is shown at the bottom. (d) Circos graph representation of haplotype resolution contigs of different sizes.
Fig. 3.
The genome compartmentalization of T. cruzi. (a) Schematic representation of the two types of compartment in T. cruzi. Genes are visualized as in the web interface by strips (DGF1 red, GP63 orange, MASP blue, mucin light blue, TS light orange, conserved genes green). The core compartment is composed of conserved genes. The disruptive compartment is composed of the surface multigene families TS, MASP and mucins. GP63, DGF-1 and RHS are distributed (sometimes in tandem clusters) in both compartments. (b) GC distribution of the compartments. Only contigs entirely composed of one compartment (80 % or higher proportion of conserved genes or surface multigene families) and longer than 10 kb were considered. (c) Schematic representation of a contig of Dm28c; genes are depicted as in (a) and compartments are coloured as in (b). The GC distribution is calculated over a sliding window of 7000 bp. Strand-switch regions are indicated above the GC plot by black vertical stripes.
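The two computations described in this caption (the per-compartment contig filter and the windowed GC profile) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the gene labels, the window step, and the choice to measure the 80 % proportion by gene count are all assumptions made for the example.

```python
# Hypothetical sketch; names, labels and the step size are assumptions.
SURFACE_FAMILIES = {"TS", "MASP", "mucin"}  # disruptive-compartment families


def gc_fraction(seq):
    """GC fraction of a sequence, ignoring non-ACGT symbols such as N."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def sliding_gc(seq, window=7000, step=7000):
    """Return (start, gc_fraction) for every full window along the sequence.

    The caption gives a 7000 bp window; the step is not stated, so
    non-overlapping windows are assumed here.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def classify_contig(length_bp, gene_labels, min_fraction=0.8, min_length=10_000):
    """Label a contig 'core' or 'disruptive', or None if it fails the filter.

    Assumes the 80 % threshold is applied to gene counts (an assumption;
    it could equally be applied to sequence length).
    """
    if length_bp <= min_length or not gene_labels:
        return None
    total = len(gene_labels)
    conserved = sum(1 for g in gene_labels if g == "conserved") / total
    surface = sum(1 for g in gene_labels if g in SURFACE_FAMILIES) / total
    if conserved >= min_fraction:
        return "core"
    if surface >= min_fraction:
        return "disruptive"
    return None
```

Contigs that return None are mixed or too short and would be left out of a panel (b)-style comparison under these assumptions.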
Fig. 4.
Tandem gene organization. (a) Representation of three contigs of TCC as in the web interface where only conserved genes are shown (green strips). Groups of tandemly arrayed genes are highlighted; parentheses indicate the number of copies. (b) Graph representation of the number of groups of tandemly arrayed genes (tandem lengths from four to ten genes are represented) in the different genome assemblies. TCC in green, Dm28c in violet, CL Brener in grey.
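A count of tandem groups like the one plotted in panel (b) could be sketched as below. This is only an illustration under a simplifying assumption: a tandem group is taken to be a run of adjacent genes carrying the same family label in contig order. The criteria actually used in the paper are not stated in this caption, and the function names and labels are hypothetical.

```python
# Hypothetical sketch; the adjacency-based definition is an assumption.
from collections import Counter
from itertools import groupby


def tandem_groups(gene_order, min_copies=2):
    """Return (family, copies) for each run of adjacent same-family genes."""
    groups = []
    for family, run in groupby(gene_order):
        copies = len(list(run))
        if copies >= min_copies:
            groups.append((family, copies))
    return groups


def count_by_length(groups, shortest=4, longest=10):
    """Count groups per tandem length, restricted to the plotted range."""
    return Counter(
        copies for _, copies in groups if shortest <= copies <= longest
    )


# Toy contig: gene family labels in chromosomal order.
contig = ["A"] * 4 + ["B"] + ["C"] * 2 + ["A"] * 5
groups = tandem_groups(contig)
histogram = count_by_length(groups)
```

Running the same count on each assembly would give the per-assembly bars; a collapsed repeat in an older assembly would appear as a shorter run or none at all.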
Fig. 5.
L1Tc phylogeny. Maximum-likelihood phylogeny of complete sequences of L1Tc. Elements from TCC in green, Dm28c in violet, SylvioX10/1 in light violet, CL Brener Esmeraldo-like in light violet grey, CL Brener non-Esmeraldo-like in b.

