BMC Genomics. 2009 Jun 1;10:255.
doi: 10.1186/1471-2164-10-255.

Chromosome level assembly of the hybrid Trypanosoma cruzi genome

D Brent Weatherly et al. BMC Genomics. 2009.

Abstract

Background: In contrast to the essentially fully assembled genome sequences of the kinetoplastid pathogens Leishmania major and Trypanosoma brucei, the assembly of the Trypanosoma cruzi genome has been hindered by its repetitive nature and by the fact that the reference strain (CL Brener) is a hybrid of two distinct lineages. In this work, the majority of the contigs and scaffolds were assembled into pairs of homologous chromosomes based on predicted parental haplotype, inference from TriTryp synteny maps, and the use of end sequences from T. cruzi BAC libraries.

Results: Ultimately, 41 pairs of chromosomes were assembled using this approach, a number in agreement with the number of T. cruzi chromosomes predicted by pulsed-field gel analysis, with over 90% (21133 of 23216) of the genes annotated in the genome represented. The approach was substantiated by Southern blot analysis, which confirmed the mapping of BAC clones using as probes the genes they are predicted to contain, and each chromosome construction was visually validated to ensure that sufficient evidence was present to support the organization. While many members of large gene families are incorporated into the chromosome assemblies, the majority of genes excluded from the chromosomes belong to gene families, as these genes are frequently impossible to position accurately.

Conclusion: Now assembled, these chromosomes bring T. cruzi to the same level of organization as its kinetoplastid relatives and have been used as the basis for the T. cruzi genome in TriTrypDB, a trypanosome database of EuPathDB. In addition, they will provide the foundation for analyses such as reverse genetics, where the location of genes and their alleles and/or paralogues is necessary, and comparative genome hybridization (CGH) analyses, where a chromosome-level view of the genome is ideal.

Figures

Figure 1
The 41 model chromosomes of T. cruzi. Because the CL Brener reference strain is a hybrid of the "non-Esmeraldo-like" and "Esmeraldo-like" lineages, each model chromosome comprises 2 homologous chromosomes. These model chromosomes represent the consensus view of both haplotypes. Gene family members are depicted in non-blue colors; of note is the number of clusters of gene family members on the chromosomes, as well as in the artificially assembled contigs that were not assignable to individual chromosomes.
Figure 2
Visual Validation. a) The 1336 genes ("Gene Features"), 145 "Contigs", 26 "Scaffolds", and mapped BAC clones ("BAC Clone") are shown for the homologous chromosome pairs of chromosome TcChr39 (features of the non-Esmeraldo-like haplotype are labeled "P" and features of the Esmeraldo-like haplotype are labeled "S"). Where possible, contigs are aligned to place at least one pair of alleles in the same locus (red contigs). Lines between genes on the two homologous chromosomes indicate allelic synteny. BAC clones are color-coded according to the source library (blue = TARBAC, green = EPIFOS); note that only BAC clones that span scaffolds or whose ends map to opposite homologous chromosomes (light blue and light green) are shown. Black contigs are those that were not aligned with another on the opposite chromosome (either no homologous sequence present or the alignment could not be made). Finally, brown contigs indicate possible merged sequence from the two parental haplotypes, blue contigs denote sequences where one or more alleles exist on this chromosome but could not be aligned, and gray contigs indicate sequences where one or more alleles exist on a different chromosome altogether. b) The TriTryp synteny map of TcChr39 shows the regions of T. brucei ("Tb Chromosome Portion") and L. major ("Lm Chromosome Portion") for which at least 10 genes are syntenic with T. cruzi genes. For each chromosome (i.e. Tc, Lm, Tb), the coding strand is shown ("Rev" for reverse strand and "For" for forward strand). Telomeric repeats were identified on the 3' end of this chromosome (yellow ellipses).
Figure 3
Validation of assemblies by Southern "dot" blot analysis. a) The top-most cartoon depicts overlapping BAC clones (light blue) which are predicted to span a chromosome (assembled from multiple scaffolds, dark blue). Genes (orange) are selected such that they link proposed contiguous scaffolds. These overlapping pairs of probes create a stair-stepping effect on the Southern blot if the hybridization results are as predicted (bottom), indicating that the linkage of the scaffolds, and thus the chromosome, is correct. b) TcChr20 was chosen for Southern blot validation. Overlapping BAC clones (grey) are selected such that they span most of the chromosome. Gene probes are shown in orange (ids shown are truncated forms of the "Tc" gene ids, i.e. 506581.10 denotes Tc00.1047053506851.10). While there are 11 previously published scaffolds linked on this chromosome, the red stars indicate a region containing many trans-sialidase genes whose repetitive nature must have hindered the joining of the 4 scaffolds that terminate in this region (2 large "P" scaffolds and 2 "S" scaffolds). c) For each BAC clone, both of the gene probes positively hybridize as predicted, creating the stair-step effect that confirms the linkage of the genes on the chromosome. BAC clones are labeled with their assigned ids from the CHORI library.
Figure 4
Differentiation of alleles from paralogues. a) The solid boxes highlight assembly issues with the current genome. The solid gray boxes (locus "A") show a helicase gene where 1 allele is fully sequenced on the P chromosome and the other is in 2 pieces on S. The solid brown boxes (locus "B") show a hypothetical protein with no fully sequenced copy: the 2 genes on the P chromosome and the 2 genes on the S chromosome all reside at contig boundaries and should be merged. Finally, the solid blue boxes (locus "C") show a case of 2 copies (at least partially sequenced) on the P chromosome and 1 presumed fully sequenced copy on S of the same hypothetical protein. In all cases, the dotted line boxes indicate the predicted correct coding sequences. Note that the circled genes show other cases of gene truncations at contig boundaries. b) Alignment of 3 annotated genes in the A locus (a helicase gene). The A1 allele is full length, while A2 and A3 are small pieces which exist on the ends of contigs and are not fully sequenced. Note that A2 has near perfect identity with the N-terminus of A1, while A3 has perfect identity with the C-terminus, suggesting that these are pieces of the same gene.
Figure 5
Cluster of large gene family members in the middle of TcChr33. The region from ~0.2 Mb to 0.5 Mb contains mostly gene family members (non-blue "Gene Features"), while on either side of the region are "core" regions with either hypothetical genes or those with an assigned putative function (light and dark blue "Gene Features"). The large number of spanning BAC clones linking the core regions and the cluster of gene family members substantiates the organization. However, it should be noted that these homologous chromosomes are likely of different sizes. The BAC clones on the "S" chromosome that span the 120 kb gap in the gene family-rich region (connected by dashed lines) are too long for the BAC libraries as shown (TARBAC: blue, avg. length 75 kb, EPIFOS: green, avg. length 35 kb). Alignment of the homologous chromosomes is a visual aid to maintain allelic synteny only.
Figure 6
Local inversion on T. cruzi TcChr11. a) The region on the left-hand side of homologous chromosome P appears inverted relative to S, as the orientation of scaffold CH473345-1 was reversed relative to CH473447 in order to close the BAC clones on both homologous chromosomes (i.e., if CH473345-1 were oriented according to allelic synteny, then the BAC clones on the P chromosome would be unclosed). b) BAC clones and probes were chosen for Southern blot analysis to validate this inversion. The left-hand side shows a diagram of the design of the blot (see Figure 4 for more details). The chromosome inversion is confirmed if BAC clones 15L22, 14F9, 3J3, and 5D6 contain genes G1–G7 (but not G8), but BAC clones 17N21, 8P12, 6L16, and 10F10 contain genes G1–G3 and G5–G8 (but not G4). The right-hand side of the figure shows the result of the analysis, where the hybridization results confirm the inversion.
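The confirmation rule in panel b of Figure 6 can be written as a small set comparison. The sketch below is illustrative only and is not the authors' analysis code. It assumes the caption's gene lists apply to each group of BAC clones taken together (the union of positive probes across the group), and the names GROUP_A, GROUP_B, group_union, and inversion_confirmed are hypothetical.

```python
# Probe genes named in the caption: G1..G8.
ALL_GENES = {f"G{i}" for i in range(1, 9)}

# Clone groups as listed in the caption (group names are hypothetical labels).
GROUP_A = ("15L22", "14F9", "3J3", "5D6")
GROUP_B = ("17N21", "8P12", "6L16", "10F10")

# Expected positives per group, assuming the lists describe each group as a whole.
EXPECTED_A = ALL_GENES - {"G8"}  # G1-G7, but not G8
EXPECTED_B = ALL_GENES - {"G4"}  # G1-G3 and G5-G8, but not G4


def group_union(observed, clones):
    """Collect every probe that hybridized to at least one clone in the group."""
    positives = set()
    for clone in clones:
        positives |= set(observed.get(clone, ()))
    return positives


def inversion_confirmed(observed):
    """Return True only if both groups show exactly the expected probe sets."""
    return (
        group_union(observed, GROUP_A) == EXPECTED_A
        and group_union(observed, GROUP_B) == EXPECTED_B
    )
```

Here observed is a dictionary mapping each BAC clone id to the probes that hybridized to it. A single unexpected positive (for example G4 in the second group) makes the check fail.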
