Genome Res. 2000 Dec;10(12):1996-2005.
doi: 10.1101/gr.gr-1463r.

A random sequencing approach for the analysis of the Trypanosoma cruzi genome: general structure, large gene and repetitive DNA families, and gene discovery

F Agüero et al. Genome Res. 2000 Dec.

Abstract

A random sequence survey of the genome of Trypanosoma cruzi, the agent of Chagas disease, was performed and 11,459 genomic sequences were obtained, resulting in approximately 4.3 Mb of readable sequences or approximately 10% of the parasite haploid genome. The estimated total GC content was 50.9%, with a high representation of A and T di- and trinucleotide repeats. Out of the estimated 5000 parasite genes, 947 putative new genes were identified. Another 1723 sequences corresponded to genes detected previously in T. cruzi through expressed sequence tag analysis. 7735 sequences had no matches in the database, but the presence of open reading frames that passed Fickett's test suggests that some might contain coding DNA. The survey was highly redundant, with approximately 35% of the sequences included in a few large sequence families. Some of them code for protein families present in dozens of copies, including proteins essential for parasite survival and retrotransposons. Other sequence families include repetitive DNA present in thousands of copies per haploid genome. Some families in the latter group are new, parasite-specific, repetitive DNAs. These results suggest that T. cruzi could constitute an interesting model to analyze gene and genome evolution due to its plasticity in terms of sequence amplification and divergence. Additional information can be found at http://www.iib.unsam.edu.ar/tcruzi.gss.html.

Figures

Figure 1
Frequency of di- and tri-nucleotide repeats in the Trypanosoma cruzi genome. The total of 11,459 sequences were used to search for the occurrence of all possible words of length 2 (A) and 3 (B) on both strands of the sequences using COMPSEQ. The expected frequency of each word is based on the assumption that all words have the same probability of occurrence. Di- and tri-nucleotide frequencies are expressed as Observed (Obs)/Expected (Exp) − 1, so that negative values correspond to suppressed di- and tri-nucleotides and positive values correspond to di- and tri-nucleotides with frequencies over that expected. Because the search is done on both strands, only one reverse complementary di- or tri-nucleotide of a pair is shown.
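The Obs/Exp − 1 measure described in this legend can be illustrated with a short Python sketch. This is not the COMPSEQ program itself; the function names (reverse_complement, word_enrichment) are hypothetical and chosen only for illustration. The sketch assumes, as the legend states, that every word of length k is equally likely (expected count = total words / 4^k) and that both strands are scanned. Windows containing ambiguous bases are skipped, which is an additional assumption.

```python
from collections import Counter
from itertools import product

# Translation table used to build the reverse-complement strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def word_enrichment(sequences, k):
    """Return {word: Obs/Exp - 1} for every DNA word of length k.

    Words are counted in overlapping windows on both strands.
    The expected count assumes all 4**k words are equally probable.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                word = strand[i:i + k]
                # Assumption: skip windows with ambiguous bases such as N.
                if set(word) <= set("ACGT"):
                    counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no countable words of length %d" % k)
    expected = total / 4 ** k
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return {w: counts[w] / expected - 1 for w in words}


if __name__ == "__main__":
    # Toy input, not data from the survey.
    scores = word_enrichment(["AAAATTTTACGT"], 2)
    for word in ("AA", "TT", "CG"):
        print(word, round(scores[word], 3))
```

Because both strands are counted, a word and its reverse complement always receive the same score, which is why only one member of each reverse-complementary pair needs to be plotted.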
Figure 2
Microsatellite repeats and low-complexity regions in the Trypanosoma cruzi genome. Simple repeats and low-complexity regions were searched for in the T. cruzi GSS database using REPEATMASKER as described in the text. (A) The 20 most abundant microsatellite repeats in the survey are shown. The minimum value of n is the one that gives a Smith-Waterman (SW) score ⩾180, which is the cutoff to consider a match as positive. This value varied from 18 for a single nucleotide repeat to 3 for a hexanucleotide repeat. Each named microsatellite in the graph includes all combinations thereof; so (A)n also includes its complement (T)n, and (ATG)n also includes (CAT)n, (ATC)n, (TCA)n, (TGA)n and (GAT)n. (B) Low-complexity regions were searched for as described in the text. The length of the regions detected varied from 16 bp to 308 bp.
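The grouping rule in panel A, where each named microsatellite stands for all of its rotations and their reverse complements, can be sketched in Python as follows. This is an illustrative sketch only, not part of REPEATMASKER; motif_class and canonical_motif are hypothetical helper names, and choosing the alphabetically first member as the class label is an assumption made here for convenience.

```python
# Translation table used to build the reverse complement of a motif.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_class(motif):
    """Return all repeat units equivalent to `motif`.

    Two units are treated as equivalent if one is a rotation of the
    other or of its reverse complement, since (ATG)n and (TGA)n describe
    the same tandem array read from a different starting position or strand.
    """
    motif = motif.upper()
    rc = motif.translate(COMPLEMENT)[::-1]
    members = set()
    for unit in (motif, rc):
        for i in range(len(unit)):
            members.add(unit[i:] + unit[:i])
    return members


def canonical_motif(motif):
    """Pick one label per class (assumption: the alphabetically first unit)."""
    return min(motif_class(motif))


if __name__ == "__main__":
    print(sorted(motif_class("ATG")))
    print(sorted(motif_class("A")))
```

For the trinucleotide ATG this yields the six units listed in the legend (ATG, CAT, ATC, TCA, TGA, GAT), and for the mononucleotide A it yields A and T.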
Figure 3
Structure of TcIRE. (A) General scheme of the structure of TcIRE. (B) Five GSS containing a copy of TcIRE were aligned using CLUSTALW with the corresponding region of the 25-Kb cosmid sequenced by Gao et al. 1999 (GenBank accession no. AB017765) and the last portion of the 3′ UTR from the Emuce-31l3 mucin gene. The rest of the mucin gene does not show any homology with the sequences aligned and was cut off for the sake of clarity. Coloring is based on BLOSUM 62 scores: 3.0, black; 1.5, gray; 0.5, light gray. Similar residues are colored as the most conserved one. Arrows indicate the two oligonucleotides used to generate the probe for the Southern blot analysis. Arrowheads indicate the site of divergence between the two groups of TcIRE sequences. The top three sequences, including AB017765, are representative of one group of sequences, denoted Group I, whereas the other sequences, including the last portion of the 3′-UTR region from the Emuce-31l3 mucin gene are representative of another group, denoted Group II.
Figure 4
(A) Genomic DNA was prepared as described in Methods, digested with PstI, run in a 0.7% agarose gel at 3 V/cm, and transferred to a nylon membrane. (B) PFGE blots were prepared and processed as described (Henriksson et al. 1995). Chromosomal DNA markers were the CHEF DNA size markers, 0.2–2.2 Mbp (BioRad). Both nylon membranes (A,B) were hybridized with a radioactively labeled 300-bp fragment amplified by PCR using the oligonucleotides TcIRE-fwd and TcIRE-rev as shown in Figure 3 and described in Methods. Parasites, strains, and clones used in A or B are: Leishmania mexicana (Lm); Crithidia fasciculata (Cf); Trypanosoma cruzi strains and clones Tul0, Tul2, Corpus christi (Cc), Y, Perú (P), Sonya (S), CL-Brener (CL), and Sylvio (Sv).

References

    1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
    2. Andersson B, Aslund L, Tammi M, Tran A, Hoheisel JD, Pettersson U. Complete sequence of a 93.4-kb contig from chromosome 3 of Trypanosoma cruzi containing a strand-switch region. Genome Res. 1998;8:809–816.
    3. Araya J, Cano MI, Gomes HB, Novak EM, Requena JM, Alonso C, Levin MJ, Guevara P, Ramirez JL, Da Silveira JF. Characterization of an interspersed repetitive DNA element in the genome of Trypanosoma cruzi. Parasitology. 1997;115:563–570.
    4. Armah DA, Mensa-Wilmot K. S-myristoylation of a glycosylphosphatidylinositol-specific phospholipase C in Trypanosoma brucei. J Biol Chem. 1999;274:5931–5938.
    5. Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer ELL. The Pfam protein families database. Nucleic Acids Res. 2000;28:263–266.
