Database of Trypanosoma cruzi repeated genes: 20,000 additional gene variants
- PMID: 17963481
- PMCID: PMC2204015
- DOI: 10.1186/1471-2164-8-391
Abstract
Background: Repeats are present in all genomes and often have important functions. However, in large genome sequencing projects, many repetitive regions remain uncharacterized. The genome of the protozoan parasite Trypanosoma cruzi consists of more than 50% repeats. These repeats include surface molecule genes and several other gene families. In the T. cruzi genome sequencing project, it was clear that not all copies of repetitive genes were present in the assembly, because nearly identical repeats had collapsed into fewer copies. However, at the time of publication of the T. cruzi genome, it was not clear to what extent this had occurred.
Results: We have developed a pipeline to estimate genomic repeat content, in which shotgun reads are aligned to the genomic sequence and gene copy number is estimated from the average shotgun coverage. We applied this method to the T. cruzi genome and estimated the copy numbers of all protein coding sequences and pseudogenes. The 22,640 estimates were stored in a database available online. An estimated 18% of all protein coding sequences and pseudogenes exist in 14 or more copies in the T. cruzi CL Brener genome. The average coverage of the annotated protein coding sequences and pseudogenes indicates a total gene copy number, including allelic gene variants, of over 40,000.
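The coverage-based estimate can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that per-base read depths for each gene are already available from the read alignment, and that `single_copy_depth` (a hypothetical parameter) is the expected depth of a sequence present once in the genome. A gene's copy number is then taken as its mean depth divided by that baseline, on the assumption that reads from all collapsed copies pile onto the single assembled copy. The 14-copy threshold mirrors the cutoff quoted above; the function names and toy data are illustrative only.

```python
def mean_depth(depths):
    """Average per-base read depth across a gene's positions."""
    if not depths:
        raise ValueError("gene has no positions")
    return sum(depths) / len(depths)


def estimate_copy_number(gene_depths, single_copy_depth):
    """Estimate copy number as mean gene depth relative to single-copy depth.

    Assumption (hypothetical model): reads from every nearly identical copy
    align to the one assembled copy, so depth scales linearly with copies.
    """
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    return mean_depth(gene_depths) / single_copy_depth


def summarize(genes, single_copy_depth, high_copy_threshold=14):
    """Return per-gene estimates, their sum, and the fraction at/above threshold."""
    estimates = {
        name: estimate_copy_number(depths, single_copy_depth)
        for name, depths in genes.items()
    }
    total = sum(estimates.values())
    n_high = sum(1 for c in estimates.values() if c >= high_copy_threshold)
    return estimates, total, n_high / len(estimates)


if __name__ == "__main__":
    # Toy depths (invented for illustration), baseline depth of 10x per copy.
    toy_genes = {
        "geneA": [10, 12, 8, 10],
        "geneB": [140, 150, 130, 140],
        "geneC": [20, 22, 18, 20],
    }
    estimates, total, frac_high = summarize(toy_genes, single_copy_depth=10.0)
    print(estimates)   # {'geneA': 1.0, 'geneB': 14.0, 'geneC': 2.0}
    print(total)       # 17.0
    print(frac_high)   # one of three genes reaches 14 copies
```

Summing the per-gene estimates, as `summarize` does, is one simple way a genome-wide total copy number (including collapsed variants) could be derived from coverage; real data would also need to account for coverage bias and allelic divergence, which this sketch ignores.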
Conclusion: Our results indicate that the number of protein coding sequences and pseudogenes in the T. cruzi genome may be twice the previous estimate. We have constructed a database of the T. cruzi gene repeat data that is available as a resource to the community. The main purpose of the database is to enable biologists interested in repeated, unfinished regions to examine and resolve these regions themselves using all available shotgun data, instead of relying on annotated consensus sequences that are often erroneous and possibly misleading. Five repetitive genes were studied in more detail to illustrate how the database can be used to analyze and extract information about gene repeats with different characteristics in Trypanosoma cruzi.
Similar articles
- Accessing the Variability of Multicopy Genes in Complex Genomes using Unassembled Next-Generation Sequencing Reads: The Case of Trypanosoma cruzi Multigene Families. mBio. 2022 Dec 20;13(6):e0231922. doi: 10.1128/mbio.02319-22. Epub 2022 Oct 20. PMID: 36264102.
- A random sequencing approach for the analysis of the Trypanosoma cruzi genome: general structure, large gene and repetitive DNA families, and gene discovery. Genome Res. 2000 Dec;10(12):1996-2005. doi: 10.1101/gr.gr-1463r. PMID: 11116094.
- The Trypanosoma cruzi genome; conserved core genes and extremely variable surface molecule families. Res Microbiol. 2011 Jul-Aug;162(6):619-25. doi: 10.1016/j.resmic.2011.05.003. Epub 2011 May 18. PMID: 21624458.
- An Evolutionary View of Trypanosoma Cruzi Telomeres. Front Cell Infect Microbiol. 2020 Jan 10;9:439. doi: 10.3389/fcimb.2019.00439. eCollection 2019. PMID: 31998659. Review.
- Repetitive elements in genomes of parasitic protozoa. Microbiol Mol Biol Rev. 2003 Sep;67(3):360-75. doi: 10.1128/MMBR.67.3.360-375.2003. PMID: 12966140. Review.
Cited by
- A genome-wide analysis of genetic diversity in Trypanosoma cruzi intergenic regions. PLoS Negl Trop Dis. 2014 May 1;8(5):e2839. doi: 10.1371/journal.pntd.0002839. eCollection 2014 May. PMID: 24784238.
- Accessing the Variability of Multicopy Genes in Complex Genomes using Unassembled Next-Generation Sequencing Reads: The Case of Trypanosoma cruzi Multigene Families. mBio. 2022 Dec 20;13(6):e0231922. doi: 10.1128/mbio.02319-22. Epub 2022 Oct 20. PMID: 36264102.
- Protein subcellular relocalization and function of duplicated flagellar calcium binding protein genes in honey bee trypanosomatid parasite. PLoS Genet. 2024 Mar 4;20(3):e1011195. doi: 10.1371/journal.pgen.1011195. eCollection 2024 Mar. PMID: 38437202.
- Pathogenesis of chagas' disease: parasite persistence and autoimmunity. Clin Microbiol Rev. 2011 Jul;24(3):592-630. doi: 10.1128/CMR.00063-10. PMID: 21734249. Review.
- Kinetoplastid genomics: the thin end of the wedge. Infect Genet Evol. 2008 Dec;8(6):901-6. doi: 10.1016/j.meegid.2008.07.001. Epub 2008 Jul 15. PMID: 18675383.