Identification and characterization of multi-species conserved sequences
- PMID: 14656959
- PMCID: PMC403793
- DOI: 10.1101/gr.1602203
Abstract
Comparative sequence analysis has become an essential component of studies aiming to elucidate genome function. The increasing availability of genomic sequences from multiple vertebrates is creating the need for computational methods that can detect highly conserved regions in a robust fashion. Towards that end, we are developing approaches for identifying sequences that are conserved across multiple species; we call these "Multi-species Conserved Sequences" (or MCSs). Here we report two strategies for MCS identification, demonstrating their ability to detect virtually all known actively conserved sequences (specifically, coding sequences) but very little neutrally evolving sequence (specifically, ancestral repeats). Importantly, we find that a substantial fraction of the bases within MCSs (approximately 70%) resides within non-coding regions; thus, the majority of sequences conserved across multiple vertebrate species has no known function. Initial characterization of these MCSs has revealed sequences that correspond to clusters of transcription factor-binding sites, non-coding RNA transcripts, and other candidate functional elements. Finally, the ability to detect MCSs represents a valuable metric for assessing the relative contribution of a species' sequence to identifying genomic regions of interest, and our results indicate that the currently available genome sequences are insufficient for the comprehensive identification of MCSs in the human genome.
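The evaluation described in the abstract (how much coding sequence the MCSs capture, how much ancestral-repeat sequence they overlap, and what share of MCS bases is non-coding) can be illustrated with simple interval arithmetic. The following is a minimal sketch, not the authors' method: the function names, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration, and the toy numbers were chosen only so that the non-coding share comes out near the reported 70%.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates (an assumed convention)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    """Number of distinct bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a: List[Interval], b: List[Interval]) -> int:
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def summarize(mcs: List[Interval], coding: List[Interval],
              ancestral_repeats: List[Interval]) -> Dict[str, float]:
    """Hypothetical summary of an MCS set against two annotation tracks."""
    mcs_len = total_length(mcs)
    coding_in_mcs = overlap_length(mcs, coding)
    repeats_in_mcs = overlap_length(mcs, ancestral_repeats)
    return {
        "coding_covered": coding_in_mcs / total_length(coding),
        "repeats_covered": repeats_in_mcs / total_length(ancestral_repeats),
        "mcs_noncoding_fraction": (mcs_len - coding_in_mcs) / mcs_len,
    }


# Toy coordinates, invented for illustration only.
toy_mcs = [(100, 200), (300, 400), (500, 600)]
toy_coding = [(150, 200), (350, 390)]
toy_repeats = [(590, 690)]
stats = summarize(toy_mcs, toy_coding, toy_repeats)
print(stats)
```

In this toy case all coding bases fall inside MCSs, a small share of the ancestral-repeat bases do, and the remaining MCS bases are non-coding. Real annotation tracks would span multiple sequences and would need per-chromosome handling, which this sketch omits.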
Web site references
- http://www.nisc.nih.gov; NIH Intramural Sequencing Center (NISC) home page.
- http://www.nisc.nih.gov/data; Supplementary data, including annotated sequence for the studies reported here and supplemental tables.
- http://genome.ucsc.edu; UC Santa Cruz Genome Browser home page, including the multi-species "zoo browser."
- http://bio.cs.washington.edu; Computational Molecular Biology Group (University of Washington, Computer Science & Engineering) home page.
- http://genome.gov/ENCODE; ENCODE project home page.