Statistical approaches to use a model organism for regulatory sequences annotation of newly sequenced species
- PMID: 22984403
- PMCID: PMC3439465
- DOI: 10.1371/journal.pone.0042489
Abstract
A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly sequenced genomes. In such cases, it would be helpful to predict sequence motifs by using experimental data from a closely related model organism. Here we present a general algorithm that identifies transcription factor binding sites in a newly sequenced species by performing Bayesian regression on the annotated species. First, we establish the rationale of our method by applying it within the same species; then we extend it to use data available in closely related species. Finally, we generalise the method to handle the case in which a number of experiments are available from several species close to the species on which inference is made. To show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies are related to the G2/M phase of the Ascomycota cell cycle; the third is related to morphogenesis. We also compare the method with MatrixReduce and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, where the gene network size is conserved, demonstrate the method's utility in annotating sequences of new species using all the available replicates from model species. The third case, where the gene network size varies among species, shows that combining information is less powerful but still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms.
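The abstract does not specify the regression model, so the following is only a minimal sketch of the general idea, under stated assumptions: expression log-ratios are regressed on motif occurrence counts in promoters with a single slope, the noise variance is known, and the prior on the slope is Gaussian. The function name `posterior_slope` and all data values are hypothetical. The sketch shows how a posterior obtained from a model organism could serve as the prior when analysing a related species.

```python
def posterior_slope(x, y, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior mean and variance of beta in y = beta * x + noise.

    Assumes a Gaussian prior on beta and a known Gaussian noise variance
    (conjugate update). This is an illustrative model, not the paper's.
    """
    precision = 1.0 / prior_var + sum(xi * xi for xi in x) / noise_var
    weighted = prior_mean / prior_var + sum(xi * yi for xi, yi in zip(x, y)) / noise_var
    return weighted / precision, 1.0 / precision


# Invented toy data: motif counts per promoter and expression log-ratios.
model_counts = [0, 1, 2, 0, 3]
model_expr = [0.1, 0.9, 2.1, -0.2, 2.8]

# Step 1: fit on the annotated model organism with a vague prior.
m1, v1 = posterior_slope(model_counts, model_expr)

# Step 2: reuse that posterior as the prior for the related species.
new_counts = [1, 0, 2]
new_expr = [1.2, 0.0, 1.7]
m2, v2 = posterior_slope(new_counts, new_expr, prior_mean=m1, prior_var=v1)

print(f"model organism: mean={m1:.3f}, var={v1:.4f}")
print(f"related species: mean={m2:.3f}, var={v2:.4f}")
```

In this toy setting the posterior variance shrinks at each step, which illustrates why borrowing information from a related species can help when the regulatory relationship is conserved.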
Similar articles
- A functional selection model explains evolutionary robustness despite plasticity in regulatory networks. Mol Syst Biol. 2012;8:619. doi: 10.1038/msb.2012.50. PMID: 23089682. Free PMC article.
- A map of the cis-regulatory sequences in the mouse genome. Nature. 2012 Aug 2;488(7409):116-20. doi: 10.1038/nature11243. PMID: 22763441. Free PMC article.
- Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. Plant Cell. 2014 Jul;26(7):2729-45. doi: 10.1105/tpc.114.127001. Epub 2014 Jul 2. PMID: 24989046. Free PMC article.
- From elements to modules: regulatory evolution in Ascomycota fungi. Curr Opin Genet Dev. 2009 Dec;19(6):571-8. doi: 10.1016/j.gde.2009.09.007. Epub 2009 Oct 29. PMID: 19879128. Free PMC article. Review.
- Comparative Genome Annotation. Methods Mol Biol. 2018;1704:189-212. doi: 10.1007/978-1-4939-7463-4_6. PMID: 29277866. Review.
Cited by
- Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems. Front Cell Dev Biol. 2014 Sep 17;2:51. doi: 10.3389/fcell.2014.00051. eCollection 2014. PMID: 25364758. Free PMC article.