BMC Bioinformatics. 2011 Jun 27:12:262.
doi: 10.1186/1471-2105-12-262.

A mutation degree model for the identification of transcriptional regulatory elements

Changqing Zhang et al. BMC Bioinformatics.

Abstract

Background: Current approaches for identifying transcriptional regulatory elements mainly combine two properties of functional elements: their evolutionary conservation and their overrepresentation in the promoters of co-regulated genes. Despite the development of many motif detection algorithms, discovering conserved motifs across a wide range of phylogenetically related promoters remains a challenge, especially for short motifs embedded in distantly related or very closely related gene promoters, or when too few orthologous genes are available.

Results: A mutation degree model is proposed and a new word counting method is developed for the identification of transcriptional regulatory elements from a set of co-expressed genes. The new method comprises two parts: 1) identifying overrepresented oligo-nucleotides in promoters of co-expressed genes, and 2) estimating the conservation of those oligo-nucleotides in promoters of phylogenetically related genes using the mutation degree model. Compared with other algorithms, our method shows a low false positive rate and higher specificity, and is especially robust to noisy data. When the method was applied to co-expressed gene sets from Arabidopsis, most of the known cis-elements were successfully detected. The tool and an example are available at http://mcube.nju.edu.cn/jwang/lab/soft/ocw/OCW.html.
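The word counting in part 1 can be illustrated with a minimal sketch. It assumes a fixed word length, a single strand, and a count of how many promoters contain each word at least once. The function names and toy sequences are hypothetical and are not taken from the OCW tool.

```python
def kmers_in(seq, k):
    """Return the set of distinct k-letter words found in one promoter."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def promoter_counts(promoters, k):
    """Map each word to the number of promoters containing it at least once."""
    counts = {}
    for seq in promoters:
        for word in kmers_in(seq, k):
            counts[word] = counts.get(word, 0) + 1
    return counts


# Toy data (made up for illustration only).
coexpressed = ["ACGTGGCA", "TTACGTGT", "CACGTGAA"]
background = ["TTTTAAAA", "GGGCCCAT", "ACGTGTTT", "CATCATCA"]

fg = promoter_counts(coexpressed, 5)
bg = promoter_counts(background, 5)
print(fg["ACGTG"], bg.get("ACGTG", 0))
```

A word present in most co-expressed promoters but few background promoters is a candidate for the significance test and conservation scoring that follow.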

Conclusions: The mutation degree model proposed in this paper adapts to phylogenetic data of varying quality and to a wide range of evolutionary distances. The new word-counting method based on this model performs better at detecting short cis-element sequences in co-expressed genes of eukaryotes and is robust to less complete phylogenetic data.


Figures

Figure 1
Performance comparisons of different tools on simulated data. The predictions shown as a histogram are from AlignACE, GLAM2 and Weeder, three tools based on over-represented word detection. The predictions shown as a line chart are from OCW, PhyloGibbs, PhyloCon and WeederH, which incorporate phylogenetic information into their algorithms. The extent of convergence of the artificial orthologous sequences used in these tools is represented by the sequence identity.
Figure 2
Performance of OCW, PhyloCon, PhyloGibbs and WeederH on noisy data. The extent of noise was adjusted by introducing an increasing number (k) of random promoters into the phylogenetic sets.
Figure 3
Illustrations of the mutation degree model and the OCW method. (A) Illustration of the mutation degree model. The phylogenetic promoter sequences of Gene#1, Gene#2, Gene#3, etc. are highlighted in light blue. Mutation degrees between the promoter of Species1 and its phylogenetically related promoters are denoted as a1%, b1%, etc. The data in the result column are for demonstration only. The co-expressed gene set highlighted in lavender belongs to Species1. (B) Flow chart of OCW. Step 1: All oligo-nucleotides present in the co-expressed genes are enumerated; Step 2: Fisher's exact test is used to assess the over-representation significance of the enumerated oligo-nucleotides; Step 3: The conservation score is calculated for the elements resulting from Step 2, and the elements with S>1 are reported; Step 4: Functional elements that meet the criteria specified by the user are reported.
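Step 2 of the flow chart can be illustrated with a minimal sketch of a one-sided Fisher's exact test, computed as a hypergeometric tail probability. The function name, the choice of a one-sided test, and the example counts are assumptions for illustration and are not taken from the OCW implementation. The conservation score S of Step 3 is not defined in this text, so it is not sketched.

```python
from math import comb


def fisher_enrichment_p(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher's exact test for enrichment.

    hits_fg / n_fg: co-expressed promoters containing the word / total.
    hits_bg / n_bg: background promoters containing the word / total.
    Returns P(X >= hits_fg) under the hypergeometric null.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    denom = comb(total, n_fg)
    upper = min(n_fg, total_hits)
    tail = sum(
        comb(total_hits, x) * comb(total - total_hits, n_fg - x)
        for x in range(hits_fg, upper + 1)
    )
    return tail / denom


# Hypothetical counts: the word occurs in 3 of 3 co-expressed promoters
# and in 1 of 4 background promoters.
p = fisher_enrichment_p(3, 3, 1, 4)
print(round(p, 4))
```

With so few promoters the test cannot reach a small p-value. In practice the background would be a much larger promoter set, and the significance threshold would be chosen by the user.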
