Genome Res. 2008 Aug;18(8):1325-35.
doi: 10.1101/gr.072769.107. Epub 2008 May 15.

Cross-species de novo identification of cis-regulatory modules with GibbsModule: application to gene regulation in embryonic stem cells


Dan Xie et al. Genome Res. 2008 Aug.

Abstract

We introduce the GibbsModule algorithm for de novo detection of cis-regulatory motifs and modules in eukaryote genomes. GibbsModule models the coexpressed genes within one species as sharing a core cis-regulatory motif and each homologous gene group as sharing a homologous cis-regulatory module (CRM), characterized by a similar composition of motifs. Without using a predetermined alignment result, GibbsModule iteratively updates the core motif shared by coexpressed genes and traces the homologous CRMs that contain the core motif. GibbsModule achieved substantial improvements in both precision and recall as compared with peer algorithms on a number of synthetic and real data sets. Applying GibbsModule to analyze the binding regions of the Krüppel-like factor (KLF) transcription factor in embryonic stem cells (ESCs), we discovered a motif that differs from a previously published KLF motif identified by a SELEX experiment, but the new motif is consistent with mutagenesis analysis. The SOX2 motif was found to be a collaborating motif to the KLF motif in ESCs. We used quantitative chromatin immunoprecipitation (ChIP) analysis to test whether GibbsModule could distinguish functional and nonfunctional binding sites. All seven tested binding sites in GibbsModule-predicted CRMs had higher ChIP signals as compared with the other seven tested binding sites located outside of predicted CRMs. GibbsModule is available at (http://biocomp.bioen.uiuc.edu/GibbsModule).
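The core motif that the abstract describes is represented as a position-specific weight matrix (PSWM) that is re-estimated from the currently selected sites. The sketch below is only an illustration of that re-estimation step and of scoring a sequence window against the matrix. It is not the GibbsModule implementation: the pseudocount of 1, the uniform background of 0.25, and the function names `build_pswm`, `log_odds`, and `best_site` are all assumptions, and `best_site` takes the single highest-scoring window where a Gibbs sampler would draw sites at random in proportion to their scores.

```python
import math

BASES = "ACGT"

def build_pswm(sites, pseudocount=1.0):
    """Estimate per-position base frequencies from equal-length sites.

    The pseudocount is an assumed smoothing value, not taken from the paper.
    """
    pswm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pswm.append({base: counts[base] / total for base in BASES})
    return pswm

def log_odds(pswm, window, background=0.25):
    """Log-odds score of a window against an assumed uniform background."""
    return sum(math.log(col[base] / background) for col, base in zip(pswm, window))

def best_site(pswm, sequence):
    """Start index of the highest-scoring window (deterministic simplification)."""
    width = len(pswm)
    starts = range(len(sequence) - width + 1)
    return max(starts, key=lambda s: log_odds(pswm, sequence[s:s + width]))

pswm = build_pswm(["TGACA", "TGACA", "TGATA"])
site_start = best_site(pswm, "CCCTGACACCC")
```

In an iterative scheme, the sites chosen on each pass would be fed back into `build_pswm` to produce the matrix for the next pass.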


Figures

Figure 1.
Three information sources for de novo identification of cis-regulatory motifs. The three circles represent three information sources that can be utilized for motif and CRM finding. A tool enclosed in a circle indicates that this tool utilizes that information.
Figure 2.
Motifs and CRMs in coexpressed genes and their homologous genes. A, B, C, and D represent coexpressed genes in one species. A′ and A′′ represent the homologous genes to A in two other species, and so on. (X, O, #) TFBSs for different transcription factors.
Figure 3.
GibbsModule workflow. In Step 1, a random PSWM is initialized. Steps 2-5 are the iterative steps. In Step 2, N candidate binding sites are sampled from every homologous sequence using the same PSWM. In this example, three candidate binding sites are sampled on each sequence (N = 3). Every sampled binding site defines a candidate CRM, which includes the binding site itself and 100 bp of flanking region on each side. These candidate CRMs are marked 1, 2, 3 on the target sequence, and 1′, 2′, 3′ and 1″, 2″, 3″ on the sequences of two assisting species. In Step 3, Module-Alignment is applied to every candidate CRM on the target sequence and every CRM on the assisting sequences. In the example, the alignments are applied to CRM pairs of (1, 1′), (1, 2′), (1, 3′), (2, 1′), (2, 2′), . . . , (3, 1″), (3, 2″), and (3, 3″). In Steps 3 and 4, the most conserved CRM on the target sequence is selected by $\arg\max_{n} \left( \max_{n'} \mathrm{score}(n, n') + \max_{n''} \mathrm{score}(n, n'') \right)$, where n, n′, and n″ are indicators of candidate CRMs in homologous sequences SeqA, orthA1, and orthA2, respectively. (X, O, #) Other motifs close to the core motif within a CRM. In Step 5, a new PSWM is calculated from the core motifs in the most conserved CRMs.
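The CRM window and the selection rule in this caption can be sketched in a few lines. This is a minimal illustration that assumes the pairwise conservation scores are already available as two tables, one per assisting species. The function names `candidate_crm` and `most_conserved_crm` and the example scores are hypothetical, and clipping the window at the sequence ends is an assumption the caption does not address.

```python
def candidate_crm(site_start, site_len, seq_len, flank=100):
    """Window around a sampled site: the site plus `flank` bp on each side.

    Clipping to [0, seq_len] is an assumption for sites near sequence ends.
    """
    start = max(0, site_start - flank)
    end = min(seq_len, site_start + site_len + flank)
    return start, end

def most_conserved_crm(scores_a1, scores_a2):
    """Pick the target CRM n maximizing max_n' score(n, n') + max_n'' score(n, n'').

    scores_a1[n][k] holds score(n, k-th CRM in the first assisting species);
    scores_a2[n][k] holds the same for the second assisting species.
    """
    totals = [max(row1) + max(row2) for row1, row2 in zip(scores_a1, scores_a2)]
    best = max(range(len(totals)), key=totals.__getitem__)
    return best, totals[best]

# Made-up scores for N = 3 candidate CRMs per sequence.
scores_a1 = [[1, 5, 2], [3, 3, 3], [0, 9, 1]]
scores_a2 = [[2, 2, 2], [8, 1, 1], [0, 1, 0]]
best_index, best_total = most_conserved_crm(scores_a1, scores_a2)
```

With these made-up tables the per-candidate totals are 7, 11, and 10, so the second candidate (index 1) is selected.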
Figure 4.
Module-Alignment. (A) An illustration of three pairs of orthologous CRMs: All three CRM pairs consist of TFBSs generated from the same motifs (squares, ellipses, and triangles). (1) Orthologous CRMs with conserved number, order, and distances. (24) Orthologous CRMs with different distances, order, and number of TFBSs. (B) Workflow of Module-Alignment: Module-Alignment iteratively performs local alignment and masks out conserved regions. The mutations and gaps between the alignable segments on the upper and lower sequences incur a severe penalty, so Smith-Waterman can detect only the best local alignment in the first row. The conservation score from local alignment is the score of the best local alignment. In contrast, the conservation score from Module-Alignment is the sum of the two local alignment scores from the two alignable sequence segments. Module-Alignment is not designed to align two sequences of arbitrary length. Its input sequences should be potentially orthologous CRMs with lengths of several dozen to several hundred base pairs.
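The align-then-mask loop described in panel B can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure: the scoring values (match +2, mismatch -3, gap -4), the round limit, the stopping threshold, and the use of `N` as a mask character are all hypothetical choices, and the function names are not from the paper.

```python
def smith_waterman(a, b, match=2, mismatch=-3, gap=-4):
    """Best local alignment score and the aligned spans in a and b.

    Masked positions ('N') never count as matches. Scoring values are assumed.
    """
    def sub(x, y):
        return match if x == y and x in "ACGT" else mismatch

    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, end = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    # Trace back from the best cell to find where the alignment starts.
    i, j = end
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end[0]), (j, end[1])

def module_alignment_score(a, b, max_rounds=3, min_score=4):
    """Sum local alignment scores, masking each aligned segment before the next round."""
    total = 0
    for _ in range(max_rounds):
        score, (a0, a1), (b0, b1) = smith_waterman(a, b)
        if score < min_score:
            break
        total += score
        a = a[:a0] + "N" * (a1 - a0) + a[a1:]
        b = b[:b0] + "N" * (b1 - b0) + b[b1:]
    return total

# Two conserved segments that appear in opposite order in the two sequences.
seq_a = "ACGTACGT" + "TTTTTTTT" + "GGCCGGCC"
seq_b = "GGCCGGCC" + "AAAAAAAA" + "ACGTACGT"
single_score = smith_waterman(seq_a, seq_b)[0]
module_score = module_alignment_score(seq_a, seq_b)
```

In this toy pair a single local alignment captures only one of the two rearranged segments, while the masked second round picks up the other, so the summed score is twice the single-alignment score.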
Figure 5.
Motifs derived from Klf ChIP-chip and mutagenesis analysis. SOX2 motif (A) and KLF motif (B) found by GibbsModule and MEME from KLF ChIP-chip data. (C) SOX2 motif in TRANSFAC.
Figure 6.
ChIP signals of predicted CRM and non-CRMs that contain putative POU5F1-binding sites.
