BMC Genomics. 2010 Oct 14;11:567.
doi: 10.1186/1471-2164-11-567.

Systematic identification of conserved motif modules in the human genome

Xiaohui Cai et al. BMC Genomics.

Abstract

Background: The identification of motif modules, groups of multiple motifs that frequently co-occur in DNA sequences, is one of the most important tasks in annotating the human genome. Current approaches to identifying motif modules are often restricted to searches within promoter regions or rely on multiple genome alignments. However, promoter regions account for only a limited fraction of the locations where transcription factor binding sites (TFBSs) can occur, and multiple genome alignments often fail to align binding sites with their true counterparts because these sites are short and degenerate.

Results: To identify motif modules systematically, we developed a computational method that covers the entire non-coding regions around human genes and does not rely on multiple genome alignments. First, we selected orthologous DNA blocks approximately 1 kilobase in length based on discontiguous sequence similarity. Next, we scanned the conserved segments in these blocks with known motifs from the TRANSFAC database. Finally, we applied a frequent pattern mining technique to identify motif modules within these blocks. In total, at a false discovery rate cutoff of 0.05, we predicted 3,161,839 motif modules, 90.8% of which are supported by various forms of functional evidence. Compared with experimental data from 14 ChIP-seq experiments, our method predicted, on average, 69.6% of the ChIP-seq peaks containing TFBSs of multiple transcription factors (TFs). Our findings also show that many motif modules exhibit distance and order preferences among their motifs, which further supports the functionality of these predictions.
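
The final two steps (scanning conserved segments for motif hits, then finding motif combinations that recur across many blocks) can be illustrated with a small Python sketch. It is a simplified stand-in, not the authors' implementation: the matrices `MOTIF_A` and `MOTIF_B` are hypothetical toy motifs rather than TRANSFAC entries, the uniform background, score threshold, and minimum support are arbitrary assumptions, only motif pairs are counted (instead of a full frequent pattern mining algorithm over larger motif sets), and no false discovery rate is estimated. The hypothetical `pair_order` helper shows one way an order preference between two motifs could be tallied.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy motifs given as position frequency matrices (one dict per
# position, probabilities over A/C/G/T). These are NOT TRANSFAC matrices.
TOY_MOTIFS = {
    "MOTIF_A": [  # strongly prefers the word "TGAC"
        {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    ],
    "MOTIF_B": [  # strongly prefers the word "GGCC"
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    ],
}
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(pfm):
    """Convert a position frequency matrix into a log2-odds scoring matrix."""
    return [{base: math.log2(p / BACKGROUND) for base, p in col.items()} for col in pfm]


def scan(seq, pwm, threshold):
    """Return start positions whose window score reaches the threshold.

    Assumes seq contains only uppercase A/C/G/T.
    """
    width = len(pwm)
    return [
        i
        for i in range(len(seq) - width + 1)
        if sum(pwm[j][seq[i + j]] for j in range(width)) >= threshold
    ]


def frequent_pairs(blocks, pwms, threshold, min_support):
    """Count, for each motif pair, the blocks in which both motifs have a hit,
    keeping pairs seen in at least min_support blocks."""
    counts = Counter()
    for seq in blocks:
        present = sorted(name for name, pwm in pwms.items() if scan(seq, pwm, threshold))
        counts.update(combinations(present, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def pair_order(blocks, pwms, threshold, a, b):
    """Count which motif's first hit comes first in blocks containing both.

    Ties (same start position) are counted as b first.
    """
    order = Counter()
    for seq in blocks:
        hits_a = scan(seq, pwms[a], threshold)
        hits_b = scan(seq, pwms[b], threshold)
        if hits_a and hits_b:
            order[(a, b) if hits_a[0] < hits_b[0] else (b, a)] += 1
    return order


# Made-up "conserved segments", one string per orthologous block.
EXAMPLE_BLOCKS = ["AATGACTTGGCCAA", "GGCCATGACA", "TTTTGACTTT", "CCCCCCCC"]
PWMS = {name: log_odds(pfm) for name, pfm in TOY_MOTIFS.items()}

# An exact match scores about 5.94 and a single mismatch about 3.13,
# so a threshold of 5.0 accepts only exact matches of the preferred word.
print(frequent_pairs(EXAMPLE_BLOCKS, PWMS, 5.0, 2))  # → {('MOTIF_A', 'MOTIF_B'): 2}
print(pair_order(EXAMPLE_BLOCKS, PWMS, 5.0, "MOTIF_A", "MOTIF_B"))
```

Combinations that occur in many blocks are only candidate modules; deciding whether a recurring combination is significant, rather than merely frequent, is outside the scope of this sketch.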

Conclusions: Our work provides a large-scale prediction of motif modules in mammals, which will facilitate the understanding of gene regulation in a systematic way.


Figures

Figure 1
Flow chart describing our method. (a) The basic procedure in our method. (b) The procedure to identify motif modules from conserved blocks.
Figure 2
Contiguously conserved regions and discontiguously conserved regions. Discontiguously conserved regions often contain long divergent sequences, which lowers the percent identity of the alignment of the corresponding regions.
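
The effect described in Figure 2 can be seen with a toy alignment. The sketch below assumes one common definition of percent identity (columns with the same non-gap base in both rows, divided by all alignment columns, so gap columns count against identity); other tools define it differently. The sequences `SPECIES_1` and `SPECIES_2` are made up: two identical 10-base conserved segments flank a 20-base insertion present in only one of them.

```python
def percent_identity(aln1, aln2):
    """Percent of alignment columns with the same non-gap base in both rows.

    Gap columns ('-') count as non-identical (one of several conventions).
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for x, y in zip(aln1, aln2) if x == y and x != "-")
    return 100.0 * matches / len(aln1)


def windowed_identity(aln1, aln2, size):
    """Percent identity in consecutive non-overlapping windows of the alignment."""
    return [
        percent_identity(aln1[i:i + size], aln2[i:i + size])
        for i in range(0, len(aln1), size)
    ]


# Hypothetical alignment: conserved segment, insertion in one row only, conserved segment.
SPECIES_1 = "ACGTACGTAC" + "TTGACCATGGATCCAGTTCA" + "GGCATGCATG"
SPECIES_2 = "ACGTACGTAC" + "-" * 20 + "GGCATGCATG"

print(percent_identity(SPECIES_1, SPECIES_2))       # → 50.0
print(windowed_identity(SPECIES_1, SPECIES_2, 10))  # → [100.0, 0.0, 0.0, 100.0]
```

Although both conserved segments are identical, identity over the whole region is only 50%, so a criterion based on whole-region identity may overlook such blocks.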