BMC Bioinformatics. 2007 Nov 9;8:437.
doi: 10.1186/1471-2105-8-437.

Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors

Xueping Yu et al. BMC Bioinformatics.

Abstract

Background: Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA regulatory regions are not evolutionarily conserved, we have developed an approach for cis-regulatory region identification that does not rely upon evolutionary sequence conservation.

Results: The conservation-independent approach is based on an empirical potential energy between interacting transcription factors (TFs). In this analysis, the potential energy is defined as a function of the number of TF interactions in a genomic region and the strength of the interactions. By identifying sets of interacting TFs, the analysis locates regions enriched with the binding sites of these interacting TFs. We applied this approach to 30 human tissues and identified 6232 putative cis-regulatory modules (CRMs) regulating 2130 tissue-specific genes. Interestingly, some genes appear to be regulated by different CRMs in different tissues. Known regulatory regions are highly enriched in our predicted CRMs. In addition, DNase I hypersensitive sites, which tend to be associated with active regulatory regions, significantly overlap with the predicted CRMs, but not with more conserved regions. We also find that conserved and non-conserved CRMs regulate distinct gene groups. Conserved CRMs control more essential genes and genes involved in fundamental cellular activities such as transcription. In contrast, non-conserved CRMs, in general, regulate more non-essential genes, such as genes related to neural activity.

Conclusion: These results demonstrate that identifying relevant sets of binding motifs can help in the mapping of DNA regulatory regions, and suggest that non-conserved CRMs play an important role in gene regulation.
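The abstract describes the potential energy only qualitatively, as a function of the number and strength of TF interactions in a genomic region. The following is a minimal sketch of that idea, not the authors' implementation. The functional form (minus the summed strengths of nearby binding-site pairs), the window width, the step, the maximum pair distance, and all function names are assumptions made for illustration.

```python
from itertools import combinations

def window_energy(sites, strengths, start, width, max_gap=50):
    """Toy energy of one window: minus the summed strengths of interacting pairs.

    sites     -- list of (position, tf_name) binding-site hits (hypothetical input)
    strengths -- dict mapping frozenset({tf_a, tf_b}) to an interaction strength
    max_gap   -- assumed maximum distance between two sites that still interact
    """
    in_window = [(pos, tf) for pos, tf in sites if start <= pos < start + width]
    energy = 0.0
    for (p1, tf1), (p2, tf2) in combinations(in_window, 2):
        if abs(p1 - p2) <= max_gap:
            # Pairs without a known interaction contribute nothing.
            energy -= strengths.get(frozenset((tf1, tf2)), 0.0)
    return energy

def scan(sites, strengths, length, width=200, step=50, max_gap=50):
    """Slide a window along a region of the given length; return (start, energy) pairs."""
    last_start = max(length - width, 0)
    return [(s, window_energy(sites, strengths, s, width, max_gap))
            for s in range(0, last_start + 1, step)]

def call_crms(profile, threshold):
    """Return the window starts whose energy lies below the threshold."""
    return [start for start, energy in profile if energy < threshold]
```

In this sketch, more interacting pairs and stronger interactions both lower the energy, so candidate CRMs are the windows with the most negative values.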

Figures

Figure 1
Schematic of the module detection method based on TF interactions. Based on gene expression profiles across different tissues, we identified groups of genes that are preferentially expressed in particular tissues (e.g. genes C and D in the schematic). For each group of genes, we searched for the binding sites of known TFs in promoter regions and determined the TF pairs whose binding sites tend to co-occur in close proximity. A tissue-specific TF interaction network was obtained from this analysis. We then scanned the genomic regions and identified cis-regulatory modules (CRMs), defined as regions enriched with TF interactions. Note that the first steps were implemented in our previous work [22], while this paper focuses on the last step.
Figure 2
Two examples of predicted CRMs. (A) The 5 kb region upstream of the translational start site of gene ALDOA. (B) The same region for gene CNGB3. Upper panels show the "potential energy" based on TF interactions. Middle panels show the density of all known TFBSs (306 in total) in a sliding window along the region. Bottom panels depict the conservation scores of the regions. The dashed lines are the thresholds used in our prediction. Positions with energy lower than the threshold are predicted as CRMs (indicated by vertical bars). The red dots in (A) indicate the positions of known regulatory sites.
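The thresholding step in this caption can be illustrated with a short sketch that groups consecutive below-threshold positions into regions. The function name, the input format (one energy value per position) and the choice to merge adjacent positions are assumptions; the caption only states that positions with energy below the threshold are predicted as CRMs.

```python
def below_threshold_regions(energies, threshold):
    """Group consecutive positions with energy below the threshold.

    energies -- list of energy values, one per position (hypothetical input)
    Returns a list of inclusive (start, end) index pairs.
    """
    regions = []
    start = None
    for i, energy in enumerate(energies):
        if energy < threshold:
            if start is None:
                start = i          # a new low-energy run begins here
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:          # close a run that reaches the end
        regions.append((start, len(energies) - 1))
    return regions
```

For example, with energies `[0, -3, -4, 0, -5]` and a threshold of `-2`, the function returns `[(1, 2), (4, 4)]`.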
Figure 3
Enrichment and sensitivity of predictions. We evaluated the performance of predictions using sensitivity and enrichment. Two types of predictions were compared: one is the TF interaction based method and the other is the solely conservation based method. (A) Using known regulatory elements as positive controls. (B) Using DNase I hypersensitive sites as positive controls.
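This page does not give the exact definitions of sensitivity and enrichment used in the paper. The sketch below uses common working definitions, which are assumptions here: sensitivity is the fraction of positive-control sites that fall inside predicted regions, and enrichment is that fraction divided by the fraction of the sequence covered by predictions. All names are hypothetical.

```python
def covered_positions(intervals):
    """Set of positions covered by inclusive (start, end) intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered

def sensitivity(predicted, known_sites):
    """Fraction of known sites that lie inside a predicted interval."""
    covered = covered_positions(predicted)
    hits = sum(1 for site in known_sites if site in covered)
    return hits / len(known_sites)

def enrichment(predicted, known_sites, length):
    """Sensitivity divided by the fraction of the sequence that is predicted.

    A value above 1 means known sites fall in predictions more often than
    expected if predictions were placed without regard to the sites.
    """
    coverage = len(covered_positions(predicted)) / length
    return sensitivity(predicted, known_sites) / coverage
```

With predictions `[(0, 9), (50, 59)]` on a 100-position sequence and known sites at 5, 55, 80 and 90, half of the sites are recovered from one fifth of the sequence, giving a sensitivity of 0.5 and an enrichment of 2.5.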
Figure 4
Dependence of regulatory activity on position relative to gene structure. We calculated the probability that each position contains a CRM. The reference positions (the origins of the x-axes) in the three panels are the transcription start sites, the intron start sites, and the transcription end sites, respectively. The pink curve in the left panel is from random sequences generated with the same nucleotide composition and first-order transition probabilities as those of all promoter sequences in the human genome.
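Random sequences with a matched nucleotide composition and first-order transition probabilities can be drawn from a first-order Markov chain. The sketch below estimates both from a set of training sequences and samples a new sequence. It is an illustration under assumptions, not the authors' procedure: the function names, the use of the overall composition for the first base, and the fallback for bases with no observed successor are all choices made here.

```python
import random
from collections import Counter, defaultdict

def fit_first_order(seqs):
    """Count single bases and adjacent base pairs in the training sequences."""
    base_counts = Counter()
    trans_counts = defaultdict(Counter)
    for seq in seqs:
        base_counts.update(seq)
        for a, b in zip(seq, seq[1:]):
            trans_counts[a][b] += 1
    return base_counts, trans_counts

def sample_sequence(base_counts, trans_counts, length, rng):
    """Sample a sequence from the fitted first-order Markov chain."""
    if length <= 0:
        return ""
    bases = sorted(base_counts)
    base_weights = [base_counts[b] for b in bases]
    # First base is drawn from the overall composition (an assumption).
    seq = rng.choices(bases, weights=base_weights)
    while len(seq) < length:
        successors = trans_counts.get(seq[-1])
        if successors:
            options = sorted(successors)
            seq += rng.choices(options, weights=[successors[o] for o in options])
        else:
            # No observed successor: fall back to the overall composition.
            seq += rng.choices(bases, weights=base_weights)
    return "".join(seq)
```

A call such as `sample_sequence(*fit_first_order(promoters), 5000, random.Random(0))`, where `promoters` is a hypothetical list of promoter sequences, yields one background sequence. Passing an explicit `random.Random` instance keeps the sampling reproducible.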
Figure 5
The energy landscapes for PITX2. The landscape in the upper panel was calculated based on placenta-specific interactions between TFs. The one in the bottom panel was based on eye-specific TF interactions.

References

    1. Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM, Eisen MB. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci U S A. 2002;99:757–762. doi: 10.1073/pnas.231608898.
    2. Berman BP, Pfeiffer BD, Laverty TR, Salzberg SL, Rubin GM, Eisen MB, Celniker SE. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. 2004;5:R61. doi: 10.1186/gb-2004-5-9-r61.
    3. Frith MC, Spouge JL, Hansen U, Weng Z. Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences. Nucleic Acids Res. 2002;30:3214–3224. doi: 10.1093/nar/gkf438.
    4. Halfon MS, Grad Y, Church GM, Michelson AM. Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 2002;12:1019–1028.
    5. Krivan W, Wasserman WW. A predictive model for regulatory sequences directing liver-specific transcription. Genome Res. 2001;11:1559–1566. doi: 10.1101/gr.180601.
