Genome Biol. 2013;14(10):R117.
doi: 10.1186/gb-2013-14-10-r117.

Sequence signatures extracted from proximal promoters can be used to predict distal enhancers

Leila Taher et al. Genome Biol. 2013.

Abstract

Background: Gene expression is controlled by proximal promoters and distal regulatory elements such as enhancers. While the activity of some promoters can be invariant across tissues, enhancers tend to be highly tissue-specific.

Results: We compiled sets of tissue-specific promoters based on gene expression profiles of 79 human tissues and cell types. Putative transcription factor binding sites within each set of sequences were used to train a support vector machine classifier capable of distinguishing tissue-specific promoters from control sequences. We obtained reliable classifiers for 92% of the tissues, with an area under the receiver operating characteristic curve between 60% (for subthalamic nucleus promoters) and 98% (for heart promoters). We next used these classifiers to identify tissue-specific enhancers, scanning distal non-coding sequences in the loci of the 200 most highly and lowly expressed genes. Thirty percent of reliable classifiers produced consistent enhancer predictions, with significantly higher densities in the loci of the most highly expressed compared to lowly expressed genes. Liver enhancer predictions were assessed in vivo using the hydrodynamic tail vein injection assay. Fifty-eight percent of the predictions yielded significant enhancer activity in the mouse liver, whereas a control set of five sequences was completely negative.

Conclusions: We conclude that promoters of tissue-specific genes often contain unambiguous tissue-specific signatures that can be learned and used for the de novo prediction of enhancers.

Figures

Figure 1
DNA motifs in human promoters predict tissue-specific expression. (A) Area under the receiver operating characteristic (ROC) curve (AUC) for 79 models trained and tested on promoters of genes highly expressed in 79 different tissues. The AUC is an overall summary of diagnostic accuracy: it equals 0.5 when the ROC curve corresponds to random chance and 1.0 for perfect accuracy. Reliable models (with median AUC ≥0.6) are displayed in red, while unreliable models (with median AUC <0.6) are displayed in gray. Models were evaluated in a five-fold cross-validation setting. (B) Motifs with the greatest predictive power for the liver model. The weights w of the motifs (see Materials and methods) are given in red. Motif weights have been scaled to [-1, 1], where 1 represents the scaled weight of the motif with highest predictive power, and -1 the scaled weight of the motif with the lowest negative predictive power (signs are preserved; see Materials and methods). The names of the features are listed near the baseline of the graph. For comparison, we include weights w for the same motif in the lung, caudate nucleus, and thymus models (in different shades of gray). Similarities among the genes that were used to train the models - which reflect functional relatedness among tissues - explain similarities in the predictive power of the motif. Thus, 15% of genes that are highly expressed in liver are also highly expressed in lung, while less than 5% are in caudate nucleus and thymus.
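The AUC described in panel (A) can be read as the probability that a randomly chosen positive example (here, a tissue-specific promoter) receives a higher classifier score than a randomly chosen control sequence. The sketch below computes it that way in plain Python. The function name `roc_auc` and the example scores are illustrative assumptions and do not come from the paper, whose models were evaluated with five-fold cross-validation on real promoter sets.

```python
def roc_auc(scores, labels):
    """Return the AUC for classifier scores and binary labels (1 = positive).

    Uses the pairwise definition: the fraction of (positive, negative)
    pairs in which the positive scores higher, counting ties as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores: two promoters (label 1) and two controls (label 0).
    example_scores = [0.9, 0.8, 0.3, 0.1]
    example_labels = [1, 0, 1, 0]
    print(roc_auc(example_scores, example_labels))
```

A score of 0.5 from this function corresponds to a classifier that ranks promoters no better than chance, and 1.0 to one that ranks every promoter above every control, matching the two reference points given in the caption.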
Figure 2
Genome-wide enhancer predictions. (A) The number of enhancer predictions in the loci of highly expressed genes divided by the total number of sequences scanned in the loci of highly expressed genes (in red), as compared to the number of enhancer predictions in the loci of lowly expressed genes divided by the total number of sequences scanned in the loci of lowly expressed genes (in black), for 71 promoter-based models. Statistically significant differences are indicated by asterisks (P-values ≤0.05, Fisher’s exact test). (B) Correlation between the fold enrichment between the proportions of enhancer predictions in the loci of highly expressed and lowly expressed genes with the cross-validation accuracy of the corresponding promoter-based models. (C) Overlap of liver enhancer predictions with strong enhancers predicted by ChromHMM in HepG2 cell lines [51], compared to random sequences with similar length in the loci of genes highly expressed in liver. (D) Overlap of liver enhancer predictions with DNase I hypersensitivity sites (DHS) in HepG2 cell lines from the ENCODE project, compared to random sequences with similar length in the loci of genes highly expressed in liver.
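The comparison in panel (A) amounts to a 2x2 contingency table per model: predicted versus non-predicted sequences, in loci of highly versus lowly expressed genes. As a rough sketch of how such a table can be tested, the code below implements a two-sided Fisher's exact test using only the standard library. Whether the authors used a one-sided or two-sided test is not stated in the caption, so the two-sided choice is an assumption, as are the function name and the example counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout for this use case:
      a = predicted enhancers in loci of highly expressed genes
      b = other scanned sequences in those loci
      c = predicted enhancers in loci of lowly expressed genes
      d = other scanned sequences in those loci
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x predictions falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(fisher_exact_two_sided(30, 970, 12, 988))
```

With real data, the asterisks in the figure would correspond to models whose P-value from such a test is at most 0.05.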
Figure 3
Experimental validation of liver enhancer predictions using the hydrodynamic tail vein enhancer assay. On each injection day, we also injected an empty pGL4.23[luc2] vector and a known liver enhancer of the ApoE gene as negative and positive controls, respectively. At least three mice were injected per construct. Statistical significance was tested using Student’s t-test followed by multiple testing adjustment with Benjamini-Hochberg’s method. Asterisks indicate a statistically significant difference from the control at an adjusted P-value ≤0.05.
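The multiple testing adjustment named in this caption can be illustrated with a short sketch of the Benjamini-Hochberg procedure, which converts a list of raw P-values into adjusted P-values that control the false discovery rate. The function name and the example P-values below are assumptions made for illustration; they are not values reported in the paper.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Each P-value is multiplied by m / rank (m = number of tests, rank =
    1-based position when sorted ascending), then a running minimum taken
    from the largest rank downward keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P-values from per-construct t-tests.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        flag = "*" if q <= 0.05 else ""
        print(f"raw={p:.3f} adjusted={q:.3f} {flag}")
```

A construct would receive an asterisk under the caption's criterion when its adjusted P-value is at most 0.05.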
