Comparative Study
Genome Res. 2010 Jul;20(7):890-8.
doi: 10.1101/gr.100370.109. Epub 2010 May 25.

Sequence features that drive human promoter function and tissue specificity

Jane M Landolin et al. Genome Res. 2010 Jul.

Abstract

Promoters are important regulatory elements that contain the necessary sequence features for cells to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line-specific. Computational models trained on the binding potential of transcription factor (TF) binding motifs could predict promoter activities in both high and low CG groups: average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line-specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.
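
The abstract compares models by their area under the ROC curve (AUC). As a reading aid, the following minimal sketch shows one standard way to compute an AUC: the fraction of (active, inactive) promoter pairs in which the active promoter receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for an active promoter, 0 for an inactive one (toy encoding).
    scores: any numeric predictor, e.g. a model output or CG content.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example with invented values (not from the paper).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to a predictor no better than chance and 1.0 to a perfect ranking, which is the scale on which the reported 91% should be read.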

Figures

Figure 1. Distribution of transient transfection promoter activities and endogenous gene expression scores in eight cell lines. (A) The threshold for active promoters in the transfection assay is set at log2(promoter activity score) = 0, corresponding to the point where promoter activity scores exceed the scores of negative control sequences. (B) The threshold for expressed genes is set at log2(gene expression score) = 7, corresponding to the trough of the bimodal distributions displayed in all eight cell lines.
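
The two thresholds in the Figure 1 caption can be applied as in the sketch below. The cutoffs 0 and 7 come from the caption. The function names, the assumption that raw scores are positive, and the choice of a strict "greater than" comparison at the threshold are illustrative assumptions.

```python
import math

ACTIVE_LOG2_THRESHOLD = 0.0     # Figure 1A: log2(promoter activity score) = 0
EXPRESSED_LOG2_THRESHOLD = 7.0  # Figure 1B: log2(gene expression score) = 7


def is_active(promoter_activity_score: float) -> bool:
    """True if the promoter is above the activity cutoff.

    Assumes a positive raw score; the strict '>' at the cutoff is an assumption.
    """
    return math.log2(promoter_activity_score) > ACTIVE_LOG2_THRESHOLD


def is_expressed(gene_expression_score: float) -> bool:
    """True if the gene is above the expression cutoff (same assumptions)."""
    return math.log2(gene_expression_score) > EXPRESSED_LOG2_THRESHOLD


# Invented scores for illustration only.
print(is_active(2.0), is_active(0.5))
print(is_expressed(256.0), is_expressed(64.0))
```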
Figure 2. Binary encoding of promoter activity patterns. Active promoters were encoded with the number 1, and inactive promoters were encoded with the number 0.
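
A minimal sketch of the binary encoding described in the Figure 2 caption, assuming each promoter has one log2 activity score per cell line and that "active" means a score above 0 (the Figure 1A cutoff). The function names, the ordering of cell lines, and the example scores are hypothetical.

```python
def encode_activity(log2_scores, threshold=0.0):
    """Encode one promoter's scores as a string of 1s (active) and 0s (inactive).

    log2_scores: one log2 promoter activity score per cell line, in a fixed
    order chosen by the caller (the order used here is arbitrary).
    """
    return "".join("1" if score > threshold else "0" for score in log2_scores)


def count_active(pattern: str) -> int:
    """Number of cell lines in which the promoter is encoded as active."""
    return pattern.count("1")


# Invented scores for eight cell lines (illustration only).
toy_scores = [1.2, -0.3, 0.0, 2.5, -1.0, 0.7, -2.2, 3.1]
pattern = encode_activity(toy_scores)
print(pattern, count_active(pattern))
```

A pattern of all 1s would correspond to a promoter active in every cell line, and a pattern with a single 1 to a promoter active in only one.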
Figure 3. Distribution of normalized CG dinucleotide content among 4575 promoters. The normalized CG dinucleotide content is defined as the ratio of observed over expected number of CG dinucleotides (see Methods). LCG promoters have normalized CG content < 0.5 and HCG promoters have normalized CG content > 0.5.
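
The observed-over-expected ratio in the Figure 3 caption can be sketched as follows. The exact formula is given in the paper's Methods, which are not reproduced here, so this example assumes a widely used definition in which the expected count is (number of C × number of G) / sequence length. The function names are hypothetical, and assigning a ratio of exactly 0.5 to the LCG group is an arbitrary choice because the caption does not cover that case.

```python
def normalized_cg_content(seq: str) -> float:
    """Observed/expected CG dinucleotide ratio.

    Assumed formula: observed CG count / ((count of C * count of G) / length).
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    expected = c_count * g_count / len(seq)
    return observed / expected


def cg_class(seq: str, cutoff: float = 0.5) -> str:
    """Label a promoter HCG (ratio > cutoff) or LCG (otherwise)."""
    return "HCG" if normalized_cg_content(seq) > cutoff else "LCG"


# Short invented sequences, far shorter than real promoter fragments.
print(normalized_cg_content("CCGG"), cg_class("CCGG"))
print(normalized_cg_content("GGCCAATT"), cg_class("GGCCAATT"))
```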
Figure 4. Promoter activities of cell line–specific promoters. Cell line–specific promoters are mainly active at medium levels (orange) and rarely active at high (red) levels in the cell line of interest, but the tissue specificities of all promoters are clearly distinguishable by eye. High CpG promoters are indicated by blue bars in the left margin.
Figure 5. Elements of transcription regulation. Ubiquitous promoters have high CG content and are regulated by a few TFs. Tissue-specific promoters tend to have low CG content and a TATA box and are regulated by many TFs. Promoter activity of the proximal promoter is primarily determined by sequence content, while endogenous gene expression is additionally influenced by chromatin, DNA methylation, and long-range elements. Note that the molecules in this figure are not drawn to scale.
