BMC Bioinformatics. 2006 Jul 4;7:330.
doi: 10.1186/1471-2105-7-330.

Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data

Seon-Young Kim et al. BMC Bioinformatics.

Abstract

Background: A complete understanding of the regulatory mechanisms of gene expression is the next major challenge in genomics. Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and binding data. However, most of these studies used yeast, which has much simpler regulatory networks than humans and for which genome-wide binding data and gene expression data under diverse conditions are abundant. Studies of genome-wide transcriptional networks in humans currently lag behind those in yeast.

Results: We report a new method that combines gene expression data analysis with promoter analysis to infer transcriptional regulatory elements of human genes. Z scores obtained by applying gene set analysis to gene sets of transcription factor binding sites (TFBSs) were used to represent the activity of each TFBS in a given microarray data set. Significant correlations between the Z scores of TFBS gene sets and the expression changes of individual genes across multiple conditions allowed us to identify many known transcriptional regulatory elements of human genes and to predict numerous putative TFBSs for many genes, which should provide a good starting point for further experiments. Z scores of TFBS gene sets produced better predictions than the mRNA levels of the transcription factors themselves, suggesting that these Z scores better capture the diverse mechanisms that alter transcription factor activity in the cell. In addition, our analysis readily identified cis-regulatory modules, that is, combinations of co-acting TFBSs.
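The abstract does not spell out how the Z score of a TFBS gene set is computed. The sketch below assumes a parametric gene set analysis in the style of PAGE, in which a gene set's Z score compares the mean fold change of its member genes (S_m, over m genes) with the mean (mu) and standard deviation (sigma) of fold changes over all genes on the array: Z = (S_m - mu) * sqrt(m) / sigma. The formula choice, the function name `gene_set_z_score`, and the use of log2 fold changes are assumptions for illustration, not the authors' code.

```python
import math
import statistics


def gene_set_z_score(fold_changes, gene_set):
    """Z score of one gene set in one microarray data set.

    Assumption: PAGE-style parametric formulation; the source does not give the formula.
    fold_changes: dict mapping gene ID -> log2 fold change between two groups.
    gene_set: iterable of gene IDs, e.g. genes whose promoters carry a given TFBS.
    """
    all_values = list(fold_changes.values())
    mu = statistics.mean(all_values)        # mean fold change over all genes
    sigma = statistics.pstdev(all_values)   # standard deviation over all genes
    members = [fold_changes[g] for g in gene_set if g in fold_changes]
    m = len(members)
    if m == 0 or sigma == 0:
        raise ValueError("gene set has no measured members or data have zero variance")
    s_m = statistics.mean(members)          # mean fold change of the gene set members
    return (s_m - mu) * math.sqrt(m) / sigma


if __name__ == "__main__":
    # Toy values for illustration only.
    fc = {"g1": 1.5, "g2": 1.2, "g3": -0.2, "g4": 0.1, "g5": -0.4, "g6": 0.0}
    print(gene_set_z_score(fc, {"g1", "g2"}))
```

A positive Z score indicates that genes carrying the TFBS are, on average, shifted upward relative to the whole array in that data set; repeating the calculation for each data set gives one Z score per TFBS per data set.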

Conclusion: By combining gene-set-level analysis of gene expression data sets with promoter analysis, we identified and predicted many transcriptional regulatory elements of human genes. We conclude that this approach will aid in decoding some of the important transcriptional regulatory elements of human genes.


Figures

Figure 1
Schematic diagram of the procedures used in this study. Gene expression data sets were retrieved from GEO. Human promoters with experimentally verified transcription start sites were retrieved from the DBTSS database and scanned for TFBSs with the MatInspector program using the TransFac version 3.0 database. For each microarray data set, the fold-change values between two experimental groups were calculated and used in gene set analyses with gene sets of TFBSs. Repeating this procedure over multiple data sets produced a matrix of fold-change values and a matrix of Z-scores, each spanning all data sets. The correlations between fold-change values and Z-scores across data sets were then calculated, and finally, statistically significant TFBSs were identified for each gene.
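The final steps of this procedure correlate, for every gene and every TFBS, the gene's fold changes with the TFBS gene set's Z-scores across the same ordered list of data sets. A minimal sketch follows, assuming both matrices are stored as dicts of equal-length lists (one entry per data set, in the same order) with no missing values; the names `correlation_matrix`, `fold_change_matrix`, and `z_matrix` are hypothetical.

```python
import math


def _standardize(values):
    """Center a vector and scale it to unit Euclidean norm (None if constant)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        return None
    return [c / norm for c in centered]


def correlation_matrix(fold_change_matrix, z_matrix):
    """Pearson correlation of every gene's fold changes with every TFBS's Z-scores.

    fold_change_matrix: dict gene -> list of fold changes, one per data set.
    z_matrix: dict TFBS -> list of Z-scores over the same data sets, same order.
    Assumes complete data (no missing values).
    Returns dict (gene, tfbs) -> r; pairs involving a constant vector are skipped.
    """
    genes = {g: _standardize(v) for g, v in fold_change_matrix.items()}
    sites = {t: _standardize(v) for t, v in z_matrix.items()}
    result = {}
    for g, gv in genes.items():
        if gv is None:
            continue
        for t, tv in sites.items():
            if tv is None:
                continue
            if len(gv) != len(tv):
                raise ValueError("gene and TFBS vectors must cover the same data sets")
            # For centered, unit-norm vectors the dot product equals Pearson's r.
            result[(g, t)] = sum(a * b for a, b in zip(gv, tv))
    return result
```

Standardizing each vector once and then taking dot products avoids recomputing means and norms for every gene and TFBS pair, which matters when thousands of genes are crossed with hundreds of TFBSs.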
Figure 2
Patterns of correlation between the fold-change values and Z-scores of TFBSs over multiple data sets for IL8 and PCNA. a. Correlation between the fold-change values for IL8 and Z-scores of V$NFKAPPAB65_01 over 127 microarray data sets. b. Correlation between the fold-change values for PCNA and Z-scores of V$E2F_01. c. Correlation between the fold-change values for IL8 and Z-scores of V$E2F_01. d. Correlation between the fold-change values for PCNA and Z-scores of V$NFKAPPAB65_01. Pearson's correlation coefficient was used to measure the correlation between the two series of values, and a t-test was used to evaluate its significance (see Methods).
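The caption names Pearson's correlation coefficient and a t-test. A common test of a Pearson coefficient r computed from n paired values uses t = r * sqrt((n - 2) / (1 - r^2)), compared against a t distribution with n - 2 degrees of freedom. We assume, without confirmation from the Methods, that this is the form meant; the function names below are hypothetical and only the standard library is used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom.

    Assumes the standard t-test for Pearson's r; |r| = 1 maps to +/- infinity.
    """
    if n < 3:
        raise ValueError("need at least 3 data sets")
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

For example, over the 127 data sets of panel a, a hypothetical r of 0.5 (not a value reported in the figure) would give t of about 6.45 with 125 degrees of freedom.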
Figure 3
Selection of the optimal matrix similarity cut-off value for each TFBS. Two TFBSs (V$NFKAPPAB65_01 and V$ISRE_01) are shown as examples. When predicting putative TFBSs from promoter sequences with the MatInspector program, the core similarity cut-off was set to 0.75, and the overall similarity cut-off was varied from 0.70 to 1.00 in increments of 0.02.
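The caption gives the search grid (overall similarity from 0.70 to 1.00 in steps of 0.02, with core similarity fixed at 0.75) but not the criterion that makes a cut-off optimal. The sketch below builds the grid from integer steps, which avoids floating-point drift, and picks the cut-off that maximizes a caller-supplied score. `score_for_cutoff` is a hypothetical stand-in for whatever criterion the Methods define, and no MatInspector interface is assumed.

```python
CORE_SIMILARITY = 0.75  # fixed core similarity cut-off stated in the caption


def overall_similarity_grid(start=0.70, stop=1.00, step=0.02):
    """Return cut-offs from start to stop inclusive, built from integer steps."""
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


def best_cutoff(score_for_cutoff, grid=None):
    """Return (cutoff, score) maximizing a user-supplied scoring function.

    score_for_cutoff is hypothetical: for example, a function that predicts TFBSs
    at the given overall similarity (core similarity CORE_SIMILARITY) and scores
    the resulting predictions. Ties are broken in favor of the lowest cut-off.
    """
    grid = grid if grid is not None else overall_similarity_grid()
    best = None
    for cutoff in grid:
        score = score_for_cutoff(cutoff)
        if best is None or score > best[1]:
            best = (cutoff, score)
    return best
```

The default grid contains 16 cut-offs (0.70, 0.72, ..., 1.00), so each TFBS requires 16 rounds of prediction and scoring under this scheme.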
Figure 4
Distribution of correlation coefficients between two independently prepared predictions of TFBSs. A. Distribution of the correlation coefficients of 8738 genes between predictions for 190 TFBSs from the U95A data sets and from the U133A data sets. B. Distribution of the correlation coefficients of 8738 genes between two groups, each drawn at random from a standard normal distribution. C. Distribution of the correlation coefficients of 190 TFBSs between predictions for 8738 genes from the U95A data sets and from the U133A data sets. D. Distribution of the correlation coefficients of 190 TFBSs between two groups, each drawn at random from a standard normal distribution.
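Panels B and D serve as a null reference: correlations between two vectors drawn independently from a standard normal distribution. A minimal sketch of how such a reference distribution can be simulated with the standard library follows; the function name and default seed are arbitrary choices, not taken from the paper.

```python
import math
import random


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_correlations(n_pairs, length, seed=0):
    """Correlations between pairs of independent standard-normal vectors.

    Panel B style: n_pairs=8738 (genes), length=190 (TFBSs).
    Panel D style: n_pairs=190 (TFBSs), length=8738 (genes).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for _ in range(n_pairs):
        x = [rng.gauss(0.0, 1.0) for _ in range(length)]
        y = [rng.gauss(0.0, 1.0) for _ in range(length)]
        out.append(_pearson(x, y))
    return out
```

Under independence the simulated coefficients center on 0 with variance 1/(length - 1), a standard deviation of about 0.073 for vectors of length 190, which gives a band against which the observed distributions in panels A and C can be compared.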
Figure 5
Identification of NFκB-regulated genes by selecting significant correlations between the fold-change values of genes and the Z-scores of V$NFKAPPAB65_01 across multiple data sets. Correlation coefficients were converted into t-scores, and Java TreeView was used to visualize the matrix of t-scores over all TFBSs and genes. Genes highly correlated with V$NFKAPPAB65_01 are shown, and several TFBSs that correlate highly with the genes regulated by V$NFKAPPAB65_01 are marked.
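The selection step in this figure can be sketched as follows, assuming correlations have already been computed for each gene against one TFBS gene set over n data sets and are converted to t-scores with t = r * sqrt((n - 2) / (1 - r^2)). The cut-off `t_min` is a hypothetical parameter, since the caption does not state the threshold used; only ranking and filtering are shown, and rendering with Java TreeView is not reproduced.

```python
import math


def to_t_score(r, n):
    """Convert a Pearson r over n data sets to a t-score (n - 2 degrees of freedom)."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def genes_correlated_with_tfbs(correlations, n_datasets, t_min):
    """Genes whose t-score against one TFBS is at least t_min, highest first.

    correlations: dict gene -> Pearson r with the TFBS gene set's Z-scores.
    t_min: hypothetical threshold; only positive associations pass when t_min > 0.
    """
    scored = [(gene, to_t_score(r, n_datasets)) for gene, r in correlations.items()]
    selected = [(gene, t) for gene, t in scored if t >= t_min]
    selected.sort(key=lambda item: item[1], reverse=True)
    return selected
```

Because t is a monotonic function of r for a fixed number of data sets, the ranking matches a ranking by r; the t-scale mainly makes a significance threshold easier to state.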
