Nucleic Acids Res. 2004 Mar 19;32(5):1798-807.
doi: 10.1093/nar/gkh507. Print 2004.

Calculating the statistical significance of physical clusters of co-regulated genes in the genome: the role of chromatin in domain-wide gene regulation

Cheng-Fu Chang et al. Nucleic Acids Res. 2004.

Abstract

Physical clusters of co-regulated, but apparently functionally unrelated, genes are present in many genomes. Despite the important implication that the genomic environment contributes appreciably to the regulation of gene expression, no simple statistical method has been described to identify physical clusters of co-regulated genes. Here we report the development of a model that allows the direct calculation of the significance of such clusters. We have implemented the derived statistical relation in a software program, Pyxis, and have analyzed a selection of Saccharomyces cerevisiae gene expression microarray data sets. We have identified many gene clusters where constituent genes exhibited a regulatory dependence on proteins previously implicated in chromatin structure. Specifically, we found that Tup1p-dependent gene domains were enriched close to telomeres, which suggested a new role for Tup1p in telomere silencing. In addition, we identified Sir2p-, Sir3p- and Sir4p-dependent clusters, which suggested the presence of Sir-mediated heterochromatin in previously unidentified regions of the yeast genome. We also showed the presence of Sir4p-dependent gene clusters bordering the HMRa heterothallic locus, which suggested leaky termination of the heterochromatin by the boundary elements. These results demonstrate the utility of Pyxis in identifying possible higher order genomic features that may contribute to gene regulation in extended domains.

Figures

Figure 1
Matrix of combinations. (A) The full sample space, consisting of all possible combinations, for the distribution of three objects among five possible settings. (B) All possible combinations of three objects among four settings. The objects are represented by the grey squares.
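The sample spaces described in this caption can be enumerated directly. The following is a minimal sketch, not the authors' code: the function names `placements` and `render` are hypothetical, and it assumes that the objects are indistinguishable and that each setting holds at most one object.

```python
from itertools import combinations
from math import comb


def placements(n_objects, n_settings):
    """List every way to place n_objects into n_settings positions.

    Assumption: objects are indistinguishable and each setting holds
    at most one object, so a placement is a set of occupied positions.
    """
    return list(combinations(range(n_settings), n_objects))


def render(placement, n_settings):
    """Draw one placement as a row: '#' is an occupied setting, '.' is empty."""
    return "".join("#" if i in placement else "." for i in range(n_settings))


# Panel A: three objects among five settings.
panel_a = placements(3, 5)
# Panel B: three objects among four settings.
panel_b = placements(3, 4)

for row in panel_a:
    print(render(row, 5))

# The sizes agree with the binomial coefficients C(5, 3) and C(4, 3).
print(len(panel_a), comb(5, 3))
print(len(panel_b), comb(4, 3))
```

Under these assumptions the full sample space in panel A has 10 rows and the reduced one in panel B has 4.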
Figure 2
The accuracy of the hypergeometric distribution in calculating the probability of physical clustering. (A) The dependence of the variance in cluster frequency on the size of the randomly generated population. The variance in the occurrence of clusters of defined size (y-axis) is shown as a function of the number of random distributions of 15 objects among 50 settings (x-axis). (B) The probability of the occurrence of a cluster calculated with the hypergeometric distribution is indistinguishable from the frequency observed in randomly generated in silico data. The frequency of occurrence of clusters that contained the indicated number of objects within a 10-setting-wide window was calculated using equation 3 or determined in a population of 10^5 randomly generated distributions of 15 objects among 50 settings. The calculated (circles) and determined (squares) probabilities are shown (y-axis) as a function of the number of objects in the cluster window (x-axis).
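The comparison in panel B can be approximated in a few lines. This is a sketch under stated assumptions, not a reproduction of the paper's equation 3, which does not appear on this page: it uses the standard hypergeometric probability mass function for a single window at a fixed position, and the names `window_pmf` and `simulate` are hypothetical. The trial count is kept smaller than the population size quoted in the caption so that the example runs quickly.

```python
import random
from math import comb


def window_pmf(k, n_settings=50, n_objects=15, window=10):
    """Standard hypergeometric probability that exactly k of n_objects
    fall inside one fixed window of `window` settings.

    Assumption: the window position is fixed in advance; the paper's
    own relation may treat window placement differently.
    """
    return (comb(window, k) * comb(n_settings - window, n_objects - k)
            / comb(n_settings, n_objects))


def simulate(trials=20000, n_settings=50, n_objects=15, window=10, seed=0):
    """Estimate the same probabilities from random placements."""
    rng = random.Random(seed)
    counts = [0] * (window + 1)
    for _ in range(trials):
        # Place the objects at distinct random settings.
        occupied = rng.sample(range(n_settings), n_objects)
        # Count how many landed in the first `window` settings.
        k = sum(1 for pos in occupied if pos < window)
        counts[k] += 1
    return [c / trials for c in counts]


observed = simulate()
for k, freq in enumerate(observed):
    print(f"{k:2d}  calculated={window_pmf(k):.4f}  simulated={freq:.4f}")
```

With enough trials the two columns should agree closely, which is the qualitative behaviour the caption reports for calculated and simulated frequencies.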
Figure 3
Summary of the program logic of Pyxis. The ovals represent user selections or data supplied by the user, the rectangles show the main groups of programmatic actions, and the rounded rectangle represents the generated result.
Figure 4
Physical clusters of genes in the genome of S. cerevisiae in the absence of Tup1p, Sir2p, Sir3p or Sir4p. The positions of genes that were induced by at least 2-fold in a tup1, sir2, sir3 or sir4 strain compared to the wild-type strain are shown for each of the 16 chromosomes. Gene clusters, composed of genes that each lie within five ORFs of their closest neighbor and for which the probability of a similar grouping arising randomly was <0.1%, are identified by rectangles. Homologous gene pairs or groups were removed from clusters before calculation of the cluster significance. The positions of the MFA1 and BAR1 genes, the HMLα and HMRa heterothallic loci and the MAT locus are indicated. The 28 kb gene cluster on chromosome XIII in the sir2 strain is indicated by the asterisk. The microarray data for the tup1 strain were obtained from the Brown study (23) and the sir2, sir3 and sir4 data from the Young study (39).
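The grouping rule in this caption (each gene within five ORFs of its closest neighbor) can be illustrated with a short sketch. It is not the Pyxis implementation: the function `group_by_gap`, its parameters and the example positions are hypothetical, and it assumes that genes are identified by their ordinal ORF index along one chromosome and that "within five ORFs" means an index difference of at most five. The significance test and the removal of homologous genes mentioned in the caption are separate steps that are not shown.

```python
def group_by_gap(orf_indices, max_gap=5, min_size=2):
    """Group regulated genes into candidate clusters on one chromosome.

    orf_indices: ordinal ORF positions of the regulated genes (assumed input).
    max_gap: largest index difference allowed between neighboring members.
    min_size: smallest group that is reported as a candidate cluster.
    """
    clusters = []
    current = []
    for idx in sorted(orf_indices):
        # Start a new group when the gap to the previous gene is too large.
        if current and idx - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(idx)
    # Flush the final group.
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical ORF positions of induced genes on a single chromosome.
induced = [3, 5, 9, 40, 41, 100]
print(group_by_gap(induced))  # [[3, 5, 9], [40, 41]]
```

Each candidate group returned here would still need its random probability evaluated before it could be called a significant cluster in the sense used in the caption.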
Figure 5
ORF map of a statistically significant (P < 0.001) cluster composed of directly adjacent ORFs that were induced by at least 2-fold in a tup1 S. cerevisiae strain. The analysis was performed on the data from the Brown study (23). The black arrows represent ORFs, with the direction of transcription indicated. The length and spacing of ORFs are shown to scale. The random probability for the appearance of the cluster is indicated.
Figure 6
Genes that are regulated in common by the Sir proteins. Genes that were up-regulated by 2-fold or more in the sir2, sir3 or sir4 strain (39) were identified, and the genes common to each pair of data sets were selected from the SQL database. The number of genes that displayed a regulatory dependence on one or more of the Sir proteins is shown in the Venn diagram.

References

    1. Cohen,B.A., Mitra,R.D., Hughes,J.D. and Church,G.M. (2000) A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nature Genet., 26, 183–186.
    2. de Haan,G., Bystrykh,L.V., Weersing,E., Dontje,B., Geiger,H., Ivanova,N., Lemischka,I.R., Vellenga,E. and Van Zant,G. (2002) A genetic and genomic analysis identifies a cluster of genes associated with hematopoietic cell turnover. Blood, 100, 2056–2062.
    3. Roy,P.J., Stuart,J.M., Lund,J. and Kim,S.K. (2002) Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature, 418, 975–979.
    4. Spellman,P.T. and Rubin,G.M. (2002) Evidence for large domains of similarly expressed genes in the Drosophila genome. J. Biol., 1, 5.
    5. Zhang,H., Pan,K.H. and Cohen,S.N. (2003) Senescence-specific gene expression fingerprints reveal cell-type-dependent physical clustering of up-regulated chromosomal loci. Proc. Natl Acad. Sci. USA, 100, 3251–3256.
