Gene. 2007 Apr 1;390(1-2):153-65.
doi: 10.1016/j.gene.2006.09.018. Epub 2006 Oct 5.

Repetitive sequence environment distinguishes housekeeping genes

C Daniel Eller et al. Gene.

Abstract

Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (>400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear Element-1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes.

Figures

Figure 1
Sequence environment of housekeeping genes and tissue-specific genes. (a) Sequence composition in 200-kb regions flanking housekeeping genes (HK), tissue-specific genes (TS) and a random sample of genes (RS). P-values were obtained using the Kruskal-Wallis test; error bars represent 95% confidence intervals. We also considered CpG island densities at variously sized intervals around either the gene body or the start of transcription and found, in both cases, that a 200-kb region yielded the most significant differences between the gene groups (data not shown). (b) Sequence properties for 1-Mb intervals extending 40 Mb upstream and 40 Mb downstream of genes. The results were plotted using smoothed local regression (loess) curves, and the regions over which significant differences between housekeeping genes and tissue-specific genes extend were calculated at a 95% confidence interval (region between arrows). Horizontal bars mark the regions around housekeeping (black) or tissue-specific (gray) genes with significant enrichment or reduction for a sequence feature. The length of each horizontal bar was calculated by determining where each loess curve exceeds one standard deviation from the average value of the regions 30–40 Mb away from each gene.
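The horizontal-bar rule described for panel (b) can be illustrated with a short sketch. This is not the authors' code: the function name `enriched_extent`, its parameters, and the choice to take the standard deviation from the same distal 30–40 Mb bins as the average are all assumptions made for illustration, and the input is assumed to be an already smoothed curve sampled at 1-Mb bins.

```python
from statistics import mean, stdev


def enriched_extent(positions_mb, curve, far_min=30.0, far_max=40.0):
    """Return (start_mb, end_mb) of the contiguous region around the gene
    where the curve departs from the distal baseline by more than one SD,
    or None if the bin nearest the gene does not depart at all.

    Assumption: baseline mean and SD both come from bins 30-40 Mb away.
    """
    far = [v for p, v in zip(positions_mb, curve) if far_min <= abs(p) <= far_max]
    baseline = mean(far)
    spread = stdev(far)

    points = sorted(zip(positions_mb, curve))
    # Index of the bin closest to the gene (position 0).
    center = min(range(len(points)), key=lambda i: abs(points[i][0]))

    def departs(i):
        return abs(points[i][1] - baseline) > spread

    if not departs(center):
        return None

    # Walk outward in both directions while the curve stays outside one SD.
    lo = hi = center
    while lo > 0 and departs(lo - 1):
        lo -= 1
    while hi < len(points) - 1 and departs(hi + 1):
        hi += 1
    return points[lo][0], points[hi][0]
```

Called with bin centers from -40 to 40 Mb and a curve that is elevated only within 3 Mb of the gene, the function returns the pair (-3, 3), which would correspond to the ends of one horizontal bar.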
Figure 2
HK and TS genes are grouped according to isochore membership for comparison of their 200-kb flanking regions. Numbers of HK and TS genes assigned to each isochore are given beneath the columns in (a). (a) Repeats with statistically significant differences in concentration in at least one isochore. (b) Repeats grouped according to size. Short repeats are less than 400 bases in length and long repeats are greater than 400 bases. For (a) and (b), p-values were obtained using the Kruskal-Wallis test; error bars represent 95% confidence intervals. Isochore membership is assigned according to boundaries published as a custom track to the UCSC Genome Browser (http://genome.ucsc.edu) (Oliver et al., 2004).
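For readers unfamiliar with the Kruskal-Wallis test named in these captions, the following is a minimal sketch of the rank-based H statistic with the standard tie correction. It is not taken from the paper: the function name `kruskal_wallis` is hypothetical, and the p-value uses the usual chi-square approximation, written in closed form only for two or three groups (the HK/TS and HK/TS/RS comparisons).

```python
import math
from itertools import chain


def kruskal_wallis(*groups):
    """Return (H, p) for 2 or 3 groups of numeric observations.

    Ties receive the mean of the ranks they span; H is divided by the
    standard tie-correction factor. The p-value is the chi-square upper
    tail with df = number of groups - 1.
    """
    values = sorted(chain.from_iterable(groups))
    n = len(values)

    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j

    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")

    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = (12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)) / correction

    df = len(groups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(h / 2))  # chi-square tail, 1 degree of freedom
    elif df == 2:
        p = math.exp(-h / 2)  # chi-square tail, 2 degrees of freedom
    else:
        raise ValueError("this sketch handles only 2 or 3 groups")
    return h, p
```

In practice each group would hold one repeat concentration per gene (for example, the Alu fraction of each 200-kb flanking region); the chi-square approximation is only reasonable when the groups are not very small.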
Figure 3
Housekeeping gene classification. (a) Changes in false positive error rate when various characteristics are removed. Error bars represent 95% confidence intervals. (b) Relative importance of individual repeats in classifying housekeeping genes, as generated by the partial.plot function in the random forest package in R.
Figure 4
Local regression (loess) curves relating HK probability scores of genes to average expression level across all tissues. Gene expression levels were obtained from Gene Atlas #1. Spearman’s correlation coefficient and its associated p-value are reported.
Figure 5
Genome-wide relationships between HK probability, breadth of gene expression across tissues, and repeat concentration in 200-kb regions flanking genes. (a) Relationship between the calculated probability of being a housekeeping gene (HK probability) and the proportion of tissues in which the genes are expressed. Gene expression data were obtained from Gene Atlas #1. (b) Partial dependence plots depict the marginal effect of each characteristic on HK probability as determined by the random forest classifier. Negative probability values indicate that the characteristic weakens the classifier. Rug marks denote deciles and can be seen in greater detail in Supplementary Fig. 12 online. (c) Relationship between breadth of gene expression and concentration of Alu elements. For (a) and (c), Spearman correlation coefficients and associated p-values are reported.
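The Spearman correlation coefficients reported in Figures 4 and 5 are Pearson correlations computed on ranks. A minimal sketch follows, with tied values given their average rank. The function names are hypothetical, the associated p-value is not computed, and nothing here reproduces the paper's data.

```python
import math


def average_ranks(xs):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and xs[order[j]] == xs[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` could be the HK probability of each gene and `y` the proportion of tissues in which that gene is expressed; because only ranks are used, any monotonic relationship yields a coefficient of 1 or -1 regardless of its shape.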
