BMC Genomics. 2008 Apr 16;9:172.
doi: 10.1186/1471-2164-9-172.

How many human genes can be defined as housekeeping with current expression data?

Jiang Zhu et al. BMC Genomics. 2008.

Abstract

Background: Housekeeping (HK) genes are ubiquitously expressed in all tissue/cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes is fundamental to studying gene expression and cellular differentiation. Although many studies have aimed at a large-scale, thorough categorization of human HK genes, a meaningful consensus has yet to be reached.

Results: We collected the two latest gene expression datasets (EST and microarray data) from public databases and analyzed the gene expression profiles in 18 human tissues that are well documented by both data types. Benchmarked against a manually curated HK gene collection (HK408), we demonstrated that present EST sampling was far from saturated, and that this inadequacy has limited gene detectability and our understanding of TS expression. Due to a likely over-stringent threshold, microarray data showed a higher false-negative rate than EST data, leading to a significant underestimation of HK genes. Based on EST data, we found that 40.0% of the currently annotated human genes were universally expressed in at least 16 of 18 tissues, compared with only 5.1% specifically expressed in a single tissue. Our current EST-based estimate of the number of human HK genes ranged from 3,140 to 6,909, a ten-fold increase over previous microarray-based estimates.
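The expression-breadth bookkeeping behind these percentages can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the gene names, tissue names, and the helper functions `expression_breadth` and `classify` are hypothetical, and the only value taken from the abstract is the "at least 16 of 18 tissues" cutoff for broad expression.

```python
from typing import Dict, Set

# Cutoff quoted in the abstract: detected in at least 16 of the 18 tissues.
BROAD_MIN_TISSUES = 16


def expression_breadth(detected: Dict[str, Set[str]]) -> Dict[str, int]:
    """Map each gene to the number of tissues in which it was detected."""
    return {gene: len(tissues) for gene, tissues in detected.items()}


def classify(breadth: int, broad_min: int = BROAD_MIN_TISSUES) -> str:
    """Label a gene by its expression breadth (hypothetical labels)."""
    if breadth >= broad_min:
        return "broad"
    if breadth == 1:
        return "single-tissue"
    return "intermediate"


if __name__ == "__main__":
    # Toy input (assumed): tissue labels are placeholders, not the study's tissues.
    tissues = [f"tissue_{i}" for i in range(1, 19)]
    detected = {
        "geneA": set(tissues),        # detected everywhere
        "geneB": set(tissues[:16]),   # detected in 16 of 18
        "geneC": {"tissue_3"},        # detected in a single tissue
        "geneD": set(tissues[:5]),    # intermediate breadth
    }
    for gene, breadth in expression_breadth(detected).items():
        print(gene, breadth, classify(breadth))
```

How the abstract's range of 3,140 to 6,909 HK genes follows from breadth criteria is not stated here, so the sketch stops at the breadth labels.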

Conclusion: We concluded that a significant fraction of human genes, at least in the currently annotated data repositories, was broadly expressed. Our understanding of tissue-specific expression was still preliminary and required much more large-scale, high-quality transcriptomic data in future studies. The new HK gene list categorized in this study will be useful for genome-wide analyses of the structural and functional features of HK genes.

Figures

Figure 1
Gene expression in 18 tissues. Numbers of genes detected in each tissue are compared between microarray and EST data (A). Tissues are ranked from the poorly-sampled (left) to the highly-sampled (right) according to the EST data. The numbers of detected genes are plotted against the numbers of sampled ESTs for the 18 tissues (B). The sampling growth curve is fitted by the Hill function f(x) = a·x^b / (c + x^b) with a = 17622.8, b = 0.8, c = 6259.7. The curve indicates that current transcriptome sampling is far from saturated. The percentage of genes is plotted against the number of tissues in which they are expressed to give the expression breadth distribution (C). Expression breadth in microarray data is compared against that in EST data, with color from white to blue indicating the number of occurrences from low to high (D). The correlation of expression breadths between the two types of data is not significant (r = 0.42); 71% of the genes are detected in fewer tissues by microarray data than by EST data.
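To make the saturation argument concrete, the fitted Hill curve can be evaluated directly. The sketch below uses only the parameter values quoted in the caption; the function names `hill` and `half_saturation` are illustrative, and the unit of x (raw EST counts or a scaled count) is not stated in the caption, so the sample x values are assumptions chosen only to show the shape of the curve.

```python
def hill(x: float, a: float, b: float, c: float) -> float:
    """Hill-type saturation curve f(x) = a * x**b / (c + x**b)."""
    xb = x ** b
    return a * xb / (c + xb)


def half_saturation(b: float, c: float) -> float:
    """Value of x at which f(x) = a / 2, i.e. where x**b equals c."""
    return c ** (1.0 / b)


# Parameters quoted in the Figure 1 caption (all genes).
A, B, C = 17622.8, 0.8, 6259.7

if __name__ == "__main__":
    # a is the asymptote: the number of genes the fit predicts at infinite sampling.
    print("asymptote a:", A)
    print("x at half saturation:", half_saturation(B, C))
    # Assumed sample depths, in whatever unit the fit used for x.
    for x in (1e3, 1e4, 1e5, 1e6):
        y = hill(x, A, B, C)
        print(f"x = {x:g}: f(x) = {y:.1f} ({100 * y / A:.1f}% of asymptote)")
```

Because b is below 1, the curve approaches its asymptote slowly, which is consistent with the caption's reading that sampling is far from saturated.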
Figure 2
HK408 gene expression in 18 tissues. Numbers of HK408 genes detected in each tissue are compared between microarray and EST data (A). Tissues are ranked from the poorly-sampled (left) to the highly-sampled (right) according to the EST data. The numbers of detected HK408 genes are plotted against the numbers of sampled ESTs for the 18 tissues (B). The sampling growth curve is fitted by the Hill function f(x) = a·x^b / (c + x^b) with a = 405.0, b = 2.4, c = 7.0e+10. Five tissues (muscle, ovary, heart, thymus and thyroid) are poorly sampled, primarily accounting for the absence of HK408 genes. The expression breadth of HK408 is predominantly enriched at the value 18 in the EST data (C) whereas a messy tail is observed across all breadth groups in microarray data, indicating a noisy nature and high FP rate (D).
Figure 3
Expression profiles of 20 human tRNA synthetases in 18 tissues. Rows and columns of the matrix represent genes and tissues, respectively. Tissues are ranked from the poorly-sampled (left) to the highly-sampled (right) according to the EST data. The darkness of the blue color indicates the original EST counts in EST data (A) and expression intensities in microarray data (B). Blank squares indicate absence of detection. Original EST counts are kept to demonstrate the increasing capability of gene identification from poorly-sampled to highly-sampled tissues.
Figure 4
Validation of EST-defined HK genes in 51 tissues. Expression breadth distributions in the 51 human tissues that currently have EST data are compared between total genes and the HK genes defined in 18 tissues. The expression breadth distribution of total genes in 51 tissues has two modes, representing TS and HK genes respectively, but due to the limited gene detectability in poorly sampled tissues, the spike of HK genes peaks at a breadth of 35 and diminishes as expression breadth increases. The expression breadths of HK genes defined in 18 tissues peak at about 42, showing very broad expression across the 51 tissues.
