BMC Genomics. 2006 Oct 30;7:277.
doi: 10.1186/1471-2164-7-277.

Mining housekeeping genes with a Naive Bayes classifier

Luna De Ferrari et al. BMC Genomics. 2006;7:277.

Abstract

Background: Traditionally, housekeeping and tissue specific genes have been classified by directly assaying mRNA presence across different tissues, but these experiments are costly and their results are not easy to compare or reproduce.

Results: In this work, a Naive Bayes classifier based only on physical and functional characteristics of genes already available in databases, such as exon length and measures of chromatin compactness, achieved a 97% success rate in classifying human housekeeping genes (93% for mouse and 90% for fruit fly).

Conclusion: The newly obtained lists of housekeeping and tissue specific genes adhere to the expected functions and tissue expression patterns for the two classes. Overall, the classifier shows promise, and in the future additional attributes might be included to improve its discriminating power.
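As a rough illustration of the idea in the abstract, the Python sketch below trains a categorical Naive Bayes classifier on already-discretised gene attributes and returns class posteriors. It is not the authors' implementation: the add-one (Laplace) smoothing, the two attributes, and the toy bin values and labels are assumptions made for the example, and the associations between bins and classes are invented.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with add-alpha (Laplace) smoothing.

    rows:   list of tuples of already-discretised attribute values.
    labels: list of class labels, e.g. "HK" (housekeeping) or "TS" (tissue specific).
    """
    class_counts = Counter(labels)
    # value_counts[c][j][v] = number of class-c rows whose attribute j equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_per_attr = [set() for _ in rows[0]]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            values_per_attr[j].add(v)
    return class_counts, value_counts, values_per_attr, alpha


def predict_proba(model, row):
    """Return the posterior P(class | row), treating attributes as
    conditionally independent given the class (the "naive" assumption)."""
    class_counts, value_counts, values_per_attr, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(row):
            k = len(values_per_attr[j])  # number of distinct values of attribute j
            score += math.log((value_counts[c][j][v] + alpha) / (n_c + alpha * k))
        log_scores[c] = score
    # Normalise in log space (log-sum-exp) to avoid underflow with many attributes
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


# Toy, invented data: (exon-length bin, chromatin-compactness bin) per gene
rows = [("high", "open"), ("high", "open"), ("high", "compact"),
        ("low", "compact"), ("low", "compact"), ("low", "open")]
labels = ["HK", "HK", "HK", "TS", "TS", "TS"]
model = train_naive_bayes(rows, labels)
p = predict_proba(model, ("high", "open"))
print(p)
```

With this toy data the query ("high", "open") gets a housekeeping posterior of 0.24 / 0.28 = 6/7, about 0.857. Summing log probabilities rather than multiplying raw ones keeps the computation stable when many attributes are combined.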


Figures

Figure 1
Frequency distribution of cDNA length in human, mouse and fruit fly transcripts. The diagrams represent the frequency distribution of cDNA length (normalized to one) for housekeeping, tissue specific and total transcripts in human (A), mouse (B) and fruit fly (C).
Figure 2
Effects of discretisation on Naive Bayes classifier performance (human data). The success rate is plotted in both the housekeeping (A) and the tissue specific (B) charts for comparison. Precision, Recall and F Measure for the ZeroRule classifier on housekeeping data (white bar) are equal to zero and hence not directly visible in the housekeeping chart (A).
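Figure 2 compares discretisation schemes against a ZeroRule baseline. The sketch below shows one way to compute unsupervised equal-frequency cut points and per-class Precision, Recall and F Measure. It rests on our own assumptions: cut points are taken directly from the sorted values, 0/0 is reported as 0, and the toy label set has tissue specific genes as the majority. Under those assumptions it illustrates one plausible reason the ZeroRule bars are zero for housekeeping genes: a baseline that always predicts the majority class never predicts the minority class at all.

```python
from collections import Counter


def equal_frequency_cuts(values, n_bins):
    """Unsupervised equal-frequency discretisation: choose cut points so that
    each bin receives roughly len(values) / n_bins values (ties can unbalance bins)."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // n_bins] for i in range(1, n_bins)]


def to_bin(value, cuts):
    """Bin index = number of cut points that are <= value."""
    return sum(value >= c for c in cuts)


def zero_rule(train_labels):
    """ZeroRule baseline: ignore all attributes and always predict the majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _row: majority


def precision_recall_f(y_true, y_pred, positive):
    """Precision, recall and F measure for one class; 0/0 is reported as 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy example (invented): tissue specific genes (TS) outnumber housekeeping genes (HK)
cuts = equal_frequency_cuts([1, 2, 3, 4, 5, 6, 7, 8], 4)
bins = [to_bin(v, cuts) for v in [1, 2, 3, 4, 5, 6, 7, 8]]
y_true = ["TS"] * 6 + ["HK"] * 2
baseline = zero_rule(y_true)
y_pred = [baseline(None) for _ in y_true]
hk_scores = precision_recall_f(y_true, y_pred, "HK")
ts_scores = precision_recall_f(y_true, y_pred, "TS")
print(cuts, bins, hk_scores, ts_scores)
```

Here the eight values fall into four bins of two each, and the baseline scores (0.0, 0.0, 0.0) on the housekeeping class while still looking reasonable on the majority class.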
Figure 3
ROC curves of maximum classification performance for human, mouse and fruit fly. For human and fruit fly: unsupervised discretisation (with equal frequency binning) + AODE algorithm for classification; for mouse: supervised discretisation + classic Naive Bayes algorithm for classification.
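The ROC curves in Figures 3 and 4 plot the true positive rate against the false positive rate as the decision threshold on the classifier's output is moved. A generic sketch of that computation follows. It is not how the authors' software draws the curves, the scores and labels are invented, and it assumes both classes are present in the data (otherwise one of the rates is undefined).

```python
def roc_points(scores, labels, positive="HK"):
    """Sweep the decision threshold from the highest score to the lowest and
    record (false positive rate, true positive rate) after each distinct score.
    Tied scores are added together, giving one point per distinct threshold."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, lab in ranked if lab == positive)
    n_neg = len(ranked) - n_pos  # assumed > 0, as is n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy classifier outputs: P(housekeeping) per gene, with true labels (invented)
scores = [0.9, 0.8, 0.7, 0.6]
labels = ["HK", "TS", "HK", "TS"]
pts = roc_points(scores, labels)
print(pts, auc(pts))
```

In this toy case three of the four (housekeeping, tissue specific) pairs are ranked correctly, so the area under the curve is 0.75; a classifier that ranks every housekeeping gene above every tissue specific gene would reach 1.0.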
Figure 4
ROC curves of classification performance: housekeeping + all tissue specific genes versus housekeeping + tissue specific genes from merged lists. ROC curves comparing the classifier performance when all tissue specific genes or just tissue specific genes present in at least two published lists are used. For human and fruit fly: unsupervised discretisation (with equal frequency binning) + AODE algorithm for classification; for mouse: supervised discretisation + classic Naive Bayes algorithm for classification.
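The "merged lists" in Figure 4 keep only tissue specific genes found in at least two published lists. Assuming each list is simply a collection of gene identifiers (the identifiers and the helper name below are hypothetical), that filter can be sketched as a count over deduplicated lists:

```python
from collections import Counter


def genes_in_at_least(gene_lists, min_lists=2):
    """Return the genes that appear in at least `min_lists` of the input lists.
    Each list is deduplicated first, so a gene counts at most once per list."""
    counts = Counter(gene for lst in gene_lists for gene in set(lst))
    return {gene for gene, n in counts.items() if n >= min_lists}


# Hypothetical tissue specific gene lists from three sources (invented IDs)
list_a = ["geneA", "geneB", "geneC"]
list_b = ["geneB", "geneC", "geneD"]
list_c = ["geneC", "geneE", "geneA"]
merged = genes_in_at_least([list_a, list_b, list_c])
print(sorted(merged))
```

Deduplicating each list before counting prevents a gene that is repeated within a single source from passing the "at least two lists" threshold on its own.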
Figure 5
Tissue expression for known and predicted human housekeeping genes. Data extracted from UniGene dbEST in July 2005 (UniGene human build 186). The probability for predicted housekeeping transcripts is ≥ 90%. Discretisation: unsupervised; classification algorithm: AODE Naive Bayes.
