PLoS One. 2010 Feb 4;5(2):e9056.
doi: 10.1371/journal.pone.0009056.

Classification of genes and putative biomarker identification using distribution metrics on expression profiles

Hung-Chung Huang et al. PLoS One. 2010.

Abstract

Background: Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers.

Methodology/principal findings: In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as 'brain group' and 'non-brain group'; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes.
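As a minimal sketch of the four distribution metrics named above, the following Python computes the mean, standard deviation, skewness and kurtosis of a single gene's expression profile. The function name `distribution_metrics`, the use of population (biased) moments, and the excess-kurtosis convention (a normal distribution scores 0) are assumptions made for illustration. The abstract does not state which estimators or classification thresholds were used, so no category rule is shown.

```python
import math


def distribution_metrics(values):
    """Return mean, standard deviation, skewness and excess kurtosis.

    Assumption: population (biased) central moments are used; the paper's
    exact estimators are not given in the abstract.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two expression values")
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("constant profile: skewness and kurtosis are undefined")
    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis
    }


# Hypothetical, perfectly bimodal profile: half the arrays "off", half "on".
profile = [2.0] * 50 + [10.0] * 50
print(distribution_metrics(profile))
```

For this made-up two-level profile the skewness is 0 and the excess kurtosis is -2, the lowest value excess kurtosis can take.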

Conclusions/significance: The methodology employed here may be used to facilitate disease-specific biomarker discovery.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Frequency histogram plot for the expression intensity profile of genes in four categories.
(a) predominantly-off, (b) predominantly-on, (c) graded (rheostatic), and (d) multistable (switch-like). The Y-axis is in log scale.
Figure 2. Kurtosis (K) vs. Skewness (S) plot for the GEPs of 16,361 features in 2,145 mouse arrays.
The curved line is the boundary curve, with equation K = S² − 2.
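One way to see why no point can fall below the boundary curve is the short derivation below. It assumes that K denotes excess kurtosis and S the standardized third moment; the caption does not define them explicitly, but the offset of 2 in the boundary equation is consistent with that reading.

```latex
% Z = (X - \mu)/\sigma is the standardized expression value,
% so E[Z] = 0 and E[Z^2] = 1.
\begin{align*}
S &= E[Z^3], \qquad K = E[Z^4] - 3 \\
\operatorname{Cov}(Z, Z^2)^2 &\le \operatorname{Var}(Z)\,\operatorname{Var}(Z^2)
  && \text{(Cauchy--Schwarz)} \\
\left(E[Z^3]\right)^2 &\le 1 \cdot \left(E[Z^4] - 1\right) \\
S^2 &\le K + 2 \quad\Longleftrightarrow\quad K \ge S^2 - 2
\end{align*}
```

Equality holds only when the distribution takes exactly two values, so strongly two-level profiles would be expected to lie close to the curve.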
Figure 3. A selected representative example comparing the GEPs of Grla3 in brain (solid line) and non-brain (dashed line) tissues.
Figure 4. GEP metrics for each gene clustered into one of four clusters.
Gene categories: (a) predominantly-off, (b) graded, (c) predominantly-on, and (d) switch-like genes.
Figure 5. Switch-like genes highlighted in the KEGG “ECM-receptor interaction” diagram.
Nodes representing switch-like genes are outlined in orange.
Figure 6. KS_d vs. Corr values for all GEPs.
The horizontal line represents the cutoff KS_d > 0.8; the vertical line represents the cutoff Corr < 0.1. Potential biomarkers are in the box at upper left. Similar figures for lung, liver, embryo, heart, and small intestine are in File S1.
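The two-cutoff selection described in this caption can be sketched as follows. The Kolmogorov-Smirnov distance is computed as the largest gap between the two empirical cumulative distributions. How Corr was computed is not stated here, so the sketch assumes it is the Pearson correlation between the two groups' binned frequency histograms over shared bins. The function names, the bin settings and the sample values are all hypothetical; only the cutoffs 0.8 and 0.1 come from the caption.

```python
import math
from bisect import bisect_right


def ks_distance(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:  # the gap can only change at observed values
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def histogram(values, lo, hi, bins):
    """Relative frequencies over `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = max(0, min(int((v - lo) / width), bins - 1))
        counts[i] += 1
    return [c / len(values) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def is_candidate(ks_d, corr, ks_cut=0.8, corr_cut=0.1):
    """Apply the caption's cutoffs: KS_d > 0.8 and Corr < 0.1."""
    return ks_d > ks_cut and corr < corr_cut


# Hypothetical log2 intensities for one gene in two tissue groups.
brain = [10, 11, 12, 11, 10, 12]
non_brain = [2, 3, 4, 3, 2, 4]

ks_d = ks_distance(brain, non_brain)
corr = pearson(histogram(brain, 0, 16, 8), histogram(non_brain, 0, 16, 8))
print(ks_d, corr, is_candidate(ks_d, corr))
```

With these made-up values the two groups do not overlap at all, so the gene lands in the upper-left box of a plot like Figure 6.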
Figure 7. GEPs of the 12 candidate biomarkers from Table 8.
Solid lines show brain-specific GEPs; dashed lines show non-brain tissue GEPs. Log2 intensity values appear on the x-axis and probability density on the y-axis.
Figure 8. Heatmaps of log2 expression values of the twelve candidate genes from Table 8, in various tissues.
Green is low, black is middle and red is high expression.
Figure 9. Tissue expression pattern and profiles for the gene Gabrg2.
Left: (a) Tissue expression pattern for Gabrg2 (in Table 8) via WebGestalt. The height of the red bar represents the observed number of EST sequences for the selected gene in the tissue. The height of the green bar represents the expected number of EST sequences, where the expected number of EST sequences for a specific gene in a specific tissue = (total number of EST sequences for the gene in all tissues × total number of EST sequences for all genes in the tissue) / (total number of EST sequences for all genes in all tissues). Tissue types in which the gene is significantly over-represented (p<0.01) are labeled red. Tissue types in which the gene is significantly under-represented (p<0.01) are labeled blue. Right: (b) Expression profile suggested by analysis of EST counts in the UniGene database for the gene Gabrg2 (UniGene ID: Mm.5309); only a portion (the top part) of the profile is shown. The first column gives the pool name. The number to the left of the spot column is in units of “Transcripts per million” (TPM), and the ratio to the right of the spot column is “EST of the query gene / Total EST in pool”. The spot intensity is based on TPM.
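The expected-count formula in this caption amounts to a single proportion, sketched below. The function name `expected_est_count` and all the counts are hypothetical and are not taken from WebGestalt or UniGene. The significance test behind the p<0.01 labels is not described in the caption, so it is not reproduced.

```python
def expected_est_count(gene_total, tissue_total, grand_total):
    """Expected ESTs for a gene in a tissue, following the caption's formula.

    gene_total   -- ESTs for the gene across all tissues
    tissue_total -- ESTs for all genes in the tissue
    grand_total  -- ESTs for all genes in all tissues
    """
    if grand_total <= 0:
        raise ValueError("grand_total must be positive")
    return gene_total * tissue_total / grand_total


# Hypothetical counts, for illustration only.
expected = expected_est_count(gene_total=50, tissue_total=200_000, grand_total=4_000_000)
observed = 20
print(expected, observed / expected)
```

An observed count well above the expected one, as in this made-up case, corresponds to a tall red bar next to a short green bar in panel (a).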
