BMC Bioinformatics. 2012 Jun 19;13:135.
doi: 10.1186/1471-2105-13-135.

β-empirical Bayes inference and model diagnosis of microarray data

Mohammad Manir Hossain Mollah et al. BMC Bioinformatics.

Abstract

Background: Microarray data enable the high-throughput survey of mRNA expression profiles at the genomic level; however, the data present a challenging statistical problem because a large number of transcripts is measured with small sample sizes. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of microarray data, no model can explain the data fully. It is generally difficult to scrutinize irregular patterns of expression that are not expected by the usual gene-by-gene statistical models.

Results: As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure, which can be regarded as an 'evidence-based' weighted (quasi-) likelihood inference. The weight of a transcript t is described as a power function of its likelihood, f(y_t|θ)^β. Genes with low likelihoods have unexpected expression patterns and therefore low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between robustness and efficiency, is selected by maximizing the predictive β₀-likelihood by cross-validation. The proposed β-EB approach identified six significant (p<10⁻⁵) contaminated transcripts as differentially expressed (DE) in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach.
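
To make the weighting idea concrete, the following is a minimal sketch, not the authors' hierarchical β-EB model. It assumes a single normal location model with known standard deviation, and the function names (`beta_weights`, `beta_weighted_mean`), the toy data, and the choice β = 0.2 are all hypothetical. Each observation receives the weight f(y|μ)^β, and the mean is re-estimated as a weighted average until it stabilizes.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def beta_weights(ys, mu, sigma, beta):
    """Weight of each observation: its likelihood raised to the power beta."""
    return [normal_pdf(y, mu, sigma) ** beta for y in ys]

def beta_weighted_mean(ys, sigma=1.0, beta=0.2, n_iter=50):
    """Fixed-point iteration for a likelihood-weighted location estimate.

    Assumption: sigma is known, so only the mean is estimated.
    With beta = 0 every weight is 1 and this reduces to the ordinary mean.
    """
    mu = sorted(ys)[len(ys) // 2]  # start from a robust central value
    for _ in range(n_iter):
        w = beta_weights(ys, mu, sigma, beta)
        mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return mu

# Toy data (hypothetical): seven well-behaved values and one gross outlier.
data = [-0.3, 0.1, 0.4, -0.2, 0.0, 0.3, -0.1, 8.0]
plain_mean = sum(data) / len(data)
robust_mean = beta_weighted_mean(data)
weights = beta_weights(data, robust_mean, 1.0, 0.2)

print("ordinary mean:", plain_mean)
print("beta-weighted mean:", robust_mean)
print("weight of the outlier:", weights[-1])
```

In this toy setting the outlier pulls the ordinary mean away from the bulk of the data, whereas its very small weight leaves the weighted estimate close to the remaining values. Larger β downweights unlikely observations more aggressively, which is the robustness versus efficiency trade-off mentioned above.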

Conclusions: The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model statistically. When β-weights outside the range of the predicted distribution were observed, a detailed inspection of the data was carried out. The β-weights described here can be applied to other likelihood-based statistical models for diagnosis, and may serve as a useful tool for transcriptome and proteome studies.
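
The diagnostic use of the weights can be sketched in the same simplified setting. This is an illustration under stated assumptions rather than the procedure used in the paper: the model is a standard normal with known parameters, the predicted weight distribution is obtained by simulating from that model, and the quantile level 0.001, the simulation size, and the names `predicted_lower_bound` and `weight` are hypothetical choices (the paper reports a 10⁻⁵ level on real data).

```python
import math
import random

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def weight(y, mu, sigma, beta):
    """Weight of one observation: its likelihood raised to the power beta."""
    return normal_pdf(y, mu, sigma) ** beta

def predicted_lower_bound(mu, sigma, beta, n_sim=20000, q=0.001, seed=0):
    """Lower q-quantile of the weights predicted under the assumed model.

    Data are simulated from the model itself, so weights below this bound
    are rarely expected if the model is adequate.
    """
    rng = random.Random(seed)
    sims = sorted(weight(rng.gauss(mu, sigma), mu, sigma, beta)
                  for _ in range(n_sim))
    return sims[int(q * n_sim)]

# Hypothetical observations: five typical values and one extreme value.
observed = [0.2, -0.5, 1.1, -0.9, 0.4, 6.0]
bound = predicted_lower_bound(0.0, 1.0, 0.2)
flagged = [y for y in observed if weight(y, 0.0, 1.0, 0.2) < bound]

print("lower bound on predicted weights:", bound)
print("observations flagged for inspection:", flagged)
```

Observations whose weights fall below the simulated bound are the ones that would be singled out for detailed inspection; in a real analysis the model parameters would be estimated from the data before simulating.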

Figures

Figure 1
β-weights can diagnose a misspecified model. (a) Scatter plot of log(aa) versus β-weight. Many of the genes with a shape parameter (aa) less than 1 have small β-weights. (b) The true gamma distribution for different values of the shape parameter when the scale parameter is one. (c) The log-transformed expressions of genes with weight < 0.53 and log(aa) < -1 in (a) are plotted below the lines for group 2 tissues and above the lines for group 1 tissues. The genes with low β-weights were shown to have heavy lower tails. (d) The log-transformed expressions of genes with weight ≥ 0.6 and log(aa) < -1 in (a) are plotted below the lines for group 2 tissues and above the lines for group 1 tissues. The log-transformed expression profiles of these genes were shown to be similar to the normal distribution.
Figure 2
The distribution of the β-weights for the head and neck cancer data. The observed distribution (blue) of β-weights was qualitatively similar to the parametric bootstrap-based predicted distribution (red) with the exception of 261 outliers (2.2% of the total genes) with small β-weights (p<10⁻⁵).
Figure 3
Posterior probabilities estimated by EB and β-EB for the head and neck cancer data. (a) Scatter plot of the posterior probabilities (pp.) estimated by the proposed β-EB approach and by the classical EB-LNN approach. The red “+” marks represent outliers with β-weights for which the p-values are <10⁻⁵. The blue “o” marks represent the outliers that were identified as DE by the β-EB approach (pp.>0.95) and as EE by the original EB approach (pp.<0.5). (b) Expression levels of the six genes (marked by the blue “o” in (a)) that were identified as DE by the β-EB approach and as EE by the EB approach. The log-transformed expressions are plotted below the lines for the tumor tissues and above the lines for the normal tissues. Outliers with low β-weights are indicated in red.
Figure 4
The distribution of the β-weights for the lung cancer data. The observed distribution (blue) of β-weights showed a large deviation from the predicted distribution (red). Because the observed distribution has extremely heavy tails on both sides compared with the predicted distribution, we marked the lower and upper 10⁻⁵-tiles of the predicted distribution.
Figure 5
Features of the expression profiles of the two types of lung cancer data. (a) Distribution of the log mean expression levels. The distribution of the outlier genes is shown in blue. (b) Scatter plot of gene-specific means versus standard deviations. The red dots represent genes with low β-weights (p<10⁻⁵); green dots represent genes with high weights (p<10⁻⁵); and the blue dots represent the outlier genes. (c) When transcripts with little variation (standard deviation < 0.05) were excluded, the upper heavy tail observed in Figure 4 disappeared.
Figure 6
Genomic architecture of the eQTL study across the five A. thaliana chromosomes. (a) Expected numbers of DE transcripts/e-traits (y-axis) plotted against the marker location in megabases (Mb) on the x-axis. (b) Parametric predicted distribution (red) and observed distribution (blue) of β-weights for the A. thaliana data, measured for marker 73 on chromosome 4. The observed distribution showed a large deviation from the predicted distribution. (c) Expression levels of the 18 transcripts with weights less than 0.003 (i.e., w < 0.003). The log-transformed expressions are plotted below the lines for marker genotype “B” and above the lines for marker genotype “A”. Outliers with low β-weights are indicated in red.
