BMC Bioinformatics. 2012 Jun 19;13:135.

doi: 10.1186/1471-2105-13-135.

β-empirical Bayes inference and model diagnosis of microarray data

Mohammad Manir Hossain Mollah¹, M Nurul Haque Mollah, Hirohisa Kishino

PMID: 22713095
PMCID: PMC3464654
DOI: 10.1186/1471-2105-13-135

Mohammad Manir Hossain Mollah et al. BMC Bioinformatics. 2012.

Affiliation

¹ Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan. mollah@lbm.ab.a.u-tokyo.ac.jp

Abstract

Background: Microarray data enable the high-throughput survey of mRNA expression profiles at the genomic level; however, the data present a challenging statistical problem because a large number of transcripts is measured on only a small number of samples. To reduce the dimensionality, various Bayesian and empirical Bayes hierarchical models have been developed. However, because of the complexity of microarray data, no model can explain the data fully, and it is generally difficult to scrutinize irregular expression patterns that are not expected under the usual gene-by-gene statistical models.

Results: As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure, which can be regarded as an 'evidence-based' weighted (quasi-)likelihood inference. The weight of a transcript t is described as a power function of its likelihood, $f^{\beta}(y_t \mid \theta)$. Genes with low likelihoods have unexpected expression patterns and therefore low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between robustness and efficiency, is selected by maximizing the predictive β₀-likelihood by cross-validation. The proposed β-EB approach identified six significant (p<10⁻⁵) contaminated transcripts as differentially expressed (DE) between normal and tumor tissues from head and neck cancer patients. All six genes were confirmed to be related to cancer, and none of them was identified as DE by the classical EB approach. When applied to the eQTL analysis of *Arabidopsis thaliana*, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach.
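To make the weighting idea concrete, here is a minimal Python sketch for a single transcript under an assumed univariate normal model; it is not the authors' hierarchical β-EB model. The weight is taken as $f^{\beta}(y \mid \theta)$ rescaled so that a value at the mean gets weight 1 (the rescaling is an assumption). Mean and variance are estimated by β-weighted fixed-point updates; the (1 + β) factor on the variance keeps the estimate consistent when the data really are normal, and other β-divergence variants give slightly different updates. For choosing β, the held-out score is assumed to be the density-power form mean((f^β₀ − 1)/β₀) − ∫f^(1+β₀)/(1+β₀), which approaches the log-likelihood minus a constant as β₀ → 0. The function names, β₀ = 0.1, the five folds and the candidate grid are all hypothetical choices.

```python
import numpy as np


def normal_pdf(y, mu, sigma2):
    """Univariate normal density N(mu, sigma2) evaluated at y."""
    return np.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)


def beta_weights(y, mu, sigma2, beta):
    """Power weights f(y | θ)^β, rescaled so that y = mu gets weight 1.

    For a normal density the rescaled weight is exp(-β (y - mu)² / (2 sigma2)).
    """
    y = np.asarray(y, dtype=float)
    return np.exp(-beta * (y - mu) ** 2 / (2.0 * sigma2))


def robust_normal_fit(y, beta, n_iter=500, tol=1e-10):
    """β-weighted estimates of (mu, sigma2) by fixed-point iteration.

    mu     = Σ w y / Σ w
    sigma2 = (1 + β) Σ w (y - mu)² / Σ w
    With beta = 0 all weights are 1, giving the maximum-likelihood estimates.
    """
    y = np.asarray(y, dtype=float)
    mu = np.median(y)  # robust starting values (median and MAD)
    sigma2 = max((1.4826 * np.median(np.abs(y - mu))) ** 2, 1e-12)
    for _ in range(n_iter):
        w = beta_weights(y, mu, sigma2, beta)
        mu_new = np.sum(w * y) / np.sum(w)
        sigma2_new = (1.0 + beta) * np.sum(w * (y - mu_new) ** 2) / np.sum(w)
        done = abs(mu_new - mu) < tol and abs(sigma2_new - sigma2) < tol
        mu, sigma2 = mu_new, sigma2_new
        if done:
            break
    return mu, sigma2


def beta0_likelihood(y, mu, sigma2, beta0):
    """Density-power (β0) likelihood of held-out data under N(mu, sigma2).

    mean((f^β0 - 1) / β0) - ∫ f^(1+β0) dy / (1 + β0), where for a normal model
    ∫ f^(1+β0) dy = (2π sigma2)^(-β0/2) / sqrt(1 + β0).
    """
    f = normal_pdf(np.asarray(y, dtype=float), mu, sigma2)
    integral = (2.0 * np.pi * sigma2) ** (-beta0 / 2.0) / np.sqrt(1.0 + beta0)
    return np.mean((f ** beta0 - 1.0) / beta0) - integral / (1.0 + beta0)


def select_beta(y, betas, beta0=0.1, n_folds=5, seed=0):
    """Pick β by K-fold cross-validation of the held-out β0-likelihood."""
    y = np.asarray(y, dtype=float)
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    scores = []
    for beta in betas:
        total = 0.0
        for k in range(n_folds):
            mu, sigma2 = robust_normal_fit(y[folds != k], beta)
            total += beta0_likelihood(y[folds == k], mu, sigma2, beta0)
        scores.append(total / n_folds)
    return betas[int(np.argmax(scores))], scores
```

For example, with 95 standard-normal values and five values fixed at 8.0, `robust_normal_fit(y, 0.0)` returns the ordinary sample mean, which is pulled toward the outliers, whereas `robust_normal_fit(y, 0.5)` stays close to 0 because the outliers receive weights of roughly exp(−16).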

Conclusions: Analyses of simulated and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize irregular expression patterns and to diagnose the model statistically. When β-weights outside the range of the predicted distribution were observed, a detailed inspection of the data was carried out. The β-weights described here can be applied to other likelihood-based statistical models for diagnosis and may serve as a useful tool for transcriptome and proteome studies.
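The diagnostic step, comparing observed β-weights with a predicted distribution obtained by parametric bootstrap (as in Figure 2), can be sketched as follows. This is a minimal illustration under loud assumptions: a hypothetical normal-normal model in which the m replicate log-expressions of transcript t satisfy y_t | μ_t ~ N(μ_t 1, σ²I) with μ_t ~ N(μ₀, τ²), hyperparameters treated as already estimated and held fixed across bootstrap replicates (no refitting), and a raw weight w_t = f(y_t | θ)^β computed from the marginal density. All function names are illustrative.

```python
import numpy as np


def marginal_logpdf(Y, mu0, sigma2, tau2):
    """Log marginal density of each row of Y under the hypothetical model
    y_t ~ N(mu0 1, sigma2 I + tau2 1 1ᵀ), evaluated via Sherman-Morrison."""
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    m = Y.shape[1]
    d = Y - mu0
    s = sigma2 + m * tau2
    quad = (np.sum(d ** 2, axis=1) - tau2 / s * np.sum(d, axis=1) ** 2) / sigma2
    logdet = (m - 1) * np.log(sigma2) + np.log(s)
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)


def simulate(n, m, mu0, sigma2, tau2, rng):
    """Draw n transcripts with m replicates each from the fitted model."""
    mu_t = rng.normal(mu0, np.sqrt(tau2), size=(n, 1))
    return mu_t + rng.normal(0.0, np.sqrt(sigma2), size=(n, m))


def beta_weight_pvalues(Y, beta, mu0, sigma2, tau2, n_boot=20, seed=0):
    """Observed weights and lower-tail p-values against bootstrap weights."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    w_obs = np.exp(beta * marginal_logpdf(Y, mu0, sigma2, tau2))
    sims = [simulate(n, m, mu0, sigma2, tau2, rng) for _ in range(n_boot)]
    w_sim = np.sort(np.concatenate(
        [np.exp(beta * marginal_logpdf(S, mu0, sigma2, tau2)) for S in sims]))
    # Share of simulated weights at or below each observed weight (+1 smoothing).
    count = np.searchsorted(w_sim, w_obs, side="right")
    return w_obs, (1.0 + count) / (1.0 + w_sim.size)
```

Transcripts whose p-value falls below a chosen cut-off would then be flagged for inspection. The smallest attainable p-value here is 1/(1 + n·n_boot), so a threshold as strict as p < 10⁻⁵ needs on the order of 10⁵ simulated weights.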

Figures

**Figure 1**
**β-weights can diagnose a misspecified model.** (a) Scatter plot of log(aa) versus β-weight. Many of the genes with a shape parameter (aa) less than 1 have small β-weights. (b) The true gamma distribution for different values of the shape parameter when the scale parameter is one. (c) Log-transformed expressions of genes with weight < 0.53 and log(aa) < -1 in (a), plotted below the lines for group 2 tissues and above the lines for group 1 tissues. The genes with low β-weights have heavy lower tails. (d) Log-transformed expressions of genes with weight ≥ 0.6 and log(aa) < -1 in (a), plotted below the lines for group 2 tissues and above the lines for group 1 tissues. The log-transformed expression profiles of these genes are close to a normal distribution.

**Figure 2**
**The distribution of the β-weights for the head and neck cancer data.** The observed distribution (blue) of β-weights was qualitatively similar to the parametric bootstrap-based predicted distribution (red), with the exception of 261 outliers (2.2% of the total genes) with small β-weights (p<10⁻⁵).

**Figure 3**
**Posterior probabilities estimated by EB and β-EB for the head and neck cancer data.** (a) Scatter plot of the posterior probabilities (pp.) estimated by the proposed β-EB approach and by the classical EB-LNN approach. The red “+” marks represent outliers whose β-weights have p-values <10⁻⁵. The blue “o” marks indicate the outliers that were identified as DE by the β-EB approach (*pp.*>0.95) and as EE by the original EB approach (*pp.*<0.5). (b) Expression levels of the six genes (marked by the blue “o” in (a)) that were identified as DE by the β-EB approach and as EE by the EB approach. The log-transformed expressions are plotted below the lines for the tumor tissues and above the lines for the normal tissues. Outliers with low β-weights are indicated in red.
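The DE/EE calls in this figure rest on posterior probabilities from a two-group empirical Bayes mixture. A minimal sketch of that final step, assuming the marginal log-likelihoods of each transcript under the DE and EE hypotheses and the prior DE proportion have already been estimated (the names `log_f_de`, `log_f_ee` and `p_de` are hypothetical; in EB-LNN these quantities would come from the lognormal-normal model), with the cut-offs taken from the caption:

```python
import numpy as np


def posterior_de(log_f_de, log_f_ee, p_de):
    """pp_t = p f_DE(y_t) / (p f_DE(y_t) + (1 - p) f_EE(y_t)), in log space."""
    a = np.log(p_de) + np.asarray(log_f_de, dtype=float)
    b = np.log1p(-p_de) + np.asarray(log_f_ee, dtype=float)
    return np.exp(a - np.logaddexp(a, b))  # logaddexp avoids under/overflow


def classify(pp, de_cut=0.95, ee_cut=0.5):
    """Label DE if pp > de_cut, EE if pp < ee_cut, otherwise 'undecided'."""
    pp = np.asarray(pp, dtype=float)
    return np.where(pp > de_cut, "DE", np.where(pp < ee_cut, "EE", "undecided"))
```

Working on the log scale matters because marginal likelihoods of multi-sample expression vectors are typically far too small to represent directly as floating-point numbers.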

**Figure 4**
**The distribution of the β-weights for the lung cancer data.** The observed distribution (blue) of β-weights showed a large deviation from the predicted distribution (red). Because the observed distribution has extremely heavy tails on both sides compared with the predicted distribution, the lower and upper 10⁻⁵ quantiles of the predicted distribution are marked.

**Figure 5**
**Features of the expression profiles of the two types of lung cancer data.** **(a)** Distribution of the log mean expression levels. The distribution of the outlier genes is shown in blue. **(b)** Scatter plot of gene-specific means versus standard deviations. The red dots represent genes with low β-weights (p<10⁻⁵), green dots represent genes with high weights (p<10⁻⁵), and blue dots represent the outlier genes. **(c)** When transcripts with little variation (standard deviation < 0.05) were excluded, the upper heavy tail observed in Figure 4 disappeared.
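Panel (c) amounts to a simple variance filter applied before recomputing the weights. A sketch, assuming a transcripts × samples matrix of log-scale expression and the sample standard deviation with `ddof=1` (the caption does not say which standard deviation was used); `drop_low_variation` is a hypothetical name:

```python
import numpy as np


def drop_low_variation(X, min_sd=0.05):
    """Keep transcripts (rows) whose sample standard deviation is >= min_sd.

    Returns the filtered matrix and the boolean mask of kept rows.
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=1, ddof=1)
    keep = sd >= min_sd
    return X[keep], keep
```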

**Figure 6**
**Genomic architecture of the eQTL study across the five *A. thaliana* chromosomes.** (a) Expected numbers of DE transcripts/e-traits (y-axis) plotted against marker location in megabases (Mb; x-axis). (b) Parametric predicted distribution (red) and observed distribution (blue) of β-weights for the *A. thaliana* data at marker 73 on chromosome 4. The observed distribution showed a large deviation from the predicted distribution. (c) Expression levels of the 18 transcripts with weights less than 0.003 (w < 0.003). The log-transformed expressions are plotted below the lines for marker genotype “B” and above the lines for marker genotype “A”. Outliers with low β-weights are indicated in red.
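The quantities in panels (a) and (c) can be computed along these lines. This sketch assumes a markers × transcripts matrix of posterior DE probabilities and a vector of β-weights for a single marker (both hypothetical inputs), and takes the expected number of DE transcripts at a marker as the sum of the posterior probabilities over transcripts:

```python
import numpy as np


def expected_de_per_marker(pp):
    """pp[k, t] = posterior probability that transcript t is DE at marker k.

    The expected number of DE transcripts at marker k is sum_t pp[k, t].
    """
    return np.asarray(pp, dtype=float).sum(axis=1)


def low_weight_transcripts(weights, threshold=0.003):
    """Indices of transcripts whose β-weight is below the threshold."""
    return np.flatnonzero(np.asarray(weights, dtype=float) < threshold)
```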

Cited by

Robust Significance Analysis of Microarrays by Minimum β-Divergence Method.
Shahjaman M, Kumar N, Mollah MMH, Ahmed MS, Ara Begum A, Shahinul Islam SM, Mollah MNH. Biomed Res Int. 2017;2017:5310198. doi: 10.1155/2017/5310198. Epub 2017 Jul 27. PMID: 28819626.
A 19-Gene expression signature as a predictor of survival in colorectal cancer.
Abdul Aziz NA, Mokhtar NM, Harun R, Mollah MM, Mohamed Rose I, Sagap I, Mohd Tamil A, Wan Ngah WZ, Jamal R. BMC Med Genomics. 2016 Sep 8;9(1):58. doi: 10.1186/s12920-016-0218-1. PMID: 27609023.
A Hybrid One-Way ANOVA Approach for the Robust and Efficient Estimation of Differential Gene Expression with Multiple Patterns.
Mollah MM, Jamal R, Mokhtar NM, Harun R, Mollah MN. PLoS One. 2015 Sep 28;10(9):e0138810. doi: 10.1371/journal.pone.0138810. eCollection 2015. PMID: 26413858.
Robust volcano plot: identification of differential metabolites in the presence of outliers.
Kumar N, Hoque MA, Sugimoto M. BMC Bioinformatics. 2018 Apr 11;19(1):128. doi: 10.1186/s12859-018-2117-2. PMID: 29642836.
Robustification of Naïve Bayes Classifier and Its Application for Microarray Gene Expression Data Analysis.
Ahmed MS, Shahjaman M, Rana MM, Mollah MNH. Biomed Res Int. 2017;2017:3020627. doi: 10.1155/2017/3020627. Epub 2017 Aug 7. PMID: 28848763.
