Comparative Study

Stat Appl Genet Mol Biol. 2011 Sep 15;10(1):42.

doi: 10.2202/1544-6115.1701.

Fully moderated T-statistic for small sample size gene expression arrays

Lianbo Yu¹, Parul Gulati, Soledad Fernandez, Michael Pennell, Lawrence Kirschner, David Jarjoura

PMID: 23089813
PMCID: PMC3192003

Lianbo Yu et al. Stat Appl Genet Mol Biol. 2011.

Affiliation

¹ The Ohio State University, USA.

Abstract

Gene expression microarray experiments with few replications lead to highly variable estimates of gene variances. Several Bayesian methods have been developed to reduce this variability and to increase power. Thus far, moderated t methods have assumed a constant coefficient of variation (CV) for the gene variances. We provide evidence against this assumption and extend the method by allowing the CV to vary with gene expression. Our CV-varying method, which we refer to as the fully moderated t-statistic, was compared with three other methods (the ordinary t and two moderated t predecessors). A simulation study and a familiar spike-in data set were used to assess the performance of the testing methods. The results showed that our CV-varying method had higher power than the other three methods, identified a greater number of true positives in the spike-in data, fit simulated data under varying assumptions very well, and, in a real data set, better identified higher-expressing genes that were consistent with functional pathways associated with the experiments.
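
The moderated statistics compared here are assumed to follow the standard two-group form of Smyth (2004): a gene's pooled sample variance $s_g^2$, with $d$ residual degrees of freedom, is shrunk toward a prior variance $s_{0g}^2$ carrying $d_{0g}$ prior degrees of freedom, and the statistic is referred to a t distribution on $d_{0g} + d$ degrees of freedom. Under the usual scaled inverse-chi-square prior, the CV of the gene variances is $\sqrt{2/(d_0 - 4)}$ for $d_0 > 4$, so letting the CV vary with expression amounts to letting $d_{0g}$ vary. The Python sketch below is illustrative only: the function name `moderated_t` is hypothetical, and estimating $d_{0g}$ and $s_{0g}^2$ is left to the caller.

```python
import numpy as np
from scipy import stats


def moderated_t(x1, x2, d0, s0_sq):
    """Two-group moderated t-statistics, one per gene (row).

    x1, x2: log-expression arrays of shape (genes, n1) and (genes, n2).
    d0, s0_sq: prior degrees of freedom and prior variance. Scalars give one
    prior shared by all genes; length-`genes` arrays give gene-specific priors.
    """
    n1, n2 = x1.shape[1], x2.shape[1]
    d = n1 + n2 - 2  # residual degrees of freedom per gene
    # Pooled sample variance for each gene.
    s_sq = ((n1 - 1) * x1.var(axis=1, ddof=1)
            + (n2 - 1) * x2.var(axis=1, ddof=1)) / d
    d0 = np.broadcast_to(np.asarray(d0, dtype=float), s_sq.shape)
    s0_sq = np.broadcast_to(np.asarray(s0_sq, dtype=float), s_sq.shape)
    # Posterior (moderated) variance: df-weighted average of prior and sample.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    diff = x1.mean(axis=1) - x2.mean(axis=1)
    t = diff / np.sqrt(s_tilde_sq * (1.0 / n1 + 1.0 / n2))
    df = d0 + d  # moderated t uses d0 + d degrees of freedom
    p = 2.0 * stats.t.sf(np.abs(t), df)  # two-sided p-values
    return t, df, p
```

With `d0=0` the prior drops out and the result is the ordinary pooled two-sample t-test. A scalar `d0` and `s0_sq` corresponds to a single shared prior, while per-gene arrays let both hyperparameters depend on expression.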

Figures

**Figure 1:**
Relationship between hyperparameters and gene expression for Simulation Model 1. A) Prior variance ($s_{0g}^{2}$). B) Prior degrees of freedom ($d_{0g}$).
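
A simulation model in which both hyperparameters depend on expression can be sketched by drawing each gene's variance from a scaled inverse-chi-square prior, $\sigma_g^2 = d_{0g} s_{0g}^2 / \chi^2_{d_{0g}}$, and then drawing replicate log expressions around the gene mean. The curves `d0_fn` and `s0_sq_fn` below are hypothetical placeholders chosen only for illustration; they are not the functions behind Model 1, and the helper names are assumptions as well.

```python
import numpy as np


def simulate_variances(mean_expr, d0_fn, s0_sq_fn, rng):
    """Draw sigma_g^2 = d0g * s0g^2 / chi2(d0g) (scaled inverse-chi-square prior).

    d0_fn and s0_sq_fn map average log expression to each gene's prior
    degrees of freedom and prior variance.
    """
    d0 = d0_fn(mean_expr)
    s0_sq = s0_sq_fn(mean_expr)
    return d0 * s0_sq / rng.chisquare(d0)


def simulate_two_groups(mean_expr, sigma_sq, n, effect, rng):
    """Log-expression matrices (genes x n) for two groups; effect 0 = not DE."""
    sd = np.sqrt(sigma_sq)[:, None]
    g1 = mean_expr[:, None] + sd * rng.standard_normal((mean_expr.size, n))
    g2 = (mean_expr + effect)[:, None] + sd * rng.standard_normal((mean_expr.size, n))
    return g1, g2


# Hypothetical curves: prior df rises and prior variance falls with expression.
rng = np.random.default_rng(1)
mean_expr = rng.uniform(4.0, 14.0, size=1000)
sigma_sq = simulate_variances(
    mean_expr,
    d0_fn=lambda a: 2.0 + 0.8 * (a - 4.0),
    s0_sq_fn=lambda a: 0.05 + 0.5 * np.exp(-(a - 4.0) / 2.0),
    rng=rng,
)
effect = np.where(rng.uniform(size=1000) < 0.05, 1.0, 0.0)  # ~5% DE genes
g1, g2 = simulate_two_groups(mean_expr, sigma_sq, n=3, effect=effect, rng=rng)
```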

**Figure 2:**
Power plots of four testing methods under two different simulation models. Power, averaged over 100 simulated datasets, was calculated separately for each of four testing methods: t-test (purple), SMT (blue), IBMT (red), and FMT (black). A) FMT simulation model (Model 1). B) SMT simulation model (Model 2).

**Figure 3:**
Prior degrees of freedom estimated by different moderated testing methods under the FMT simulation model (Model 1). Prior degrees of freedom were estimated by SMT (blue), IBMT (red), and FMT (green) and averaged over 100 simulated datasets.

**Figure 4:**
Observed true positives detected at different expression levels. Observed true positives, averaged over 100 simulated datasets, were plotted for four testing methods: t-test (purple), SMT (blue), IBMT (red), and FMT (black), at three expression levels (low (1), medium (2), and high (3)) when the per-family error rate (PFER) equals 5 under Model 1. (Numbers in parentheses in the legend indicate the number of true positives.)

**Figure 5:**
Observed false positives detected at different expression levels. Observed false positives, averaged over 100 simulated datasets, were counted for four testing methods: t-test (purple), SMT (blue), IBMT (red), and FMT (black), at three expression levels (low (1), medium (2), and high (3)) when PFER equals 5 under Model 1. (Numbers in parentheses in the legend indicate the numbers of true negatives.)
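
The captions do not say how the cutoff for a per-family error rate (PFER, the expected number of false positives) of 5 was applied. One standard rule, assumed here, calls a gene significant when its p-value is at most $\mathrm{PFER}/G$ for $G$ genes: with uniform null p-values the expected number of false positives is then $G_0 \cdot \mathrm{PFER}/G \le \mathrm{PFER}$. The hypothetical helper below applies that cutoff and tallies true and false positives within each expression level; power at a level is its true-positive count divided by the number of DE genes at that level.

```python
import numpy as np


def counts_by_level(pvals, is_de, level, pfer=5.0):
    """Return {level: (true_positives, false_positives)} under a PFER cutoff.

    A gene is called significant when p <= pfer / G (Bonferroni-type rule),
    which bounds the expected number of false positives by `pfer`.
    `level` holds an integer expression-level label (e.g. 1, 2, 3) per gene.
    """
    pvals = np.asarray(pvals, dtype=float)
    is_de = np.asarray(is_de, dtype=bool)
    level = np.asarray(level)
    called = pvals <= pfer / pvals.size
    out = {}
    for lv in np.unique(level):
        in_lv = level == lv
        tp = int(np.sum(called & is_de & in_lv))
        fp = int(np.sum(called & ~is_de & in_lv))
        out[int(lv)] = (tp, fp)
    return out
```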

**Figure 6:**
Comparison of false positives among the top 300 ranked genes. Genes were ranked by p-value, and the corresponding numbers of false positives, averaged over 100 simulated datasets, were obtained separately for four testing methods: t-test (purple), SMT (blue), IBMT (red), and FMT (black) when PFER equals 5 under Model 1.
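
A curve of false positives among the top-ranked genes can be built by sorting genes by p-value and keeping a running count of the truly non-DE genes. This is a minimal sketch with hypothetical function names; averaging over simulated datasets is taken to be a plain mean of the per-dataset curves, and each dataset is assumed to contain at least `k_max` genes.

```python
import numpy as np


def false_positives_in_top(pvals, is_de, k_max=300):
    """Running count of false positives among the top-k genes by p-value.

    Element k-1 of the result is the number of non-DE genes among the k
    smallest p-values (ties keep their input order).
    """
    order = np.argsort(np.asarray(pvals), kind="stable")
    null_sorted = ~np.asarray(is_de, dtype=bool)[order]
    return np.cumsum(null_sorted[:k_max])


def average_curve(datasets, k_max=300):
    """Mean curve over simulated datasets given as (pvals, is_de) pairs."""
    return np.mean([false_positives_in_top(p, de, k_max) for p, de in datasets], axis=0)
```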

**Figure 7:**
Summary plots of LOESS smoothing curves for estimating prior degrees of freedom. A) Mean of LOESS curves over 100 simulations for the window sizes of m = 10, 40, 200, 600, and 2000 genes. Black curve represents the true model. B) Standard deviation of LOESS curves over 100 simulations for the window sizes of m = 10, 40, 200, 600, and 2000 genes.

**Figure 8:**
Spike-in data prior degrees of freedom estimates. SMT (blue), IBMT (red), and FMT (black) methods were used to estimate prior degrees of freedom against average log expression.

**Figure 9:**
Comparison of false positives among top-ranked genes. Gene ranks based on p-values were obtained separately for four testing methods: t-test (purple), SMT (blue), IBMT (red), and FMT (black). False positive counts were determined from among the top-ranked genes.

**Figure 10:**
Estimated prior degrees of freedom and variance of log variances against log intensity in the real data example. A) Estimated prior degrees of freedom by FMT (green), IBMT (red), and SMT (blue). B) Estimated variance of log variances by FMT. The moving-average estimate (green; window size of 40 genes) was obtained by LOESS local regression with span = 0.95.
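
A moving-window estimate of the variance of the log variances can be turned into a prior degrees-of-freedom estimate. The sketch below assumes the method-of-moments estimator of Smyth (2004) used in limma, which the captions do not confirm for FMT: with $e_g = \log s_g^2 - \psi(d/2) + \log(d/2)$, the variance of $e_g$ is matched to $\psi'(d/2) + \psi'(d_0/2)$ and the trigamma function $\psi'$ is inverted numerically. Genes are ordered by average log intensity and each receives the estimate from a 40-gene window around it; the LOESS smoothing step is omitted, and all function names are hypothetical.

```python
import numpy as np
from scipy.special import digamma, polygamma


def trigamma_inverse(y, tol=1e-8, max_iter=50):
    """Solve trigamma(x) = y for x > 0 by Newton's method (scalar y > 0)."""
    if y > 1e7:
        return 1.0 / np.sqrt(y)  # asymptote for small x
    if y < 1e-6:
        return 1.0 / y  # asymptote for large x
    x = 0.5 + 1.0 / y
    for _ in range(max_iter):
        tri = float(polygamma(1, x))
        step = tri * (1.0 - tri / y) / float(polygamma(2, x))
        x += step
        if abs(step) / x < tol:
            break
    return x


def windowed_prior_df(s_sq, d, mean_expr, m=40):
    """Moment estimate of the prior df d0 in moving windows of m genes.

    Returns one estimate per gene in the original gene order; np.inf means
    the window shows no spread beyond sampling noise (trigamma(d/2)).
    """
    order = np.argsort(mean_expr)
    e = np.log(np.asarray(s_sq, dtype=float)[order]) - digamma(d / 2) + np.log(d / 2)
    G, half = e.size, m // 2
    d0_sorted = np.empty(G)
    for i in range(G):
        lo = max(0, min(i - half, G - m))  # window of m genes around position i
        excess = e[lo:lo + m].var(ddof=1) - float(polygamma(1, d / 2))
        d0_sorted[i] = 2.0 * trigamma_inverse(excess) if excess > 0 else np.inf
    out = np.empty(G)
    out[order] = d0_sorted
    return out
```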

**Figure 11:**
Estimated prior variance against log intensity in the real data example. Prior variance estimates by FMT (green) and IBMT (red) were obtained by fitting a LOESS local regression to the adjusted log-variance $e_g$ with span = 0.75. Differences between FMT and IBMT are mainly due to differences in the estimated prior degrees of freedom. The prior variance estimate for SMT (blue) is constant.
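
If $e_g$ is the limma-style adjusted log-variance, $e_g = \log s_g^2 - \psi(d/2) + \log(d/2)$ (an assumption; the caption does not define it), then $E[e_g] = \log s_{0g}^2 - \psi(d_{0g}/2) + \log(d_{0g}/2)$. The prior variance therefore follows from a smooth of $e_g$ plus an offset that depends on $d_{0g}$, which is consistent with the caption's remark that FMT and IBMT differ mainly through their prior degrees of freedom. The sketch swaps a running mean in for LOESS to stay dependency-free; the function name and window size are hypothetical.

```python
import numpy as np
from scipy.special import digamma


def prior_variance(s_sq, d, d0, mean_expr, m=200):
    """Expression-dependent prior variance s0g^2 from adjusted log variances.

    A running mean of e_g over m expression-ordered genes (truncated at both
    ends) stands in for the LOESS fit; d0 may be a scalar or per-gene array.
    """
    s_sq = np.asarray(s_sq, dtype=float)
    d0 = np.broadcast_to(np.asarray(d0, dtype=float), s_sq.shape)
    order = np.argsort(mean_expr)
    e = np.log(s_sq[order]) - digamma(d / 2) + np.log(d / 2)
    # Running mean via cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(e)))
    idx = np.arange(e.size)
    lo = np.clip(idx - m // 2, 0, e.size)
    hi = np.clip(idx + m // 2 + 1, 0, e.size)
    e_smooth = np.empty_like(e)
    e_smooth[order] = (csum[hi] - csum[lo]) / (hi - lo)
    # digamma(x) - log(x) -> 0 as x -> inf, so infinite d0 gets no offset.
    offset = np.zeros_like(d0)
    fin = np.isfinite(d0)
    offset[fin] = digamma(d0[fin] / 2) - np.log(d0[fin] / 2)
    return np.exp(e_smooth + offset)
```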

**Figure 12:**
Venn diagram of the four significant gene lists from the four testing methods: t-test, SMT, IBMT, and FMT.

Cited by

Metabolic gene NR4A1 as a potential therapeutic target for non-smoking female non-small cell lung cancer patients.
Sun R, Bao MY, Long X, Yuan Y, Wu MM, Li X, Bao JK. Thorac Cancer. 2019 Apr;10(4):715-727. doi: 10.1111/1759-7714.12989. Epub 2019 Feb 25. PMID: 30806032.
Disruption of stromal hedgehog signaling initiates RNF5-mediated proteasomal degradation of PTEN and accelerates pancreatic tumor growth.
Pitarresi JR, Liu X, Avendano A, Thies KA, Sizemore GM, Hammer AM, Hildreth BE 3rd, Wang DJ, Steck SA, Donohue S, Cuitiño MC, Kladney RD, Mace TA, Chang JJ, Ennis CS, Li H, Reeves RH, Blackshaw S, Zhang J, Yu L, Fernandez SA, Frankel WL, Bloomston M, Rosol TJ, Lesinski GB, Konieczny SF, Guttridge DC, Rustgi AK, Leone G, Song JW, Wu J, Ostrowski MC. Life Sci Alliance. 2018 Oct 26;1(5):e201800190. doi: 10.26508/lsa.201800190. eCollection 2018 Oct. PMID: 30456390.
Eps15 Homology Domain-containing Protein 3 Regulates Cardiac T-type Ca2+ Channel Targeting and Function in the Atria.
Curran J, Musa H, Kline CF, Makara MA, Little SC, Higgins JD, Hund TJ, Band H, Mohler PJ. J Biol Chem. 2015 May 8;290(19):12210-21. doi: 10.1074/jbc.M115.646893. Epub 2015 Mar 30. PMID: 25825486.
Robust differential expression analysis by learning discriminant boundary in multi-dimensional space of statistical attributes.
Bei Y, Hong P. BMC Bioinformatics. 2016 Dec 19;17(1):541. doi: 10.1186/s12859-016-1386-x. PMID: 27993137.
The role of extracellular matrix in mouse and human corneal neovascularization.
Barbariga M, Vallone F, Mosca E, Bignami F, Magagnotti C, Fonteyne P, Chiappori F, Milanesi L, Rama P, Andolfo A, Ferrari G. Sci Rep. 2019 Oct 3;9(1):14272. doi: 10.1038/s41598-019-50718-8. PMID: 31582785.
