Comparative Study

BMC Bioinformatics. 2005 Jun 29;6:165.

doi: 10.1186/1471-2105-6-165.

Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach

Jun Lu¹, John K Tomfohr, Thomas B Kepler

PMID: 15987513
PMCID: PMC1189357
DOI: 10.1186/1471-2105-6-165

Jun Lu et al. BMC Bioinformatics. 2005.

Affiliation

¹ Department of Biostatistics & Bioinformatics, Duke University, Durham, North Carolina 27708, USA. lu000014@mc.duke.edu

Abstract

Background: In testing for differential gene expression across multiple serial analysis of gene expression (SAGE) libraries, it is critical to account for both between-library and within-library variation. Several methods have been proposed, including the t test, the tw test, and an overdispersed logistic regression approach. The merits of these tests, however, have not been fully evaluated, and it remains unclear whether further improvements can be made.
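
As a point of reference for the baseline t test mentioned above, here is a minimal Python sketch. It assumes the common convention of reducing each library to a tag proportion (count divided by library size) and applying a pooled-variance Student's t statistic to those proportions. The function name `proportion_t_test` is hypothetical, and the sketch is not claimed to match the exact t-test variant evaluated in the paper.

```python
import math
import statistics


def proportion_t_test(counts_a, libs_a, counts_b, libs_b):
    """Pooled-variance two-sample t statistic on per-library tag proportions.

    Assumption (hypothetical helper): each library is reduced to
    count / library size, and every proportion gets equal weight.
    Each group needs at least two libraries.
    """
    props_a = [c / n for c, n in zip(counts_a, libs_a)]
    props_b = [c / n for c, n in zip(counts_b, libs_b)]
    n_a, n_b = len(props_a), len(props_b)
    df = n_a + n_b - 2
    # Pool the two sample variances (each computed with n - 1).
    pooled = ((n_a - 1) * statistics.variance(props_a)
              + (n_b - 1) * statistics.variance(props_b)) / df
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(props_b) - statistics.mean(props_a)) / se
    return t, df


# Illustrative counts for one tag in three libraries per group.
t, df = proportion_t_test([10, 12, 8], [1000] * 3, [30, 34, 26], [1000] * 3)
print(f"t = {t:.3f} on {df} df")
```

Because every library's proportion enters with equal weight, a library of 5,000 tags counts as much as one of 100,000 tags, so differences in within-library sampling precision are ignored.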

Results: In this article, we introduce an overdispersed log-linear model approach to analyzing SAGE data, and we evaluate and compare its performance with three other tests: the two-sample t test, the tw test, and a test based on overdispersed logistic regression. Analyses of simulated and real datasets show that both the log-linear and logistic overdispersion methods generally perform better than the t and tw tests; the log-linear method further outperforms the logistic method, showing equal or higher statistical power over a range of parameter values and under different data distributions.
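
To make the log-linear idea concrete, the following is a minimal sketch of a two-group overdispersed log-linear fit for a single tag, with library size entering as an offset. It assumes a quasi-Poisson variance, Var(y) = φμ, with φ estimated from the Pearson chi-square and floored at 1; the paper's own variance function and estimator may differ, and `log_linear_t` is a hypothetical name. One common choice is to compare the returned t statistic with a t distribution on the returned degrees of freedom.

```python
import math


def log_linear_t(counts_a, libs_a, counts_b, libs_b):
    """Two-group log-linear fit with an overdispersion scale (a sketch).

    Assumed model, not necessarily the paper's exact one:
        log E[y_j] = log(n_j) + b0 + b1 * x_j,  x_j = 0 for group A, 1 for B
        Var(y_j)   = phi * E[y_j]               (quasi-Poisson)
    Needs at least three libraries in total and a nonzero total per group.
    """
    tot_a, tot_b = sum(counts_a), sum(counts_b)
    if tot_a == 0 or tot_b == 0:
        raise ValueError("each group needs at least one observed tag")
    # With only a group indicator, the MLE rate in each group is the pooled
    # count divided by the pooled library size.
    rate_a = tot_a / sum(libs_a)
    rate_b = tot_b / sum(libs_b)
    b1 = math.log(rate_b / rate_a)  # log fold change, B relative to A

    # Pearson chi-square over all libraries, using fitted means n_j * rate.
    pearson = 0.0
    for counts, libs, rate in ((counts_a, libs_a, rate_a),
                               (counts_b, libs_b, rate_b)):
        for y, n in zip(counts, libs):
            mu = n * rate
            pearson += (y - mu) ** 2 / mu
    df = len(counts_a) + len(counts_b) - 2
    # Dispersion estimate, floored at 1 (the Poisson case) by assumption.
    phi = max(1.0, pearson / df)

    # The Poisson variance of b1 is 1/sum(mu_A) + 1/sum(mu_B); the fitted
    # means in each group sum to that group's observed total.
    se = math.sqrt(phi * (1 / tot_a + 1 / tot_b))
    return b1, se, b1 / se, df, phi


b1, se, t, df, phi = log_linear_t([5, 15], [1000, 1000], [40, 40], [1000, 1000])
print(f"log FC = {b1:.3f}, phi = {phi:.2f}, t = {t:.3f} on {df} df")
```

Unlike the proportion-based t test, the fitted means scale with library size, so deeper libraries contribute more information to the fold-change estimate.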

Conclusion: Overdispersed log-linear models provide an attractive and reliable framework for analyzing SAGE experiments involving multiple libraries. For convenience, the method is implemented in a user-friendly web interface available at http://www.cbcb.duke.edu/sage.

Figures

**Figure 2**
**Comparisons based on simulated data from the negative binomial distribution**. The ROC curves of the four tests are based on datasets generated from the negative binomial distribution with various magnitudes of overdispersion (φ). The data are simulated using the same strategy as in Figure 1, except that p_B = 4p_A. Note that the overdispersion parameter φ here is not directly comparable with that in Figure 1, because the negative binomial φ is not directly related to the beta-binomial φ. For figures generated under other conditions, see Additional file 2.
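
The negative binomial simulation behind Figure 2 can be sketched as follows. This assumes the negative binomial is parameterized as a gamma-Poisson mixture with mean μ = n·p and variance μ + φμ², which may not be the parameterization the authors used; the library sizes, p_A, and φ below are illustrative values only, and the helper names are hypothetical. Only the Python standard library is used, so Poisson draws are generated by counting unit-rate exponential arrivals.

```python
import random


def poisson_draw(lam, rng):
    """Poisson(lam) as the number of unit-rate arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def negbin_counts(p, libs, phi, rng):
    """Counts with mean n * p and (assumed) variance mu + phi * mu**2.

    Drawn as a gamma-Poisson mixture; phi = 0 reduces to Poisson.
    Requires p > 0 so that the gamma scale is positive.
    """
    out = []
    for n in libs:
        mu = n * p
        # Gamma with shape 1/phi and scale phi*mu has mean mu, var phi*mu**2.
        lam = mu if phi == 0 else rng.gammavariate(1 / phi, phi * mu)
        out.append(poisson_draw(lam, rng))
    return out


rng = random.Random(1)
libs = [50_000] * 5           # illustrative library sizes
p_a = 2e-4                    # illustrative tag proportion in group A
group_a = negbin_counts(p_a, libs, 0.2, rng)
group_b = negbin_counts(4 * p_a, libs, 0.2, rng)  # p_B = 4 p_A
print(group_a, group_b)
```

Repeating this for many tags, some with p_B = p_A and some with p_B = 4p_A, and scoring each test's p-values against the known truth is one way to trace ROC curves like those in the figure.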

**Figure 3**
**Comparing p-values from the logit-t test and those from the log-t test**. Of the top 100 tags (ranked by p-value) identified by the logit-t test and by the log-t test, 82 are common to both, leaving 18 tags from each test that are not within the other test's top 100. The p-values from both tests for these 36 remaining tags are plotted here. The circles represent the 18 tags in the top 100 of the logit-t test and the triangles the 18 from the log-t test. While all the tags identified by the logit-t test also have reasonably low p-values according to the log-t test, the tags identified by the log-t test show a much wider range of p-values according to the logit-t test.
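
The overlap counts in Figure 3 (82 shared tags, 18 unique to each test, 36 plotted) follow from comparing two top-k lists. A small sketch, assuming each test's results are available as a mapping from tag to p-value; the function names are hypothetical.

```python
def top_k(pvalues, k):
    """Return the k tags with the smallest p-values (ties broken by tag)."""
    return set(sorted(pvalues, key=lambda tag: (pvalues[tag], tag))[:k])


def compare_top_lists(p_first, p_second, k=100):
    """Split two top-k lists into shared tags and tags unique to each."""
    top1, top2 = top_k(p_first, k), top_k(p_second, k)
    return top1 & top2, top1 - top2, top2 - top1


# Toy p-values for four tags under two hypothetical tests.
p_logit = {"a": 0.01, "b": 0.02, "c": 0.03, "d": 0.5}
p_log = {"a": 0.02, "b": 0.9, "c": 0.01, "d": 0.03}
shared, only_logit, only_log = compare_top_lists(p_logit, p_log, k=3)
print(sorted(shared), sorted(only_logit), sorted(only_log))
```

When both lists hold exactly k tags, each test contributes k minus the number of shared tags, so k = 100 with 82 shared gives 18 + 18 = 36 tags to plot.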

**Figure 4**
**Plot of standardized residuals against estimated proportions**. Standardized Pearson residuals (y-axis) are plotted against the proportion estimates (x-axis) for the two groups. The standardized Pearson residuals are asymptotically distributed as a standard normal. The model fits for two tags (from the list of genes in Table 5) are shown here; the left is from the fit using the overdispersed logistic model and the right from the overdispersed log-linear model. A lower residual variance in the group with the lower mean proportion (normal) indicates poor model fit.
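
For the log-linear side, residuals like those in Figure 4 can be sketched as follows, assuming the quasi-Poisson variance φμ and a design with one rate per group. Under that design the leverage of library j is μ_j divided by the sum of fitted means in its group, and each Pearson residual is divided by sqrt(1 - h_j) to standardize it. The function `standardized_pearson` is hypothetical, and the paper's exact standardization may differ.

```python
import math


def standardized_pearson(counts, libs, phi=1.0):
    """Standardized Pearson residuals for one group of a log-linear fit.

    Assumes fitted means mu_j = n_j * (group total / group library size)
    and quasi-Poisson variance phi * mu_j. For a one-rate-per-group design
    the leverage of library j is mu_j / sum(mu in its group). Needs at
    least two libraries and a nonzero total in the group.
    """
    rate = sum(counts) / sum(libs)
    mus = [n * rate for n in libs]
    total_mu = sum(mus)
    residuals = []
    for y, mu in zip(counts, mus):
        pearson = (y - mu) / math.sqrt(phi * mu)
        leverage = mu / total_mu
        residuals.append(pearson / math.sqrt(1 - leverage))
    return residuals


counts, libs = [10, 30, 22], [1000, 1500, 1200]
x_axis = [y / n for y, n in zip(counts, libs)]  # per-library proportions
print(list(zip(x_axis, standardized_pearson(counts, libs))))
```

Plotting these residuals against the per-library proportions, group by group, gives a diagnostic of the kind described in the caption: a markedly smaller spread in one group suggests the assumed variance function fits that group poorly.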

**Figure 5**
**The distribution of overdispersion estimates**. The estimates are from the overdispersed log-linear model fit to the pancreas data. Tags with an overdispersion estimate of 0 are not shown in the figure.

Cited by

Statistical design and analysis of RNA sequencing data.
Auer PL, Doerge RW. Genetics. 2010 Jun;185(2):405-16. doi: 10.1534/genetics.110.114983. Epub 2010 May 3. PMID: 20439781. Free PMC article.
Statistical methods for detecting differentially abundant features in clinical metagenomic samples.
White JR, Nagarajan N, Pop M. PLoS Comput Biol. 2009 Apr;5(4):e1000352. doi: 10.1371/journal.pcbi.1000352. Epub 2009 Apr 10. PMID: 19360128. Free PMC article.
Social disparities in the use of colonoscopy by primary care physicians in Ontario.
Jacob BJ, Baxter NN, Moineddin R, Sutradhar R, Del Giudice L, Urbach DR. BMC Gastroenterol. 2011 Sep 28;11:102. doi: 10.1186/1471-230X-11-102. PMID: 21955593. Free PMC article.
ProbFAST: Probabilistic functional analysis system tool.
Silva IT, Vêncio RZ, Oliveira TY, Molfetta GA, Silva WA Jr. BMC Bioinformatics. 2010 Mar 30;11:161. doi: 10.1186/1471-2105-11-161. PMID: 20353576. Free PMC article.
CAMDA 2023: Finding patterns in urban microbiomes.
Contreras-Peruyero H, Nuñez I, Vazquez-Rosas-Landa M, Santana-Quinteros D, Pashkov A, Carranza-Barragán ME, Perez-Estrada R, Guerrero-Flores S, Balanzario E, Muñiz Sánchez V, Nakamura M, Ramírez-Ramírez LL, Sélem-Mojica N. Front Genet. 2024 Nov 25;15:1449461. doi: 10.3389/fgene.2024.1449461. eCollection 2024. PMID: 39655221. Free PMC article.
