Modeling skewness in human transcriptomes
- PMID: 22701729
- PMCID: PMC3372486
- DOI: 10.1371/journal.pone.0038919
Abstract
Gene expression data are influenced by multiple biological and technological factors, leading to a wide range of dispersion scenarios, yet skewed patterns are rarely addressed in microarray analyses. In this study, the distribution patterns of several human transcriptomes were examined using free-access microarray gene expression data. Our results showed that, even in previously normalized gene expression data, probe effects and differential-expression-within-probe effects depart substantially from the commonly assumed symmetric Gaussian distribution. We developed a flexible mixed model for non-competitive microarray data analysis that accounts for asymmetric and heavy-tailed (Student's t distribution) dispersion processes. Random effects for gene expression data were modeled under asymmetric Student's t distributions in which the asymmetry parameter (λ) ranged from perfect symmetry (λ = 0) to right-side (λ > 0) or left-side (λ < 0) over-expression patterns. This approach was applied to four free-access human data sets and showed clearly better model performance than standard approaches that assume symmetric Gaussian distributions. Our analyses of human gene expression data revealed a substantial degree of right-hand asymmetry for probe effects, whereas differential gene expression showed both symmetric and left-hand asymmetric patterns. Although these results cannot be extrapolated to all microarray experiments, they highlight the incidence of skewed dispersion patterns in the human transcriptome; moreover, we provide a new analytical approach to address this biological phenomenon appropriately. The source code of the program implementing these analytical developments, together with additional information on practical aspects of running it, is freely available on request from the corresponding author of this article.
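To illustrate how the sign of the asymmetry parameter relates to the direction of skew, the following is a minimal Python sketch, not the authors' program. The abstract does not state the exact parameterization of the asymmetric Student's t distribution, so the sketch assumes an Azzalini-type skew-t construction (a skew-normal draw divided by the square root of a scaled chi-square draw). The function names, the degrees of freedom (nu = 10), the sample size, and the seed are all hypothetical choices made for illustration.

```python
import math
import random


def sample_skew_t(lam, nu, rng):
    """Draw one value from a skew-t variate.

    ASSUMPTION: Azzalini-type construction; the paper's own
    parameterization of lambda may differ.
    """
    delta = lam / math.sqrt(1.0 + lam * lam)
    # Skew-normal draw: delta * |u0| + sqrt(1 - delta^2) * u1
    u0 = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, 1.0)
    z = delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1
    # Chi-square with nu degrees of freedom is Gamma(shape=nu/2, scale=2)
    w = rng.gammavariate(nu / 2.0, 2.0)
    # Dividing by sqrt(w / nu) adds the heavy Student's t tails
    return z / math.sqrt(w / nu)


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_skewness(lam, nu=10.0, n=20000, seed=1):
    """Simulate n draws for a given lambda and return their sample skewness."""
    rng = random.Random(seed)
    draws = [sample_skew_t(lam, nu, rng) for _ in range(n)]
    return sample_skewness(draws)


if __name__ == "__main__":
    for lam in (-3.0, 0.0, 3.0):
        print(f"lambda = {lam:+.1f}  sample skewness = {simulate_skewness(lam):+.3f}")
```

Under this assumed construction, λ = 0 reduces to an ordinary symmetric Student's t variate, positive λ yields a longer right tail, and negative λ yields a longer left tail, matching the sign convention described in the abstract.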