A regression-based differential expression detection algorithm for microarray studies with ultra-low sample size
- PMID: 25738861
- PMCID: PMC4349782
- DOI: 10.1371/journal.pone.0118198
Abstract
Global gene expression analysis using microarrays and, more recently, RNA-seq has allowed investigators to understand biological processes at a system level. However, identifying differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, which limits the usability of the tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data and rank genes by importance. In place of cross-validation, which most similar methods require but which is not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. This simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
Similar articles
- A unified framework for finding differentially expressed genes from microarray experiments. BMC Bioinformatics. 2007 Sep 18;8:347. doi: 10.1186/1471-2105-8-347. PMID: 17877806. Free PMC article.
- Practical FDR-based sample size calculations in microarray experiments. Bioinformatics. 2005 Aug 1;21(15):3264-72. doi: 10.1093/bioinformatics/bti519. Epub 2005 Jun 2. PMID: 15932903.
- Differential gene expression detection and sample classification using penalized linear regression models. Bioinformatics. 2006 Feb 15;22(4):472-6. doi: 10.1093/bioinformatics/bti827. Epub 2005 Dec 13. PMID: 16352654.
- Transcriptome data analysis for cell culture processes. Adv Biochem Eng Biotechnol. 2012;127:27-70. doi: 10.1007/10_2011_116. PMID: 22194060. Review.
- Statistical issues in the design and analysis of gene expression microarray studies of animal models. J Mammary Gland Biol Neoplasia. 2003 Jul;8(3):359-74. doi: 10.1023/b:jomg.0000010035.57912.5a. PMID: 14973379. Review.
Cited by
- Comparative analysis of tissue-specific genes in maize based on machine learning models: CNN performs technically best, LightGBM performs biologically soundest. Front Genet. 2023 May 9;14:1190887. doi: 10.3389/fgene.2023.1190887. eCollection 2023. PMID: 37229198. Free PMC article.
- Xenopus embryos show a compensatory response following perturbation of the Notch signaling pathway. Dev Biol. 2020 Apr 15;460(2):99-107. doi: 10.1016/j.ydbio.2019.12.016. Epub 2019 Dec 30. PMID: 31899211. Free PMC article.
- The tweety Gene Family: From Embryo to Disease. Front Mol Neurosci. 2021 Jun 28;14:672511. doi: 10.3389/fnmol.2021.672511. eCollection 2021. PMID: 34262434. Free PMC article.
- Genomic signature of parity in the breast of premenopausal women. Breast Cancer Res. 2019 Mar 28;21(1):46. doi: 10.1186/s13058-019-1128-x. PMID: 30922380. Free PMC article.
- Automated Classification of Benign and Malignant Proliferative Breast Lesions. Sci Rep. 2017 Aug 29;7(1):9900. doi: 10.1038/s41598-017-10324-y. PMID: 28852119. Free PMC article.