Arrow plot: a new graphical tool for selecting up and down regulated genes and genes differentially expressed on sample subgroups
- PMID: 22734592
- PMCID: PMC3542259
- DOI: 10.1186/1471-2105-13-147
Abstract
Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples, or across samples subjected to different experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors, such as molecular subtypes or genetic background, that are often unknown to the experimenter. For instance, in experiments that involve the molecular classification of tumors, it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, some genes may be differentially expressed only in sample subgroups, and such genes can be missed by the usual statistical approaches. In this paper we propose a new graphical tool that identifies not only up- and down-regulated genes but also genes differentially expressed in different subclasses, which current statistical methods usually miss. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The proposed methodology was implemented in the open-source R software.
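The two per-gene quantities the tool is based on can be sketched as follows. The abstract does not state which estimators are used, so this minimal sketch assumes the empirical (Mann-Whitney) AUC and an OVL computed from Gaussian kernel density estimates with Silverman's rule-of-thumb bandwidth, integrated numerically on a grid. The function names (`auc`, `silverman_bandwidth`, `kde`, `ovl`) are hypothetical and this is not the authors' R implementation.

```python
import numpy as np

def auc(x, y):
    """Empirical AUC: P(Y > X) + 0.5 * P(Y = X) over all control/treatment pairs.

    x: expression values of one gene in the reference (control) group.
    y: expression values of the same gene in the other group.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = y[:, None] - x[None, :]  # every treatment-minus-control difference
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) for a Gaussian kernel (assumed choice)."""
    return 1.06 * np.std(sample, ddof=1) * len(sample) ** (-1 / 5)

def kde(sample, grid, h):
    """Gaussian kernel density estimate of `sample` evaluated at each point of `grid`."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

def ovl(x, y, n_grid=2000):
    """Overlapping coefficient: integral of min(f_x, f_y), with both densities
    estimated by Gaussian KDE and the integral taken as a Riemann sum on a grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hx, hy = silverman_bandwidth(x), silverman_bandwidth(y)
    pad = 4 * max(hx, hy)  # extend the grid so the kernel tails are included
    grid = np.linspace(min(x.min(), y.min()) - pad, max(x.max(), y.max()) + pad, n_grid)
    overlap = np.minimum(kde(x, grid, hx), kde(y, grid, hy))
    return float(overlap.sum() * (grid[1] - grid[0]))

# Usage on a simulated, clearly up-regulated gene (hypothetical data).
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, 30)
treated = rng.normal(3.0, 1.0, 30)
print("AUC:", auc(control, treated), "OVL:", ovl(control, treated))
```

For a gene like this one, the AUC should be close to 1 and the OVL small; a down-regulated gene gives an AUC close to 0 with a similarly small OVL. Other density estimators or bandwidths would change the OVL values somewhat.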
Results: This method was applied to a publicly available dataset and to a simulated dataset. We compared our results with those obtained using several standard methods for detecting differentially expressed genes, namely the Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets, none of the standard selection procedures selected all of the differentially expressed genes with bimodal or multimodal distributions. We also compared our results with (i) the area between the ROC curve and the rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and allows the two samples to have different variances. Another advantage of our method is that the behavior of different kinds of differentially expressed genes can be analyzed graphically.
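Why mean- and rank-based statistics can miss such genes can be illustrated with a minimal sketch on hypothetical, noise-free data: the treated samples split into a down-regulated and an up-regulated subgroup around unimodal controls. Assuming log2-scale expression, the mean difference plays the role of the log2 fold change. The OVL is approximated here with density-normalised histograms on shared bins, a cruder estimator chosen for brevity and not necessarily the one used in the paper; all function names are illustrative.

```python
import numpy as np

def welch_t(x, y):
    """Welch t-statistic for mean(y) - mean(x), allowing unequal variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    return float((y.mean() - x.mean()) / se)

def auc(x, y):
    """Empirical AUC: P(Y > X) + 0.5 * P(Y = X) over all pairs."""
    diff = np.asarray(y, dtype=float)[:, None] - np.asarray(x, dtype=float)[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def ovl_hist(x, y, n_bins=16):
    """OVL approximated as the overlap of two density-normalised histograms on shared bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    edges = np.linspace(pooled.min(), pooled.max(), n_bins + 1)
    hx, _ = np.histogram(x, bins=edges, density=True)
    hy, _ = np.histogram(y, bins=edges, density=True)
    return float(np.sum(np.minimum(hx, hy) * np.diff(edges)))

# Hypothetical log2-scale expression of one gene: controls are unimodal around 0,
# while the treated group splits into a down-regulated and an up-regulated subgroup.
control = np.linspace(-1.0, 1.0, 40)
arm = np.linspace(2.0, 4.0, 20)
treated = np.concatenate([-arm, arm])

print("log2 fold change:", treated.mean() - control.mean())
print("Welch t:", welch_t(control, treated))
print("AUC:", auc(control, treated))
print("OVL:", ovl_hist(control, treated))
```

With this data the fold change and the Welch t-statistic are essentially zero and the AUC is 0.5, the values an unchanged gene would give, whereas the two histograms share no bin, so the OVL is 0. The histogram estimate depends on `n_bins`, so it should be read only as a rough illustration.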
Conclusion: Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.
Similar articles
- A weighted average difference method for detecting differentially expressed genes from microarray data. Algorithms Mol Biol. 2008 Jun 26;3:8. doi: 10.1186/1748-7188-3-8. PMID: 18578891. Free PMC article.
- Not proper ROC curves as new tool for the analysis of differentially expressed genes in microarray experiments. BMC Bioinformatics. 2008 Oct 3;9:410. doi: 10.1186/1471-2105-9-410. PMID: 18834513. Free PMC article.
- Ranking differentially expressed genes from Affymetrix gene expression data: methods with reproducibility, sensitivity, and specificity. Algorithms Mol Biol. 2009 Apr 22;4:7. doi: 10.1186/1748-7188-4-7. PMID: 19386098. Free PMC article.
- Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics. 2006 Jul 26;7:359. doi: 10.1186/1471-2105-7-359. PMID: 16872483. Free PMC article.
- Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods. Biol Direct. 2012 Dec 10;7:44. doi: 10.1186/1745-6150-7-44. PMID: 23227854. Free PMC article. Review.
Cited by
- knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable. BMC Bioinformatics. 2018 Nov 22;19(1):448. doi: 10.1186/s12859-018-2427-4. PMID: 30466390. Free PMC article.
- Transcriptome profiling implicated in beneficiary actions of kimchi extracts against Helicobacter pylori infection. J Clin Biochem Nutr. 2021 Sep;69(2):171-187. doi: 10.3164/jcbn.20-116. Epub 2021 Mar 27. PMID: 34616109. Free PMC article.
- Bayesian nonparametric inference for the overlap coefficient: With an application to disease diagnosis. Stat Med. 2022 Sep 10;41(20):3879-3898. doi: 10.1002/sim.9480. Epub 2022 Jun 27. PMID: 35760708. Free PMC article.
- Tumor Biomarkers for the Prediction of Distant Metastasis in Head and Neck Squamous Cell Carcinoma. Cancers (Basel). 2020 Apr 9;12(4):922. doi: 10.3390/cancers12040922. PMID: 32283719. Free PMC article.
- The length of the receiver operating characteristic curve and the two cutoff Youden index within a robust framework for discovery, evaluation, and cutoff estimation in biomarker studies involving improper receiver operating characteristic curves. Stat Med. 2021 Mar 30;40(7):1767-1789. doi: 10.1002/sim.8869. Epub 2021 Feb 2. PMID: 33530129. Free PMC article.