BMC Bioinformatics. 2008 Oct 3:9:410.
doi: 10.1186/1471-2105-9-410.

Not proper ROC curves as new tool for the analysis of differentially expressed genes in microarray experiments

Stefano Parodi et al. BMC Bioinformatics. 2008.

Abstract

Background: Most microarray experiments are carried out to identify genes whose expression varies in relation to specific conditions or in response to environmental stimuli. In such studies, genes showing similar mean expression values between two or more groups are considered not differentially expressed, even if hidden subclasses with different expression values may exist. In this paper we propose a new method for identifying differentially expressed genes, based on the area between the ROC curve and the rising diagonal (ABCR). ABCR represents a more general approach than the standard area under the ROC curve (AUC), because it can identify both proper (i.e., concave) and not proper ROC curves (NPRC). In particular, NPRC may correspond to those genes that tend to escape standard selection methods.

Results: We assessed the performance of our method using data from a publicly available database of 4026 genes, including 14 normal B cell samples (NBC) and 20 heterogeneous lymphomas (namely, 9 follicular lymphomas and 11 chronic lymphocytic leukemias). NBC also included two subclasses, i.e., 6 heavily stimulated and 8 slightly or not stimulated samples. We identified 1607 differentially expressed genes with an estimated False Discovery Rate of 15%. Among them, 16 corresponded to NPRC, and all of these escaped standard selection procedures based on AUC and t statistics. Moreover, simple inspection of the shape of these curves allowed the two subclasses in either class to be identified in 13 cases (81%).

Conclusion: NPRC represent a useful new tool for the analysis of microarray data.
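
To make the difference between AUC and ABCR concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the empirical ROC curve is piecewise linear between observed thresholds and that ABCR is the integral of |TPR - FPR| over FPR; the function names `roc_points`, `auc` and `abcr` and the toy expression values are hypothetical.

```python
import numpy as np

def roc_points(a, b):
    """Empirical ROC points (FPR, TPR); higher values count as evidence for class b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    thresholds = np.unique(np.concatenate([a, b]))[::-1]  # descending order
    fpr = [0.0] + [float(np.mean(a >= t)) for t in thresholds]
    tpr = [0.0] + [float(np.mean(b >= t)) for t in thresholds]
    return np.array(fpr), np.array(tpr)

def auc(a, b):
    """Area under the piecewise-linear empirical ROC curve (trapezoidal rule)."""
    fpr, tpr = roc_points(a, b)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def abcr(a, b):
    """Area between the ROC curve and the rising diagonal (assumed: integral of |TPR - FPR|)."""
    fpr, tpr = roc_points(a, b)
    d = tpr - fpr  # signed vertical distance from the diagonal
    total = 0.0
    for w, d0, d1 in zip(np.diff(fpr), d[:-1], d[1:]):
        if d0 * d1 >= 0:
            # Segment stays on one side of the diagonal: plain trapezoid.
            total += w * (abs(d0) + abs(d1)) / 2.0
        else:
            # Segment crosses the diagonal: sum of the two triangles.
            total += w * (d0 ** 2 + d1 ** 2) / (2.0 * (abs(d0) + abs(d1)))
    return float(total)

# Toy data (hypothetical): class B hides two subclasses on either side of class A.
class_a = [4, 5, 6]
class_b = [1, 2, 3, 7, 8, 9]
print("AUC :", round(auc(class_a, class_b), 4))
print("ABCR:", round(abcr(class_a, class_b), 4))
```

In this toy case the two classes have the same mean, so AUC stays at 0.5 and a selection rule based on AUC would miss the gene, whereas ABCR is 0.25 because the curve lies above the diagonal on one side and below it on the other.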

Figures

Figure 1
Theoretical (dotted lines) and empirical (solid lines) ROC curves (panel A) and the corresponding distribution of gene expression values (panel B). Empirical ROC curves were obtained using 50 samples randomly selected from each class.
Figure 2
TNRC value for the 1607 top genes selected by ABCR at FDR = 15%, as a function of AUC (Panel A) and t statistics (Panel B). Area I includes genes corresponding to not proper ROC curves (blue circles); Area II includes genes under-expressed in malignant cells (green circles); Area III includes genes over-expressed in malignant cells (red circles); empty circles correspond to unselected genes. Solid lines represent the thresholds corresponding to p = 0.05 for TNRC (horizontal line in Panel A and in Panel B), for AUC (vertical lines in Panel A) and for t statistic (vertical lines in Panel B). Broken lines represent the expected value under the null hypothesis for AUC (Panel A) and for t statistics (Panel B).
Figure 3
Not proper ROC curve corresponding to the expression of gene n. 1 in Table 2 (GENE3389X: Immunoglobulin J Chain). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 4
Not proper ROC curve corresponding to the expression of gene n. 2 in Table 2 (GENE3390X: Immunoglobulin J Chain). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 5
Not proper ROC curve corresponding to the expression of gene n. 3 in Table 2 (GENE3388X: Immunoglobulin J Chain). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 6
Not proper ROC curve corresponding to the expression of gene n. 4 in Table 2 (GENE3323X: BCL7A). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 7
Not proper ROC curve corresponding to the expression of gene n. 5 in Table 2 (GENE3407X: Histone deacetylase 3). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt = Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 8
Not proper ROC curve corresponding to the expression of gene n. 6 in Table 2 (GENE75X: VRK2 kinase). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 9
Not proper ROC curve corresponding to the expression of gene n. 7 in Table 2 (GENE1141X: MAPKKK5). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 10
Not proper ROC curve corresponding to the expression of gene n. 8 in Table 2 (GENE1817X: BL34). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 11
Not proper ROC curve corresponding to the expression of gene n. 9 in Table 2 (GENE2395X: unknown). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 12
Not proper ROC curve corresponding to the expression of gene n. 10 in Table 2 (GENE2696X: unknown). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 13
Not proper ROC curve corresponding to the expression of gene n. 11 in Table 2 (GENE3521X: Similar to KIAA0050). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 14
Not proper ROC curve corresponding to the expression of gene n. 12 in Table 2 (GENE74X: VRK2 kinase). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 15
Not proper ROC curve corresponding to the expression of gene n. 13 in Table 2 (GENE2287X: MRC OX-2). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 16
Not proper ROC curve corresponding to the expression of gene n. 14 in Table 2 (GENE3541X: Unknown). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 17
Not proper ROC curve corresponding to the expression of gene n. 15 in Table 2 (GENE1362X: Syndecan-2). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 18
Not proper ROC curve corresponding to the expression of gene n. 16 in Table 2 (GENE2673X: Unknown). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt= Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.
Figure 19
False Discovery Rate of ABCR (green line), TNRC (blue line) and AUC (red line) as a function of the mean difference between classes, the sample size in each class and the number N of selected genes. Median and interquartile range are displayed. Panel A: N = 5; Panel B: N = 20; Panel C: N = 50.
Figure 20
Mean and variance estimates for ABCR and TNRC under the null hypothesis as a function of the number of samples in each class (equal sample size). Each estimate was obtained from 10^4 random permutations.
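
As an illustration of how null-hypothesis moments like those in Figure 20 could be estimated by permutation, here is a rough Python sketch for ABCR only. It is not the authors' code. It assumes a piecewise-linear empirical ROC curve, ABCR defined as the integral of |TPR - FPR|, and a default of 10,000 permutations (reading the caption's count as 10^4); the function names `abcr` and `null_moments` are hypothetical.

```python
import numpy as np

def abcr(a, b):
    """Area between the empirical ROC curve and the diagonal (assumed definition)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    thresholds = np.unique(np.concatenate([a, b]))[::-1]  # descending order
    fpr = np.array([0.0] + [float(np.mean(a >= t)) for t in thresholds])
    tpr = np.array([0.0] + [float(np.mean(b >= t)) for t in thresholds])
    d = tpr - fpr
    total = 0.0
    for w, d0, d1 in zip(np.diff(fpr), d[:-1], d[1:]):
        if d0 * d1 >= 0:
            total += w * (abs(d0) + abs(d1)) / 2.0  # no crossing: trapezoid
        else:
            total += w * (d0 ** 2 + d1 ** 2) / (2.0 * (abs(d0) + abs(d1)))  # crossing
    return float(total)

def null_moments(n_per_class, n_perm=10_000, seed=0):
    """Permutation estimate of the mean and variance of ABCR with no class effect."""
    rng = np.random.default_rng(seed)
    # ABCR depends only on ranks, so any continuous pooled sample will do.
    pooled = rng.normal(size=2 * n_per_class)
    stats = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)  # random relabelling of the samples
        stats[i] = abcr(shuffled[:n_per_class], shuffled[n_per_class:])
    return float(stats.mean()), float(stats.var(ddof=1))

# Fewer permutations than the assumed default, to keep the demonstration quick.
for n in (5, 10, 20):
    mean, var = null_moments(n, n_perm=1000)
    print(f"n per class = {n:2d}: mean = {mean:.4f}, variance = {var:.5f}")
```

Because ABCR is non-negative by construction, its null expectation is above zero and shrinks as the number of samples per class grows, which is why a sample-size-specific reference distribution is needed.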
