PLoS One. 2011;6(9):e24259.

doi: 10.1371/journal.pone.0024259. Epub 2011 Sep 1.

Identification of single- and multiple-class specific signature genes from gene expression profiles by group marker index

Yu-Shuen Tsai¹, Kripamoy Aguan, Nikhil R Pal, I-Fang Chung

PMID: 21909426
PMCID: PMC3164723
DOI: 10.1371/journal.pone.0024259

Yu-Shuen Tsai et al. PLoS One. 2011.

Affiliation

¹ Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan.


Abstract

Informative genes from microarray data can be used to construct prediction models and to investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, has low computational complexity, and efficiently identifies both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot easily identify multiple-class specific signature genes. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and uses a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small and when the class sizes differ significantly. To compare effectiveness and robustness, we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distributions. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey supports the claim that the proposed method identifies unique multiple-class specific marker genes (not reported earlier to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate pathway information with the multiple-class specific signature genes and cross-reference it to published studies. We find that, in the leukemia data, the identified genes participate in pathways directly involved in cancer development.
Our method offers a promising way to find genes that may be involved in pathways of multiple diseases, and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases.
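The abstract does not spell out the template-based baseline. As a rough illustration only, the sketch below assumes that a template is a 0/1 vector marking the samples of a chosen subset of classes and that a gene is scored by its Pearson correlation with that template. The function names `pearson` and `best_template`, the toy labels, and the toy expression values are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant profile: correlation undefined, treat as 0
    return cov / sqrt(vx * vy)


def best_template(expression, labels, group_size):
    """Return (score, classes) for the class subset of the given size whose
    0/1 template has the largest absolute correlation with the gene.
    Assumed scoring rule, not necessarily the one used in the paper."""
    classes = sorted(set(labels))
    best = None
    for subset in combinations(classes, group_size):
        # 1.0 for samples in the candidate group, 0.0 for all other samples.
        template = [1.0 if lab in subset else 0.0 for lab in labels]
        score = pearson(expression, template)
        if best is None or abs(score) > abs(best[0]):
            best = (score, subset)
    return best


# Toy gene: high in classes "A" and "B", low in "C" and "D".
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
gene = [9.0, 9.5, 9.2, 8.8, 2.0, 2.5, 1.8, 2.2]
score, subset = best_template(gene, labels, group_size=2)
print(subset, round(score, 3))
```

With `group_size=1` the same function searches for single-class specific templates; larger group sizes correspond to multiple-class specificity. Because Pearson's correlation depends on sample size and class balance, a baseline of this kind may be expected to struggle with small or unbalanced classes, which is the setting the abstract highlights.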

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Scatter plots of the top-ranked gene at each level in the SRBCT data set.**
Panels (a), (b) and (c) are the scatter plots of the top-ranked gene at level-1, level-2, and level-3, respectively. The top-ranked genes are WAS (236282), PTPN12 (774502) and GSTA4 (504791), respectively. There are four classes in the SRBCT data set: Ewing sarcomas (EWS), Burkitt lymphomas (BL), neuroblastomas (NB), and rhabdomyosarcomas (RMS).

**Figure 2. The level-2-like and level-1-like genes ranked within the top 10 level-3 genes by the template-based method in the Lung Cancer data set.**
Panels (a), (b), (c) and (d) are the scatter plots of the level-2-like genes. Panels (e) and (f) are the scatter plots of the level-1-like genes.

**Figure 3. Effect of sample size on Pearson's correlation coefficient values.**

**Figure 4. Steps involved in computing GMI and finding the list of group-specific genes for each level of discrimination.**

**Figure 5. A 5-class synthetic example to illustrate computation of GMI.**
There are four levels of discrimination in the 5-class synthetic data set. Panels (a) to (d) depict the computation of GMI values at each level of discrimination. The dotted lines in each panel indicate the two mean values used for GMI computation in each level of discrimination. All filled samples in each panel indicate the upper group samples. The remaining open samples in each panel indicate the lower group samples.
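The caption above describes an upper and a lower group of samples at each level of discrimination. The following sketch shows one way such groups could be enumerated. It assumes that classes are ordered by mean expression and that level k places the k highest-mean classes in the upper group; this record does not state which two means the GMI computation uses, so the sketch only reports the pooled mean of each group and does not compute GMI itself. The function name `level_splits` and the toy data are hypothetical.

```python
from statistics import mean


def level_splits(samples_by_class):
    """Sort classes by mean expression and, for each level k, put the k
    highest-mean classes in the upper group and the rest in the lower group.
    Assumed grouping rule for illustration; this is not the GMI formula."""
    ordered = sorted(
        samples_by_class,
        key=lambda c: mean(samples_by_class[c]),
        reverse=True,
    )
    splits = []
    for k in range(1, len(ordered)):
        upper, lower = ordered[:k], ordered[k:]
        # Pooled means over all samples in each group.
        upper_mean = mean(v for c in upper for v in samples_by_class[c])
        lower_mean = mean(v for c in lower for v in samples_by_class[c])
        splits.append((k, upper, lower, upper_mean, lower_mean))
    return splits


# Toy 5-class gene, two samples per class (values are made up).
data = {
    "c1": [1.0, 1.2],
    "c2": [3.0, 3.2],
    "c3": [5.0, 5.2],
    "c4": [7.0, 7.2],
    "c5": [9.0, 9.2],
}
splits = level_splits(data)
for k, upper, lower, upper_mean, lower_mean in splits:
    print(k, upper, lower, round(upper_mean, 2), round(lower_mean, 2))
```

For five classes the loop yields four splits, matching the four levels of discrimination mentioned in the caption.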

Cited by

Discovering monotonic stemness marker genes from time-series stem cell microarray data.
Wang HW, Sun HJ, Chang TY, Lo HH, Cheng WC, Tseng GC, Lin CT, Chang SJ, Pal N, Chung IF. BMC Genomics. 2015;16 Suppl 2(Suppl 2):S2. doi: 10.1186/1471-2164-16-S2-S2. Epub 2015 Jan 21. PMID: 25708300. Free PMC article.
Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods.
Ghosh M, Adhikary S, Ghosh KK, Sardar A, Begum S, Sarkar R. Med Biol Eng Comput. 2019 Jan;57(1):159-176. doi: 10.1007/s11517-018-1874-4. Epub 2018 Aug 1. PMID: 30069674.
