Unsupervised gene selection using biological knowledge: application in sample clustering
- PMID: 29166852
- PMCID: PMC5700545
- DOI: 10.1186/s12859-017-1933-0
Abstract
Background: Classifying biological samples from gene expression data is a basic building block for several problems in bioinformatics, such as diagnosing cancer and other diseases and planning appropriate treatment. A major challenge in sample classification is handling high-dimensional, redundant gene expression data. Gene (feature) selection plays a major role in reducing the complexity of handling such data.
Results: This paper explores the use of biological knowledge from the Gene Ontology database to select a suitable subset of genes, which is then used to cluster samples. The proposed feature selection technique is unsupervised: it uses no class label information during gene selection. Finally, a multi-objective clustering approach is applied to cluster the available samples in the reduced gene space.
Conclusions: The reported results show that incorporating biological knowledge into gene selection not only greatly reduces the dimensionality of the feature space but also improves the accuracy of sample classification. The reduced gene space is validated using strong biological significance tests. To demonstrate the superiority of the proposed gene-selection-based sample clustering technique, a thorough comparative analysis against state-of-the-art techniques has also been performed.
Keywords: Feature selection; Gene Ontology (GO); Gene-GO term annotation matrix; Multi-objective clustering; Sample classification.
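The pipeline summarized in the abstract (label-free gene selection from a gene-GO term annotation matrix, followed by sample clustering in the reduced gene space) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not specify the selection criterion, so the Jaccard-based redundancy filter, the `max_similarity` threshold, the function names, and the toy data are all assumptions made for illustration, and plain k-means stands in for the multi-objective clustering approach.

```python
import numpy as np


def select_genes(annotation, max_similarity=0.5):
    """Greedy, label-free gene selection on a binary gene x GO-term matrix.

    ASSUMPTION: a gene is kept only if the Jaccard similarity between its
    GO-term profile and that of every already-kept gene is at most
    max_similarity. The paper's actual criterion may differ.
    """
    annotation = np.asarray(annotation, dtype=bool)
    kept = []
    for g in range(annotation.shape[0]):
        if not annotation[g].any():
            continue  # unannotated genes carry no GO information
        redundant = False
        for k in kept:
            inter = np.logical_and(annotation[g], annotation[k]).sum()
            union = np.logical_or(annotation[g], annotation[k]).sum()
            if inter / union > max_similarity:
                redundant = True
                break
        if not redundant:
            kept.append(g)
    return kept


def kmeans(samples, k, n_iter=50, seed=0):
    """Plain k-means, used here only as a stand-in clustering step."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    # Initialise the centers with k distinct samples.
    centers = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = samples[labels == c].mean(axis=0)
    return labels


# Toy gene x GO-term annotation matrix (5 genes, 4 GO terms); hypothetical data.
annotation = np.array([
    [1, 1, 0, 0],  # gene 0
    [1, 1, 0, 0],  # gene 1: same GO profile as gene 0, hence redundant
    [0, 0, 1, 1],  # gene 2
    [0, 0, 0, 0],  # gene 3: no annotation
    [0, 1, 1, 0],  # gene 4
])

# Toy expression matrix (4 samples x 5 genes); hypothetical data.
expression = np.array([
    [1.0, 1.1, 1.0, 0.3, 1.0],
    [1.2, 1.0, 0.9, 0.2, 1.1],
    [5.0, 5.2, 5.0, 0.4, 5.0],
    [5.1, 4.9, 4.9, 0.1, 5.2],
])

kept = select_genes(annotation)          # no class labels are used here
reduced = expression[:, kept]            # samples in the reduced gene space
labels = kmeans(reduced, k=2)
print("selected genes:", kept)
print("cluster labels:", labels.tolist())
```

Class labels never enter `select_genes`, which is what makes the selection step unsupervised; the labels returned by `kmeans` would be compared against known sample classes only afterwards, for evaluation.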
Conflict of interest statement
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.