BMC Bioinformatics. 2020 Sep 17;21(Suppl 13):386.

doi: 10.1186/s12859-020-03681-5.

A consensus multi-view multi-objective gene selection approach for improved sample classification

Sudipta Acharya¹, Laizhong Cui², Yi Pan³

Affiliations

¹ Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, PR China.
² Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, PR China. cuilz@szu.edu.cn.
³ Department of Computer Science, Georgia State University, Atlanta, USA.

PMID: 32938388
PMCID: PMC7495900
DOI: 10.1186/s12859-020-03681-5

Sudipta Acharya et al. BMC Bioinformatics. 2020.

Abstract

Background: In computational biology, analyzing complex data helps to extract relevant biological information. Sample classification of gene expression data is one popular analysis technique of this kind. However, the large number of irrelevant or redundant genes in expression data makes sample classification algorithms work inefficiently. Feature selection is a dimensionality reduction technique that helps to maximize the effectiveness of a sample classification algorithm. Recent advances in biotechnology have made biological data multi-modal, that is, available in multiple views. Different 'omics' resources capture various equally important biological properties of entities. However, most existing feature selection methods consider only one of the available biological resources. Consequently, they may ignore crucial aspects of the available biological knowledge that could further improve feature selection efficiency.

Results: In this work, we propose a Consensus Multi-View Multi-objective Clustering-based feature selection algorithm called CMVMC. Three controlled genomic and proteomic resources, namely gene expression, Gene Ontology (GO), and protein-protein interaction network (PPIN), are used to build two independent views. Multi-objective consensus clustering is applied within the proposed gene selection method to satisfy both views. Two gene expression data sets, Multiple tissues and Yeast, from two different organisms (Homo sapiens and Saccharomyces cerevisiae, respectively) are chosen for the experiments. As the end product of CMVMC, a reduced set of relevant and non-redundant genes is found for each data set. These genes are then used for sample classification.
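As a rough illustration of how a gene clustering can yield a reduced gene set, the sketch below picks one representative gene (a medoid) per cluster, scoring each candidate by its total distance to the other cluster members summed over two views. This is an assumption made for illustration only: the abstract does not state how CMVMC derives the final genes from its consensus clusters, and the function names, the toy distance matrices, and the equal weighting of the two views are all hypothetical.

```python
from collections import defaultdict


def pairwise(positions):
    """Build a toy distance matrix from 1-D positions (illustrative helper)."""
    return [[abs(a - b) for b in positions] for a in positions]


def select_representatives(labels, dist_view1, dist_view2):
    """Return one medoid gene index per cluster.

    labels: cluster id for each gene, assumed to come from some
        consensus clustering step (not implemented here).
    dist_view1, dist_view2: square gene-by-gene distance matrices,
        one per view (hypothetical inputs).
    """
    clusters = defaultdict(list)
    for gene, label in enumerate(labels):
        clusters[label].append(gene)

    chosen = []
    for label in sorted(clusters):
        members = clusters[label]

        def cost(g, members=members):
            # Total distance to all cluster members, summed over both views.
            return sum(dist_view1[g][h] + dist_view2[g][h] for h in members)

        chosen.append(min(members, key=cost))
    return chosen


# Toy data: six genes in two clusters, with slightly different
# geometry in each view.
labels = [0, 0, 0, 1, 1, 1]
view1 = pairwise([0, 1, 2, 10, 11, 15])
view2 = pairwise([0, 2, 3, 10, 12, 13])
representatives = select_representatives(labels, view1, view2)
print(representatives)
```

On this toy input the middle gene of each cluster is selected, so six genes are reduced to two. A real pipeline would replace the toy matrices with view-specific distances and the fixed labels with the output of the clustering step.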

Conclusions: The experimental study on the chosen data sets shows that the proposed feature selection method improves sample classification accuracy and reduces the gene space significantly. For the Multiple tissues data set, CMVMC reduces the number of genes (features) from 5565 to 41, with 92.73% sample classification accuracy. For the Yeast data set, the number of genes is reduced from 2884 to 10, with 95.84% sample classification accuracy. Two internal cluster validity indices, Silhouette and Davies-Bouldin (DB), and one external validity index, Classification Accuracy (CA), are chosen for the comparative study. The reported results are further validated through a well-known biological significance test and a visualization tool.
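For readers unfamiliar with the two internal validity indices named above, the following minimal sketch computes them from their standard definitions, using Euclidean distance and only the Python standard library. It is not the authors' evaluation code; the toy points and labels are made up, and the functions assume at least two clusters with distinct centroids. Higher Silhouette values and lower DB values indicate more compact, better separated clusters.

```python
import math
from collections import defaultdict


def _groups(points, labels):
    """Group points by cluster label."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return groups


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    groups = _groups(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not affect the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in groups.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin index (needs >= 2 clusters with distinct centroids)."""
    groups = _groups(points, labels)
    centroids = {
        k: [sum(coord) / len(g) for coord in zip(*g)] for k, g in groups.items()
    }
    # Scatter: mean distance of cluster members to their centroid.
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in g) / len(g)
        for k, g in groups.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in groups
            if j != i
        )
        for i in groups
    ]
    return sum(worst) / len(worst)


# Toy samples: two tight, well separated groups.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = [0, 0, 1, 1]  # labels that match the geometry
bad = [0, 1, 0, 1]   # labels that cut across it

print(silhouette(points, good), davies_bouldin(points, good))
print(silhouette(points, bad), davies_bouldin(points, bad))
```

The labeling that matches the geometry scores a higher Silhouette and a lower DB than the one that cuts across it, which is the direction of comparison used when ranking clustering solutions with these indices.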

Keywords: Feature selection; Gene ontology (GO); Multi-objective optimization; Multi-view clustering; Protein protein interaction network; Sample classification.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Two views developed based on multiple *‘omics’* data

**Fig. 2**
Flowchart of the proposed CMVMC-based gene selection algorithm

**Fig. 3**
Structure of each *parent* clustering solution in the proposed CMVMC

**Fig. 4**
Formation of consensus clusters of *view 1* and *view 2*

**Fig. 5**
Cluster-profile plots for one random gene cluster from each of the *Multiple tissues* (131 genes and 103 samples) and *Yeast* (180 genes and 17 samples) data sets

**Fig. 6**
The comparative Silhouette and DB values for obtained sample clustering solutions for both data sets

**Fig. 7**
The comparative Classification Accuracy (CA) of samples by proposed and existing gene selection approaches
