BMC Med Inform Decis Mak. 2021 Nov 27;21(1):333.
doi: 10.1186/s12911-021-01696-3.

A graph-based gene selection method for medical diagnosis problems using a many-objective PSO algorithm

Saeid Azadifar et al. BMC Med Inform Decis Mak. 2021.

Abstract

Background: Gene expression data play an important role in bioinformatics applications. Such data typically contain a large number of features (genes) but only a few samples, which can degrade the performance of data mining and machine learning algorithms. One of the most effective ways to alleviate this problem is gene selection, which aims to reduce the dimensionality of gene expression data by eliminating irrelevant and redundant genes.

Methods: This paper presents a hybrid gene selection method based on graph theory and a many-objective particle swarm optimization (PSO) algorithm. First, a filter method reduces the initial gene space. The reduced gene space is then represented as a graph, and a graph clustering method groups the genes into several clusters. Next, the many-objective PSO algorithm searches for an optimal subset of genes according to several criteria, including classification error, node centrality, specificity, edge centrality, and the number of selected genes. Finally, a repair operator is proposed to cover the whole gene space by ensuring that at least one gene is selected from each cluster, which increases the diversity of the selected genes.
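As an illustration of how a search over several criteria can rank candidate gene subsets, the following is a minimal sketch that assumes candidates are compared by Pareto dominance, a common choice in many-objective optimization. The abstract does not state how the criteria are combined or in which direction each is optimized, so the sketch assumes every criterion has already been expressed as a cost to minimize. The function names `dominates` and `non_dominated` and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if cost vector `a` Pareto-dominates `b`.

    Assumption: every objective is a cost to be minimized.
    `a` dominates `b` when it is no worse everywhere and strictly
    better in at least one objective.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


# Hypothetical cost vectors for three candidate gene subsets, one entry
# per criterion named in the abstract (values invented for illustration).
candidates = [
    (0.10, 0.30, 0.20, 0.25, 12.0),
    (0.12, 0.35, 0.25, 0.30, 15.0),  # worse than the first in every entry
    (0.08, 0.40, 0.20, 0.25, 20.0),  # lower error, but more genes
]

front = non_dominated(candidates)
```

Here the second candidate is dropped because the first is better on every criterion, while the first and third remain because each is better on at least one criterion.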

Results: To evaluate the proposed method, extensive experiments are conducted on seven datasets using two evaluation measures. Three classifiers, Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), are used to compare the proposed gene selection method with other state-of-the-art methods. The results show that the proposed method not only achieves higher classification accuracy but also selects fewer genes than the other methods.

Conclusion: This study shows that the proposed many-objective PSO algorithm simultaneously removes irrelevant and redundant features using several different criteria. In addition, the clustering algorithm and the repair operator improve the performance of the proposed method by covering the whole space of the problem.

Keywords: Dimension reduction; Gene clustering; Gene selection; High dimensional; Many-objective PSO; Repair operator.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of the proposed method
Fig. 2
The overall schema of the proposed repair operator for a small example with ten genes and three clusters. The initial set of selected genes is G = {G1, G3, G7, G10}; all of them belong to clusters 1 and 2, and no gene has been selected from cluster 3. The repair operator removes the gene with the lowest Fisher score (G7) from the initial set and adds the gene from cluster 3 with the highest Fisher score (G9) to the final set of selected genes
Fig. 3
Average classification accuracy and the number of obtained clusters for different values of the parameter θ, based on the SVM classifier: (a) AMLGSE2191 dataset, (b) Colon dataset, (c) DLBCL dataset, and (d) Leukemia dataset
Fig. 4
Comparison of the convergence speed of MaPSOGS, RMA, MaPSOGS (without repair operator), Hybrid BPSO-BBHA, EPSO, RPSW, Geometric PSO, PSOC4.5 and PSO models on (a) the Colon dataset and (b) the DLBCL dataset
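The repair step described in the Fig. 2 caption can be sketched roughly as follows. This is not the authors' implementation: the cluster assignments and Fisher scores below are invented so that the outcome matches the caption, and the function name `repair` is hypothetical. The sketch also assumes that a gene is only removed from a cluster that still has another selected gene, so that the swap never uncovers a cluster; the caption does not state this rule.

```python
from typing import Dict, Set


def repair(selected: Set[str],
           cluster_of: Dict[str, int],
           fisher: Dict[str, float]) -> Set[str]:
    """Return a copy of `selected` with at least one gene per cluster.

    For each uncovered cluster, add its highest-scoring gene and drop the
    lowest-scoring selected gene (assumption: only from a cluster that
    keeps another selected gene, so coverage is never lost).
    """
    selected = set(selected)
    all_clusters = set(cluster_of.values())
    while True:
        covered = {cluster_of[g] for g in selected}
        missing = sorted(all_clusters - covered)
        if not missing:
            return selected
        target = missing[0]
        # Best gene of the uncovered cluster, by Fisher score.
        members = [g for g in cluster_of if cluster_of[g] == target]
        best = max(members, key=lambda g: fisher[g])
        # Count selected genes per cluster to find safely removable ones.
        counts: Dict[int, int] = {}
        for g in selected:
            counts[cluster_of[g]] = counts.get(cluster_of[g], 0) + 1
        removable = [g for g in selected if counts[cluster_of[g]] > 1]
        if removable:
            worst = min(removable, key=lambda g: fisher[g])
            selected.remove(worst)
        selected.add(best)


# Hypothetical data: ten genes in three clusters, scores invented so that
# G7 is the weakest selected gene and G9 is the strongest gene in cluster 3.
cluster_of = {"G1": 1, "G2": 1, "G3": 1, "G4": 1,
              "G5": 2, "G6": 2, "G7": 2, "G10": 2,
              "G8": 3, "G9": 3}
fisher = {"G1": 0.9, "G2": 0.4, "G3": 0.7, "G4": 0.3, "G5": 0.5,
          "G6": 0.2, "G7": 0.1, "G8": 0.35, "G9": 0.8, "G10": 0.6}

repaired = repair({"G1", "G3", "G7", "G10"}, cluster_of, fisher)
```

With these invented scores, G7 is swapped out for G9, so the repaired set contains G1, G3, G9, and G10 and covers all three clusters.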
