BMC Bioinformatics. 2017 Nov 22;18(1):513.
doi: 10.1186/s12859-017-1933-0.

Unsupervised gene selection using biological knowledge: application in sample clustering

Sudipta Acharya et al. BMC Bioinformatics.

Abstract

Background: Classifying biological samples from gene expression data is a basic building block for several problems in bioinformatics, such as diagnosing cancer and other diseases and planning appropriate treatment. A major challenge in sample classification is handling high-dimensional, redundant gene expression data. Gene (feature) selection plays a major role in reducing the complexity of handling such data.

Results: This paper explores the use of biological knowledge from the Gene Ontology database to select a subset of genes that is then used for clustering samples. The proposed feature selection technique is unsupervised: it uses no class label information during gene selection. Finally, a multi-objective clustering approach is applied to cluster the available samples in the reduced gene space.

Conclusions: The reported results show that incorporating biological knowledge into gene selection not only reduces the dimensionality of the feature space to a great extent but also improves the accuracy of sample classification. The reduced gene space is validated using biological significance tests. To demonstrate the advantage of the proposed gene-selection-based sample clustering technique, a thorough comparative analysis against state-of-the-art techniques is also reported.

Keywords: Feature selection; Gene Ontology (GO); Gene-GO term annotation matrix; Multi-objective clustering; Sample classification.
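
The Results paragraph describes selecting genes without class labels and then clustering samples in the reduced gene space, and the figure captions mention PAM-based clustering on the gene-GO term annotation matrix. The abstract does not say how genes are picked from those clusters, so the following is only a minimal sketch under stated assumptions: it groups genes with a simplified alternating k-medoids (not the full PAM swap search) and keeps each cluster's medoid as the representative gene. The function names, the toy annotation values, and the medoid-as-representative rule are all hypothetical, and the multi-objective sample clustering step is not shown.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_medoids(rows, k, max_iter=100):
    """Simplified alternating k-medoids (an assumption, not the full PAM swap search).

    rows: list of feature vectors (here: one row per gene, one column per GO term).
    Returns (medoid_indices, labels).
    """
    n = len(rows)
    dist = [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # deterministic start: the first k rows

    def assign(meds):
        # Each row joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The new medoid minimises the total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical gene-GO term annotation matrix: 6 genes x 3 GO terms.
annotation = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.9, 0.1],
    [0.1, 0.8, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.0, 0.8],
]

# Hypothetical expression matrix: 2 samples x 6 genes.
expression = [
    [5.1, 5.0, 1.2, 1.1, 3.3, 3.4],
    [0.9, 1.0, 4.8, 4.9, 2.2, 2.1],
]

medoids, gene_labels = k_medoids(annotation, k=3)
selected = sorted(medoids)  # assumed rule: one representative gene per cluster

# Reduced gene space: keep only the selected gene columns for every sample.
reduced = [[row[g] for g in selected] for row in expression]
```

No class labels enter the selection step, which matches the unsupervised setting described above. Any sample clustering method could then be run on `reduced` in place of the full expression matrix.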

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. Flowchart of the proposed framework
Fig. 2. Struct IC based gene-GO term annotation matrix representation
Fig. 3. Cluster profile plot of one cluster (having 156 genes and 17 samples) after performing PAM-based clustering on the gene-GO term annotation matrix of the Yeast dataset
Fig. 4. Cluster profile plot of one cluster (having 102 genes and 103 samples) after performing PAM-based clustering on the gene-GO term annotation matrix of the Multiple tissue dataset
Fig. 5. Graphical comparative analysis of AMOSA-based sample clustering outcomes with respect to three internal cluster validity indices
Fig. 6. Graphical comparative analysis of the proposed feature selection based sample clustering technique with other existing techniques
