Supervised classification of array CGH data with HMM-based feature selection
- PMID: 19209723
Abstract
Motivation: For various tumour types, detailed knowledge of the molecular mechanisms involved in tumorigenesis is lacking. Detecting copy number variations (CNV) by Comparative Genomic Hybridization (CGH) can, however, help to identify key elements in this tumorigenesis. Because genome-wide array CGH evaluates CNV at high resolution, it produces a huge amount of data, which calls for adequate mathematical methods to carefully select and interpret these data.
Results: Two groups of patients differing in cancer subtype were defined in two publicly available array CGH data sets as well as in our own data set on ovarian cancer. Chromosomal regions characterizing each group of patients were identified using recurrent hidden Markov Models (HMM). The differential regions were reduced to a subset of features for classification by integrating different univariate feature selection methods. Weighted Least Squares Support Vector Machines (LS-SVM), a supervised classification method that takes class imbalance in the data sets into account, achieved leave-one-out or 10-fold cross-validation accuracies ranging from 88 to 95.5%.
Conclusion: The combination of recurrent HMMs for the detection of copy number alterations with LS-SVM classifiers offers a novel methodological approach for classification based on copy number alterations. Additionally, this approach limits the chromosomal regions that are necessary to classify patients according to cancer subtype.
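To illustrate the classification step described in the Results, the following is a minimal sketch of a class-weighted LS-SVM evaluated with leave-one-out cross-validation. It is not the authors' implementation. The linear kernel, the inverse-class-frequency weights, the value of `gamma`, the helper names (`fit_weighted_lssvm`, `predict`, `loo_accuracy`) and the synthetic data are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def linear_kernel(A, B):
    # Assumption: a linear kernel; the abstract does not name the kernel used.
    return A @ B.T


def fit_weighted_lssvm(X, y, gamma=1.0):
    """Fit an LS-SVM classifier with per-class error weights.

    y must contain labels in {-1, +1}. Returns (alpha, b).
    """
    n = len(y)
    n_pos = int(np.sum(y == 1))
    n_neg = n - n_pos
    # Assumption: weights inversely proportional to class frequency,
    # so errors on the smaller class are penalised more strongly.
    v = np.where(y == 1, n / (2.0 * n_pos), n / (2.0 * n_neg))
    omega = np.outer(y, y) * linear_kernel(X, X)
    # Standard LS-SVM linear system with a weighted ridge term:
    # [ 0   y^T                     ] [ b     ]   [ 0 ]
    # [ y   Omega + diag(1/(g*v_i)) ] [ alpha ] = [ 1 ]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.diag(1.0 / (gamma * v))
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]


def predict(X_train, y_train, alpha, b, X_new):
    # Decision rule: sign(sum_i alpha_i * y_i * K(x, x_i) + b)
    scores = linear_kernel(X_new, X_train) @ (alpha * y_train) + b
    return np.where(scores >= 0, 1, -1)


def loo_accuracy(X, y, gamma=1.0):
    # Leave-one-out: refit on n-1 samples, test on the held-out one.
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i
        alpha, b = fit_weighted_lssvm(X[mask], y[mask], gamma)
        pred = predict(X[mask], y[mask], alpha, b, X[i:i + 1])[0]
        hits += int(pred == y[i])
    return hits / n


# Synthetic, unbalanced stand-in for a samples-by-regions copy number matrix.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(1.5, 1.0, size=(20, 5)),   # 20 samples of subtype +1
    rng.normal(-1.5, 1.0, size=(10, 5)),  # 10 samples of subtype -1
])
y = np.array([1] * 20 + [-1] * 10)

acc = loo_accuracy(X, y, gamma=1.0)
print(f"leave-one-out accuracy: {acc:.3f}")
```

In practice the columns of `X` would be the selected differential regions rather than random numbers, and `gamma` would be tuned inside the cross-validation loop to avoid an optimistic accuracy estimate.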