FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier
- PMID: 30697229
- PMCID: PMC6341065
- DOI: 10.3389/fgene.2018.00717
Abstract
Here, we propose a heuristic data trimming technique for SVM, termed FLOating-Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. The procedure can operate on high-throughput genetic datasets such as gene expression or mutation profiles. Its application prevents SVM from extrapolating by excluding non-informative features. FloWPS requires training on data from individuals with known clinical outcomes to create a clinically relevant classifier. The genetic profiles linked with the outcomes are split, as usual, into training and validation datasets. The unique property of FloWPS is that irrelevant features of the validation dataset, namely those that do not have a significant number of neighboring hits in the training dataset, are removed from further analysis. Next, similarly to the k nearest neighbors (kNN) method, for each point of the validation dataset FloWPS takes into account only the proximal points of the training dataset. Thus, for every point of the validation dataset, the training dataset is adjusted to form a floating window. FloWPS performance was tested on ten gene expression datasets for 992 cancer patients who either responded or did not respond to different types of chemotherapy. Using leave-one-out cross-validation, we confirmed experimentally that FloWPS significantly increases the quality of a classifier built on classical SVM in most of the applications, particularly for polynomial kernels.
Keywords: bioinformatics; gene expression; machine learning; oncology; personalized medicine; support vector machines.
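The two trimming steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a feature has enough "neighboring hits" when at least m training values lie on each side of the validation value along that feature's axis, and that "proximal points" are the k nearest training points by Euclidean distance in the retained features. The function names and the parameters m and k are hypothetical.

```python
import math


def relevant_features(train, x, m):
    """Return indices of features kept for the validation point x.

    Assumption (not taken from the abstract): a feature is kept when at
    least m training values lie below x and at least m lie above x on
    that feature's axis, so the classifier never has to extrapolate.
    """
    keep = []
    for j in range(len(x)):
        below = sum(1 for row in train if row[j] < x[j])
        above = sum(1 for row in train if row[j] > x[j])
        if below >= m and above >= m:
            keep.append(j)
    return keep


def floating_window(train, labels, x, m, k):
    """Trim features, then keep only the k training points nearest to x.

    Returns the kept feature indices, the trimmed training window, its
    labels, and the trimmed validation point.
    """
    feats = relevant_features(train, x, m)

    def dist(row):
        # Euclidean distance restricted to the retained features.
        return math.sqrt(sum((row[j] - x[j]) ** 2 for j in feats))

    # sorted() is stable, so ties keep their original training order.
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i]))[:k]
    window = [[train[i][j] for j in feats] for i in nearest]
    window_labels = [labels[i] for i in nearest]
    x_trimmed = [x[j] for j in feats]
    return feats, window, window_labels, x_trimmed
```

In a leave-one-out setting, each sample would in turn play the role of x, with the remaining samples as train. A separate SVM would then be fitted on the returned window and applied to x_trimmed. The SVM step is omitted here to keep the sketch free of external dependencies.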
Similar articles
- Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology. Int J Mol Sci. 2020 Jan 22;21(3):713. doi: 10.3390/ijms21030713. PMID: 31979006. Free PMC article.
- High-Throughput Mutation Data Now Complement Transcriptomic Profiling: Advances in Molecular Pathway Activation Analysis Approach in Cancer Biology. Cancer Inform. 2019 Mar 25;18:1176935119838844. doi: 10.1177/1176935119838844. eCollection 2019. PMID: 30936679. Free PMC article.
- A Transfer-Based Additive LS-SVM Classifier for Handling Missing Data. IEEE Trans Cybern. 2020 Feb;50(2):739-752. doi: 10.1109/TCYB.2018.2872800. Epub 2018 Oct 15. PMID: 30334775.
- Vicinal support vector classifier using supervised kernel-based clustering. Artif Intell Med. 2014 Mar;60(3):189-96. doi: 10.1016/j.artmed.2014.01.003. Epub 2014 Feb 7. PMID: 24637294.
- Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines. Comput Methods Programs Biomed. 2014 Mar;113(3):792-808. doi: 10.1016/j.cmpb.2014.01.001. Epub 2014 Jan 10. PMID: 24472367.
Cited by
- Algorithmic Annotation of Functional Roles for Components of 3,044 Human Molecular Pathways. Front Genet. 2021 Feb 9;12:617059. doi: 10.3389/fgene.2021.617059. eCollection 2021. PMID: 33633781. Free PMC article.
- A Triple-Network Dynamic Connection Study in Alzheimer's Disease. Front Psychiatry. 2022 Apr 4;13:862958. doi: 10.3389/fpsyt.2022.862958. eCollection 2022. PMID: 35444581. Free PMC article.
- DNA repair pathway activation features in follicular and papillary thyroid tumors, interrogated using 95 experimental RNA sequencing profiles. Heliyon. 2021 Mar 13;7(3):e06408. doi: 10.1016/j.heliyon.2021.e06408. eCollection 2021 Mar. PMID: 33748479. Free PMC article.
- System, Method and Software for Calculation of a Cannabis Drug Efficiency Index for the Reduction of Inflammation. Int J Mol Sci. 2020 Dec 31;22(1):388. doi: 10.3390/ijms22010388. PMID: 33396562. Free PMC article.
- RNA Sequencing in Comparison to Immunohistochemistry for Measuring Cancer Biomarkers in Breast Cancer and Lung Cancer Specimens. Biomedicines. 2020 May 9;8(5):114. doi: 10.3390/biomedicines8050114. PMID: 32397474. Free PMC article.