AUC-RF: a new strategy for genomic profiling with random forest
- PMID: 21996641
- DOI: 10.1159/000330778
Abstract
Objective: Genomic profiling, the use of genetic variants at multiple loci simultaneously for the prediction of disease risk, requires the selection of a set of genetic variants that best predicts disease status. The goal of this work was to provide a new selection algorithm for genomic profiling.
Methods: We propose a new algorithm for genomic profiling based on optimizing the area under the receiver operating characteristic curve (AUC) of the random forest (RF). The proposed strategy implements a backward elimination process based on the initial ranking of variables.
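The abstract gives no implementation details, so the following Python sketch only illustrates the general shape of backward elimination over a fixed initial ranking. It is not the AUCRF package's code. The 20% drop fraction, the helper names (`auc`, `backward_elimination`, `score_subset`), the simulated variables, and the stand-in scorer (the AUC of a plain sum of the selected variables, used here in place of a random forest) are all assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def backward_elimination(ranked, score_subset, drop_fraction=0.2, min_size=1):
    """Drop the lowest-ranked variables step by step and keep the best subset.

    ranked: variable names, most important first (the initial ranking).
    score_subset: callable returning a predictive score for a list of names.
    drop_fraction: share of variables removed per step (assumed value).
    """
    current = list(ranked)
    best_subset, best_score = list(current), score_subset(current)
    while len(current) > min_size:
        n_drop = max(1, int(len(current) * drop_fraction))
        n_keep = max(min_size, len(current) - n_drop)
        current = current[:n_keep]  # the ranking is fixed, so just truncate
        score = score_subset(current)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Simulated data (hypothetical): two informative variables and eight noise ones.
random.seed(0)
n = 400
labels = [1 if random.random() < 0.3 else 0 for _ in range(n)]
data = {
    "snp_a": [y + random.gauss(0, 1) for y in labels],
    "snp_b": [y + random.gauss(0, 1) for y in labels],
}
for i in range(8):
    data[f"noise_{i}"] = [random.gauss(0, 1) for _ in labels]

# Stand-in for an importance ranking: distance of each single-variable AUC from 0.5.
ranked = sorted(data, key=lambda v: abs(auc(data[v], labels) - 0.5), reverse=True)


def score_subset(subset):
    # Stand-in for the AUC of a fitted model: AUC of the summed variables.
    combined = [sum(data[v][i] for v in subset) for i in range(n)]
    return auc(combined, labels)


selected, best = backward_elimination(ranked, score_subset)
print(selected, round(best, 3))
```

Because the scorer is passed in as a callable, the same loop could be driven by the AUC of a random forest by swapping `score_subset` for a function that fits a forest on the given variables and returns its AUC.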
Results and conclusions: We demonstrate the advantage of using the AUC instead of the classification error as a measure of predictive accuracy of RF. In particular, we show that the use of the classification error is especially inappropriate when dealing with unbalanced data sets. The new procedure for variable selection and prediction, namely AUC-RF, is illustrated with data from a bladder cancer study and also with simulated data. The algorithm is publicly available as an R package, named AUCRF, at http://cran.r-project.org/.
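A small worked example may help show why the classification error can mislead on unbalanced data. The sketch below uses hypothetical numbers (5 cases, 95 controls) and hypothetical helper names; it is not taken from the paper. Two scoring rules produce the same error once thresholded at 0.5, yet only one of them ranks cases above controls, and the AUC is what separates them.

```python
def classification_error(predicted, labels):
    """Fraction of samples whose predicted class differs from the true class."""
    return sum(p != y for p, y in zip(predicted, labels)) / len(labels)


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical unbalanced sample: 5 cases (1) and 95 controls (0).
labels = [1] * 5 + [0] * 95

# Rule A: the same score for everyone, so it carries no information.
scores_a = [0.0] * 100
# Rule B: every case scores higher than every control, but all scores are below 0.5.
scores_b = [0.4] * 5 + [0.1] * 95

# Thresholding at 0.5 makes both rules predict "control" for every sample.
pred_a = [int(s >= 0.5) for s in scores_a]
pred_b = [int(s >= 0.5) for s in scores_b]

error_a = classification_error(pred_a, labels)
error_b = classification_error(pred_b, labels)
auc_a = auc(scores_a, labels)
auc_b = auc(scores_b, labels)

print(error_a, error_b)  # identical errors for both rules
print(auc_a, auc_b)      # the AUC tells the two rules apart
```

Both rules misclassify only the 5 cases, giving an error of 0.05 that looks excellent, while the AUC is 0.5 for the uninformative rule and 1.0 for the rule that ranks every case above every control.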
Copyright © 2011 S. Karger AG, Basel.