Sci Rep. 2017 Oct 12;7(1):13013.
doi: 10.1038/s41598-017-13259-6.

RIFS: a randomly restarted incremental feature selection algorithm

Yuting Ye et al. Sci Rep.

Abstract

The advent of the big data era has imposed both running-time and learning-efficiency challenges on machine learning researchers. Biomedical OMIC research is one of these big data areas and has changed biomedical research drastically. However, the high cost of data production and the difficulty of participant recruitment introduce the "large p small n" paradigm into biomedical research. Feature selection is usually employed to reduce the large number of biomedical features so that a stable, data-independent classification or regression model may be achieved. This study randomly changes the first element of the widely used incremental feature selection (IFS) strategy and selects the best feature subset, which may be ranked low by statistical association evaluation algorithms, e.g. the t-test. The hypothesis is that two low-ranked features may be orchestrated to achieve good classification performance. The proposed Randomly re-started Incremental Feature Selection (RIFS) algorithm demonstrates both higher classification accuracy and a smaller number of features than the existing algorithms. RIFS also outperforms the existing methylomic diagnosis model for prostate malignancy, with higher accuracy and fewer transcriptomic features.
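
The idea of restarting IFS from a random rank can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Welch t-statistic ranking, the leave-one-out nearest-centroid classifier used as the evaluator, the `restarts` and `patience` parameters, and the synthetic data are all assumptions made for illustration. Rank 0 is always included as a start, so classic IFS is one of the candidates.

```python
import random
from statistics import mean, variance

def t_scores(X, y):
    """Absolute Welch t-statistic of each feature for binary labels 0/1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    return scores

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed evaluator)."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[f] for r in rows) for f in feats]
            dist[label] = sum((X[i][f] - m) ** 2 for f, m in zip(feats, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)

def ifs_from(X, y, ranked, start, patience=2):
    """Add consecutively ranked features from `start`; stop after
    `patience` steps without an accuracy improvement."""
    feats, best_feats, best_acc, stall = [], [], -1.0, 0
    for f in ranked[start:]:
        feats.append(f)
        acc = loo_accuracy(X, y, feats)
        if acc > best_acc:
            best_feats, best_acc, stall = list(feats), acc, 0
        else:
            stall += 1
            if stall >= patience:
                break
    return best_feats, best_acc

def rifs_sketch(X, y, restarts=5, patience=2, seed=0):
    """Run IFS from rank 0 and from random ranks; keep the subset with the
    highest accuracy, breaking ties by the smaller number of features."""
    rng = random.Random(seed)
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    starts = {0} | {rng.randrange(len(ranked)) for _ in range(restarts)}
    best = ([], -1.0)
    for s in sorted(starts):
        feats, acc = ifs_from(X, y, ranked, s, patience)
        if acc > best[1] or (acc == best[1] and len(feats) < len(best[0])):
            best = (feats, acc)
    return best

# Synthetic two-class data: 20 samples, 4 features with different class shifts.
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[data_rng.gauss(label * shift, 1.0) for shift in (0.0, 2.0, 0.5, 1.5)]
     for label in y]
feats, acc = rifs_sketch(X, y)
print(feats, acc)
```

Because the subset is chosen and scored on the same samples, the accuracy printed here is optimistic; an unbiased estimate would need a held-out evaluation.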

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
Experimental setting of this work. Seventeen datasets were chosen to compare RIFS with three filters and three wrappers, and classification performance was calculated using 10-fold cross-validation.
Figure 2
Demonstrative examples of RIFS rules and evaluation of the best starting percentage. (a) Two features starting from rank i = 31 for the dataset ALL2, and four features starting from rank i = 443 for the dataset ALL3. (b) Accuracy curve of IFS(757) for the dataset T1D. (c) Accuracy curve of IFS(37) for the dataset Colon. (d) The maximum accuracy is calculated for each of the four datasets, i.e. ALL1/ALL2/ALL3/ALL4, with different percentages of all the features as the starting points.
Figure 3
Number of steps tolerated without performance improvement. The classification performance is measured in mAcc.
Figure 4
The classification performance of RIFS on the 17 transcriptome datasets. The vertical axis shows the measurement mAcc, and the horizontal axis lists the 17 datasets. The detailed mAcc values are also given at the top of each column.
Figure 5
Performance comparison of RIFS with 3 filters and 3 wrappers. The vertical axis shows the performance measurements mAcc and F-score, and the horizontal axis gives the dataset names. Since all the filter algorithms select the same number of features as RIFS, only the numbers of features selected by RIFS are shown. The last table gives the numbers of features selected by the wrapper algorithms and RIFS. (a) Comparison with 3 filters. (b) Comparison with 3 wrappers.
Figure 6
Dot plot of the two features detected by RIFS. Only one benign prostatic hyperplasia sample lies very close to the prostate carcinoma samples.
