ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data

Stephen R Piccolo et al. GigaScience. 2020 Apr 1;9(4):giaa026. doi: 10.1093/gigascience/giaa026.
Abstract

Background: Classification algorithms assign observations to groups based on patterns in data. The machine-learning community has developed myriad classification algorithms, which are used in diverse life science research domains. Algorithm choice can affect classification accuracy dramatically, so it is crucial that researchers use empirical evidence when choosing which algorithm(s) to apply in a given research domain. In benchmark studies, multiple algorithms are applied to multiple datasets, and the researcher examines overall trends. In addition, the researcher may evaluate multiple hyperparameter combinations for each algorithm and use feature selection to reduce data dimensionality. Although software implementations of classification algorithms are widely available, robust benchmark comparisons are difficult to perform when researchers wish to compare algorithms that span multiple software packages: programming interfaces, data formats, and evaluation procedures differ across packages, and dependency conflicts may arise during installation.

Findings: To address these challenges, we created ShinyLearner, an open-source project for integrating machine-learning packages into software containers. ShinyLearner provides a uniform interface for performing classification, irrespective of the library that implements each algorithm, thus facilitating benchmark comparisons. In addition, ShinyLearner enables researchers to optimize hyperparameters and select features via nested cross-validation; it tracks all nested operations and generates output files that make these steps transparent. ShinyLearner includes a Web interface to help users more easily construct the commands necessary to perform benchmark comparisons. ShinyLearner is freely available at https://github.com/srp33/ShinyLearner.
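ShinyLearner carries out hyperparameter optimization and feature selection inside nested cross-validation automatically. The minimal sketch below, written with generic scikit-learn code rather than ShinyLearner's own implementation, illustrates the underlying idea of nesting a tuning loop inside an outer evaluation loop; the dataset and hyperparameter grid are hypothetical.

    # Minimal sketch of nested cross-validation for hyperparameter tuning.
    # Generic scikit-learn code for illustration; not ShinyLearner's implementation.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, cross_val_score

    # Hypothetical tabular dataset (rows = observations, columns = features).
    X, y = make_classification(n_samples=200, n_features=25, random_state=0)

    # Inner loop: grid search selects hyperparameters using only the training folds.
    inner = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=3,
        scoring="roc_auc",
    )

    # Outer loop: estimates the performance of the entire tuning procedure.
    outer_scores = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")
    print("Median outer-fold AUROC:", np.median(outer_scores))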

Conclusions: This software is a resource for researchers who wish to benchmark multiple classification or feature-selection algorithms on a given dataset. We hope it will serve as an example of combining the benefits of software containerization with a user-friendly approach.

Keywords: algorithm optimization; benchmark; classification; feature selection; machine learning; model selection; software containers; supervised learning.


Figures

Figure 1:
Example ShinyLearner command for performing a benchmark comparison. In this example, the user wishes to place output files in a directory located at /home/user/OutputData. To avoid problems with file permissions, this directory should be created before Docker is executed. The docker run command creates a container and maps input and output directories from the host operating system to locations within the container (separated by colons). The --user directive indicates that the container should execute with the invoking user's permissions. The name of the Docker image and tag are specified (srp33/shinylearner:version511), as well as the name of a ShinyLearner script that performs nested Monte Carlo cross-validation (/UserScripts/nestedclassification_montecarlo). The remaining arguments indicate the name of the input data file, a description of the analysis, the number of Monte Carlo iterations, the classification algorithms, etc. ShinyLearner provides documentation on each of these arguments, as well as a Web application for building such commands dynamically.
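Because the command itself appears only in the figure image, the Python sketch below reconstructs the general shape of the invocation described above. The host directory paths, the in-container mount points, and the trailing ShinyLearner arguments are illustrative placeholders rather than the exact values shown in the figure; consult the ShinyLearner documentation or its Web application for the precise argument names.

    # Illustrative sketch of a "docker run" invocation with the structure described
    # in Figure 1. Paths, mount points, and trailing arguments are placeholders.
    import os
    import subprocess

    command = [
        "docker", "run",
        "--user", f"{os.getuid()}:{os.getgid()}",        # run with the invoking user's permissions
        "-v", "/home/user/InputData:/InputData",          # host:container mapping (placeholder paths)
        "-v", "/home/user/OutputData:/OutputData",        # create /home/user/OutputData beforehand
        "srp33/shinylearner:version511",                  # Docker image name and tag
        "/UserScripts/nestedclassification_montecarlo",   # script for nested Monte Carlo cross-validation
        # Remaining arguments (input data file, analysis description, number of
        # Monte Carlo iterations, classification algorithms, ...) go here; see the
        # ShinyLearner documentation for their exact form.
    ]
    subprocess.run(command, check=True)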
Figure 2:
Classification performance per dataset (default hyperparameters). We evaluated the predictive performance of 10 classification algorithms on 10 biomedical datasets. These results were generated using default hyperparameters for each algorithm. We measured predictive performance using area under the receiver operating characteristic curve (AUROC) and calculated the median across 5 Monte Carlo iterations. Predictive performance differed considerably across and within the datasets.
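The summary statistic plotted here can be reproduced along the following lines; the sketch uses generic scikit-learn code with a hypothetical variable holding one set of test-set labels and predicted probabilities per Monte Carlo iteration.

    # Sketch: median AUROC across Monte Carlo iterations for one algorithm/dataset pair.
    # `iteration_results` is a hypothetical list of (true_labels, predicted_probabilities)
    # pairs, one per Monte Carlo train/test split.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def median_auroc(iteration_results):
        aurocs = [roc_auc_score(y_true, y_score) for y_true, y_score in iteration_results]
        return np.median(aurocs)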
Figure 3:
Sample-level predictions for each algorithm on the Diabetes dataset (default hyperparameters). The Diabetes dataset includes a class variable indicating whether patients received a positive diagnosis. Each panel of this figure shows positive-diagnosis predictions for each classification algorithm. All algorithms except sklearn/decision_tree produced probabilistic predictions. The range and distribution of these predictions differed greatly across the algorithms.
Figure 4:
Classification performance when optimizing vs not optimizing hyperparameters. We tested 10 classification algorithms on 10 biomedical datasets and used nested cross-validation to select hyperparameters. To evaluate the change in predictive performance, we calculated the percent change in the median AUROC values when using optimized vs default hyperparameters. Most algorithms demonstrated improved classification performance with optimized hyperparameters.
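The percent-change metric used here (and again in Figure 5) is a straightforward relative difference; a small sketch with hypothetical AUROC values:

    # Sketch: percent change in median AUROC, optimized vs default hyperparameters.
    def percent_change(median_auroc_optimized, median_auroc_default):
        return 100.0 * (median_auroc_optimized - median_auroc_default) / median_auroc_default

    print(round(percent_change(0.84, 0.80), 2))  # -> 5.0, i.e., a 5% improvement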
Figure 5:
Classification performance when performing feature selection vs not performing feature selection. In combination with classification, we performed feature selection via nested cross-validation on 10 biomedical datasets. For each algorithm, we used default hyperparameters. These plots show the percent change in the median AUROC when using vs not using feature selection. Although the effects of feature selection varied across the algorithms, median AUROCs increased in many cases.
Figure 6:
Performance for each combination of classification and feature selection algorithm. This figure shows classification results for the nested cross-validation folds for each combination of feature selection algorithm and classification algorithm. We ranked the feature selection algorithms by the AUROC values attained on the nested validation sets, averaged across all datasets and classification algorithms. For simplicity and consistency across the datasets, this figure shows only the results when the top 5 features were used. Higher average ranks indicate better classification performance.
Figure 7:
Median classification performance of feature selection algorithms by number of features. We applied feature selection to each dataset, in combination with each of the 10 classification algorithms. For each algorithm, we selected the top x features and averaged across each combination of feature selection and classification algorithm. This figure shows which values of x resulted in the highest AUROC values for each dataset. Different datasets had different quantities of features, so this graph shows results only for the x values relevant to each dataset. Accordingly, we scaled the AUROC values in each column between 0 and 1 to ensure that the comparisons were consistent across all datasets. Higher values indicate better classification performance. Generally, a larger number of features resulted in better classification performance, but this varied across the datasets.
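The column-wise scaling described above can be read as ordinary min-max scaling of each dataset's AUROC values; the sketch below shows that interpretation (an assumption on our part) with a hypothetical matrix whose rows correspond to feature counts and whose columns correspond to datasets.

    # Sketch: rescale each column (dataset) of an AUROC table to the [0, 1] range so
    # that comparisons are consistent across datasets with different baseline AUROCs.
    # Assumes every column contains at least two distinct values.
    import numpy as np

    def scale_columns(auroc_matrix):
        m = np.asarray(auroc_matrix, dtype=float)
        col_min = m.min(axis=0)
        col_max = m.max(axis=0)
        return (m - col_min) / (col_max - col_min)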
