ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data

Stephen R Piccolo et al. GigaScience. 2020 Apr 1;9(4):giaa026. doi: 10.1093/gigascience/giaa026.
Abstract

Background: Classification algorithms assign observations to groups based on patterns in data. The machine-learning community has developed myriad classification algorithms, which are used in diverse life science research domains. Algorithm choice can affect classification accuracy dramatically, so it is crucial that researchers use empirical evidence when choosing which algorithm(s) to apply in a given research domain. In benchmark studies, multiple algorithms are applied to multiple datasets, and the researcher examines overall trends. In addition, the researcher may evaluate multiple hyperparameter combinations for each algorithm and use feature selection to reduce data dimensionality. Although software implementations of classification algorithms are widely available, robust benchmark comparisons are difficult to perform when researchers wish to compare algorithms that span multiple software packages: programming interfaces, data formats, and evaluation procedures differ across packages, and dependency conflicts may arise during installation.

Findings: To address these challenges, we created ShinyLearner, an open-source project for integrating machine-learning packages into software containers. ShinyLearner provides a uniform interface for performing classification, irrespective of the library that implements each algorithm, thus facilitating benchmark comparisons. In addition, ShinyLearner enables researchers to optimize hyperparameters and select features via nested cross-validation; it tracks all nested operations and generates output files that make these steps transparent. ShinyLearner includes a Web interface to help users more easily construct the commands necessary to perform benchmark comparisons. ShinyLearner is freely available at https://github.com/srp33/ShinyLearner.
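ShinyLearner carries out hyperparameter optimization and feature selection inside nested cross-validation automatically. The minimal sketch below, written with generic scikit-learn code rather than ShinyLearner's own implementation, illustrates the underlying idea of nesting a tuning loop inside an outer evaluation loop; the dataset and hyperparameter grid are hypothetical.

    # Minimal sketch of nested cross-validation for hyperparameter tuning.
    # Generic scikit-learn code for illustration; not ShinyLearner's implementation.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, cross_val_score

    # Hypothetical tabular dataset (rows = observations, columns = features).
    X, y = make_classification(n_samples=200, n_features=25, random_state=0)

    # Inner loop: grid search selects hyperparameters using only the training folds.
    inner = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=3,
        scoring="roc_auc",
    )

    # Outer loop: estimates the performance of the entire tuning procedure.
    outer_scores = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")
    print("Median outer-fold AUROC:", np.median(outer_scores))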

Conclusions: This software is a resource for researchers who wish to benchmark multiple classification or feature-selection algorithms on a given dataset. We hope it will serve as an example of combining the benefits of software containerization with a user-friendly approach.

Keywords: algorithm optimization; benchmark; classification; feature selection; machine learning; model selection; software containers; supervised learning.


Figures

Figure 1:
Example ShinyLearner command for performing a benchmark comparison. In this example, the user wishes to place output files in a directory located at /home/user/OutputData. To avoid problems with file permissions, this directory should be created before Docker is executed. The docker run command creates a container and maps input and output directories from the host operating system to locations within the container (separated by colons). The --user directive indicates that the container should execute with the invoking user's permissions. The name of the Docker image and tag are specified (srp33/shinylearner:version511), as well as the name of a ShinyLearner script that performs nested Monte Carlo cross-validation (/UserScripts/nestedclassification_montecarlo). The remaining arguments indicate the name of the input data file, a description of the analysis, the number of Monte Carlo iterations, the classification algorithms, etc. ShinyLearner provides documentation on each of these arguments, as well as a Web application for building such commands dynamically.
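Because the command itself appears only in the figure image, the Python sketch below reconstructs the general shape of the invocation described above. The host directory paths, the in-container mount points, and the trailing ShinyLearner arguments are illustrative placeholders rather than the exact values shown in the figure; consult the ShinyLearner documentation or its Web application for the precise argument names.

    # Illustrative sketch of a "docker run" invocation with the structure described
    # in Figure 1. Paths, mount points, and trailing arguments are placeholders.
    import os
    import subprocess

    command = [
        "docker", "run",
        "--user", f"{os.getuid()}:{os.getgid()}",        # run with the invoking user's permissions
        "-v", "/home/user/InputData:/InputData",          # host:container mapping (placeholder paths)
        "-v", "/home/user/OutputData:/OutputData",        # create /home/user/OutputData beforehand
        "srp33/shinylearner:version511",                  # Docker image name and tag
        "/UserScripts/nestedclassification_montecarlo",   # script for nested Monte Carlo cross-validation
        # Remaining arguments (input data file, analysis description, number of
        # Monte Carlo iterations, classification algorithms, ...) go here; see the
        # ShinyLearner documentation for their exact form.
    ]
    subprocess.run(command, check=True)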
Figure 2:
Classification performance per dataset (default hyperparameters). We evaluated the predictive performance of 10 classification algorithms on 10 biomedical datasets. These results were generated using default hyperparameters for each algorithm. We measured predictive performance using area under the receiver operating characteristic curve (AUROC) and calculated the median across 5 Monte Carlo iterations. Predictive performance differed considerably across and within the datasets.
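The summary statistic plotted here can be reproduced along the following lines; the sketch uses generic scikit-learn code with a hypothetical variable holding one set of test-set labels and predicted probabilities per Monte Carlo iteration.

    # Sketch: median AUROC across Monte Carlo iterations for one algorithm/dataset pair.
    # `iteration_results` is a hypothetical list of (true_labels, predicted_probabilities)
    # pairs, one per Monte Carlo train/test split.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def median_auroc(iteration_results):
        aurocs = [roc_auc_score(y_true, y_score) for y_true, y_score in iteration_results]
        return np.median(aurocs)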
Figure 3:
Sample-level predictions for each algorithm on the Diabetes dataset (default hyperparameters). The Diabetes dataset includes a class variable indicating whether patients received a positive diagnosis. Each panel of this figure shows positive-diagnosis predictions for each classification algorithm. All algorithms except sklearn/decision_tree produced probabilistic predictions. The range and distribution of these predictions differed greatly across the algorithms.
Figure 4:
Classification performance when optimizing vs not optimizing hyperparameters. We tested 10 classification algorithms on 10 biomedical datasets and used nested cross-validation to select hyperparameters. To evaluate the change in predictive performance, we calculated the percent change in the median AUROC values when using optimized vs default hyperparameters. Most algorithms demonstrated improved classification performance with optimized hyperparameters.
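The percent-change metric used here (and again in Figure 5) is a straightforward relative difference; a small sketch with hypothetical AUROC values:

    # Sketch: percent change in median AUROC, optimized vs default hyperparameters.
    def percent_change(median_auroc_optimized, median_auroc_default):
        return 100.0 * (median_auroc_optimized - median_auroc_default) / median_auroc_default

    print(round(percent_change(0.84, 0.80), 2))  # -> 5.0, i.e., a 5% improvement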
Figure 5:
Classification performance when performing feature selection vs not performing feature selection. In combination with classification, we performed feature selection via nested cross-validation on 10 biomedical datasets. For each algorithm, we used default hyperparameters. These plots show the percent change in the median AUROC when using vs not using feature selection. Although the effects of feature selection varied across the algorithms, median AUROCs increased in many cases.
Figure 6:
Performance for each combination of classification and feature selection algorithm. This figure shows classification results for the nested cross-validation folds for each combination of feature selection algorithm and classification algorithm. We ranked the feature selection algorithms by the AUROC values attained on the nested validation sets, averaged across all datasets and classification algorithms. For simplicity and consistency across the datasets, this figure shows only the results when the top 5 features were used. Higher average ranks indicate better classification performance.
Figure 7:
Median classification performance of feature selection algorithms by number of features. We applied feature selection to each dataset, in combination with each of the 10 classification algorithms. For each algorithm, we selected the top x features and averaged across each combination of feature selection and classification algorithm. This figure shows which values of x resulted in the highest AUROC values for each dataset. Different datasets had different quantities of features, so this graph shows results only for the x values relevant to each dataset. Accordingly, we scaled the AUROC values in each column between 0 and 1 to ensure that the comparisons were consistent across all datasets. Higher values indicate better classification performance. Generally, a larger number of features resulted in better classification performance, but this varied across the datasets.
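The column-wise scaling described above can be read as ordinary min-max scaling of each dataset's AUROC values; the sketch below shows that interpretation (an assumption on our part) with a hypothetical matrix whose rows correspond to feature counts and whose columns correspond to datasets.

    # Sketch: rescale each column (dataset) of an AUROC table to the [0, 1] range so
    # that comparisons are consistent across datasets with different baseline AUROCs.
    # Assumes every column contains at least two distinct values.
    import numpy as np

    def scale_columns(auroc_matrix):
        m = np.asarray(auroc_matrix, dtype=float)
        col_min = m.min(axis=0)
        col_max = m.max(axis=0)
        return (m - col_min) / (col_max - col_min)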
