ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
- PMID: 32249316
- PMCID: PMC7131989
- DOI: 10.1093/gigascience/giaa026
Abstract
Background: Classification algorithms assign observations to groups based on patterns in data. The machine-learning community has developed myriad classification algorithms, which are used in diverse life science research domains. Algorithm choice can affect classification accuracy dramatically, so it is crucial that researchers choose which algorithm(s) to apply in a given research domain on the basis of empirical evidence. In benchmark studies, multiple algorithms are applied to multiple datasets, and the researcher examines overall trends. In addition, the researcher may evaluate multiple hyperparameter combinations for each algorithm and use feature selection to reduce data dimensionality. Although software implementations of classification algorithms are widely available, robust benchmark comparisons are difficult to perform when researchers wish to compare algorithms that span multiple software packages. Programming interfaces, data formats, and evaluation procedures differ across software packages, and dependency conflicts may arise during installation.
Findings: To address these challenges, we created ShinyLearner, an open-source project for integrating machine-learning packages into software containers. ShinyLearner provides a uniform interface for performing classification, irrespective of the library that implements each algorithm, thus facilitating benchmark comparisons. In addition, ShinyLearner enables researchers to optimize hyperparameters and select features via nested cross-validation; it tracks all nested operations and generates output files that make these steps transparent. ShinyLearner includes a Web interface to help users more easily construct the commands necessary to perform benchmark comparisons. ShinyLearner is freely available at https://github.com/srp33/ShinyLearner.
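As an illustration of the nested cross-validation idea mentioned above, the following is a minimal pure-Python sketch, not ShinyLearner's own code or interface. It assumes a toy k-nearest-neighbors classifier whose single hyperparameter k is tuned on inner folds, while outer folds estimate accuracy on data never used for tuning. Every name here (make_toy_data, make_folds, nested_cv, and so on) is hypothetical, and the data are synthetic. Each outer fold records its inner scores and chosen hyperparameter, loosely mirroring the idea of keeping the nested steps transparent.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=30, seed=1):
    """Synthetic two-class tabular data (an assumption, for illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)])
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t
               for x, t in zip(test_X, test_y))
    return hits / len(test_y)


def make_folds(n, n_folds, rng):
    """Shuffle row indices and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def split(X, y, held_out):
    """Return (train_X, train_y, test_X, test_y) for the held-out indices."""
    held = set(held_out)
    train = [i for i in range(len(X)) if i not in held]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in held_out], [y[i] for i in held_out])


def nested_cv(X, y, k_values, outer_folds=5, inner_folds=3, seed=0):
    """Tune k on inner folds only; score the chosen k on each outer test fold."""
    rng = random.Random(seed)
    results = []
    for fold_id, test_idx in enumerate(make_folds(len(X), outer_folds, rng)):
        tr_X, tr_y, te_X, te_y = split(X, y, test_idx)
        inner = make_folds(len(tr_X), inner_folds, rng)
        inner_scores = {}
        for k in k_values:
            scores = [accuracy(*split(tr_X, tr_y, val_idx), k)
                      for val_idx in inner]
            inner_scores[k] = sum(scores) / len(scores)
        # The outer test fold plays no part in choosing best_k.
        best_k = max(k_values, key=lambda k: inner_scores[k])
        results.append({
            "outer_fold": fold_id,
            "inner_scores": inner_scores,
            "best_k": best_k,
            "outer_accuracy": accuracy(tr_X, tr_y, te_X, te_y, best_k),
        })
    return results


X, y = make_toy_data()
results = nested_cv(X, y, k_values=[1, 3, 5])
for row in results:
    print(row["outer_fold"], row["best_k"], round(row["outer_accuracy"], 3))
```

Because the hyperparameter is selected separately within each outer training set, different outer folds may choose different values of k; the per-fold records make that selection visible rather than hiding it behind a single summary number.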
Conclusions: This software is a resource for researchers who wish to benchmark multiple classification or feature-selection algorithms on a given dataset. We hope it will serve as an example of combining the benefits of software containerization with a user-friendly approach.
Keywords: algorithm optimization; benchmark; classification; feature selection; machine learning; model selection; software containers; supervised learning.
© The Author(s) 2020. Published by Oxford University Press.