Patterns (N Y). 2021 Jan 8;2(1):100178.
doi: 10.1016/j.patter.2020.100178.

SIMON: Open-Source Knowledge Discovery Platform


Adriana Tomic et al. Patterns (N Y).

Abstract

Data analysis and knowledge discovery have become increasingly important in biology and medicine as biological datasets grow more complex, but the sophisticated programming skills and in-depth understanding of algorithms that such research requires are barriers for most biologists and clinicians. We have developed modular open-source software, SIMON, to facilitate the application of 180+ state-of-the-art machine-learning algorithms to high-dimensional biomedical data. With an easy-to-use graphical user interface, standardized pipelines, and an automated approach to machine learning and other statistical analysis methods, SIMON helps identify optimal algorithms and provides a resource that empowers both non-technical and technical researchers to identify crucial patterns in biomedical data.
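One common way to "identify optimal algorithms" is to score every candidate model on resampled data and rank the models by a shared metric such as the area under the ROC curve (AUC). The sketch below is not SIMON's code; it assumes binary labels coded 0/1, and the function names `roc_auc` and `select_best_model` and the example scores are hypothetical. It computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as half), which is equivalent to the Mann-Whitney U statistic divided by the number of positive-negative pairs.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Assumes labels are 0/1. Tied scores count as half a correct ranking.
    This O(n_pos * n_neg) pairwise version favors clarity over speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(resample_aucs):
    """Return the model name with the highest mean AUC across resamples.

    resample_aucs is a hypothetical mapping of model name -> list of AUC
    values, one per cross-validation resample.
    """
    return max(resample_aucs, key=lambda name: mean(resample_aucs[name]))


if __name__ == "__main__":
    # Hypothetical labels and predicted scores from one held-out fold.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
    print(roc_auc(labels, scores))
    # Hypothetical per-resample AUCs for two candidate models.
    print(select_best_model({"rf": [0.91, 0.88, 0.90], "glm": [0.85, 0.87, 0.86]}))
```

Ranking by the mean alone ignores the spread across resamples; comparing full distributions, as in boxplots of per-resample performance, gives a fuller picture.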

Keywords: artificial intelligence; autoML; bioinformatics; computational biology; data mining; data science; machine learning; software; systems biology.


Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
SIMON Machine Learning Workflow. Step 1. Building predictive models. (A) Screenshot of the SIMON graphical user interface demonstrating input selection for machine learning analysis, such as predictor and response (outcome) variables, additional exploration classes, training/test split, pre-processing functions, and desired machine learning algorithms. Step 2. Model evaluation and selection. Comparison of (B) boxplots of performance measurements calculated for 11 predictive models and (C) receiver operating characteristic (ROC) curves built on the SISA dataset. Each boxplot shows the distribution of data as minimum (Q1−1.5×IQR), first quartile (Q1), median (Q2), third quartile (Q3), and maximum (Q3+1.5×IQR). Data points outside the minimum and maximum values (outliers) are shown as circles. IQR, interquartile range. Comparison of ROC curves calculated from the training set (average value calculated using 10-fold cross-validation repeated three times) and the test set on (D) datasets with missing values (Cyclists and VAST) and (E) high-dimensional datasets (Zeller and LIHC). Step 3. Feature selection. (F) Variable importance score table for each feature and graphical visualization of the selected features from the Cyclists dataset. Step 4. Exploratory analysis. (G) Correlation analysis on the Cyclists dataset. (H) Clustering analysis on the VAST dataset.
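The caption relies on two computations: the boxplot summary (quartiles, IQR, and fences at Q1−1.5×IQR and Q3+1.5×IQR) and 10-fold cross-validation repeated three times. The following minimal sketch illustrates both under stated assumptions; it is not SIMON's implementation. Quartiles use linear interpolation via Python's `statistics.quantiles(..., method="inclusive")`, and other tools may use a different quartile convention. The split is plain (unstratified) k-fold; SIMON's exact resampling scheme may differ. The names `boxplot_summary` and `repeated_kfold_indices` are hypothetical.

```python
import random
from statistics import quantiles


def boxplot_summary(values):
    """Summarize values the way the Figure 1 caption defines its boxplots.

    Minimum and maximum are the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR;
    points outside them are reported as outliers. Quartiles use linear
    interpolation (method="inclusive"); other software may differ slightly.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": low,
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": high,
        "outliers": [v for v in values if v < low or v > high],
    }


def repeated_kfold_indices(n_samples, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices and deals them into k disjoint
    test folds, so every sample is held out exactly once per repeat.
    Assumption: plain (unstratified) splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for f in range(k):
            test = order[f::k]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield r, f, train, test


if __name__ == "__main__":
    # Hypothetical performance values for one model across resamples.
    print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
    # 10-fold cross-validation repeated three times gives 30 train/test splits.
    print(sum(1 for _ in repeated_kfold_indices(50)))
```

The training-set ROC curves in panels D and E average performance over these 30 held-out folds, whereas the test-set curves come from data never used during resampling.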

