Gigascience. 2024 Jan 2;13:giad111.
doi: 10.1093/gigascience/giad111.

Machine Learning Made Easy (MLme): a comprehensive toolkit for machine learning-driven data analysis

Akshay Akshay et al. Gigascience. 2024.

Abstract

Background: Machine learning (ML) has emerged as a vital asset for researchers to analyze and extract valuable information from complex datasets. However, developing an effective and robust ML pipeline can present a real challenge, demanding considerable time and effort, thereby impeding research progress. Existing tools in this landscape require a profound understanding of ML principles and programming skills. Furthermore, users are required to engage in the comprehensive configuration of their ML pipeline to obtain optimal performance.

Results: To address these challenges, we have developed a novel tool called Machine Learning Made Easy (MLme) that streamlines the use of ML in research, currently focusing on classification problems. By integrating 4 essential functionalities (Data Exploration, AutoML, CustomML, and Visualization), MLme fulfills the diverse requirements of researchers while eliminating the need for extensive coding efforts. To demonstrate the applicability of MLme, we conducted rigorous testing on 6 distinct datasets, each presenting unique characteristics and challenges. Our results consistently showed promising performance across different datasets, reaffirming the versatility and effectiveness of the tool. Additionally, by utilizing MLme's feature selection functionality, we successfully identified significant markers for CD8+ naive (BACH2), CD16+ (CD16), and CD14+ (VCAN) cell populations.

Conclusion: MLme serves as a valuable resource for leveraging ML to facilitate insightful data analysis and enhance research outcomes, while alleviating concerns related to complex coding scripts. The source code and a detailed tutorial for MLme are available at https://github.com/FunctionalUrology/MLme.

Keywords: AutoML; classification problems; data analysis; machine learning; visualization.

Conflict of interest statement

The authors declare they have no competing interests.

Figures

Figure 1:
Graphical abstract. The input data for Machine Learning Made Easy (MLme) is a file with samples as rows and features as columns, with sample names in the first column and target classes in the last column. MLme provides various features to enhance usability. The data exploration feature enables users to explore the data and gain initial insights. For advanced users, the custom ML feature allows the creation of custom ML pipelines. Upon execution, MLme generates a compressed zip file containing inputParameter.pkl, script.py, and README.txt. Alternatively, users can opt for the AutoML feature, which applies a default ML pipeline to the input file. Both CustomML and AutoML produce a results.pkl file, which can be further analyzed using the visualization feature.
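The input layout described in this caption (sample names in the first column, features in the middle columns, target classes in the last column) can be illustrated with a minimal sketch. The comma-separated format, the header row, the toy values, and the helper name `load_table` are all assumptions made for illustration; they are not taken from MLme itself.

```python
import csv
import io

# Hypothetical toy data in the layout the caption describes:
# first column = sample name, last column = target class.
TOY_CSV = """sample,gene_a,gene_b,target
s1,0.5,1.2,CD14
s2,0.1,3.4,CD16
s3,0.7,0.9,CD14
"""


def load_table(handle):
    """Split a samples-by-features table into names, features, and targets."""
    reader = csv.reader(handle)
    header = next(reader)
    feature_names = header[1:-1]  # drop the sample-name and target columns
    sample_names, features, targets = [], [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        sample_names.append(row[0])
        features.append([float(value) for value in row[1:-1]])
        targets.append(row[-1])
    return sample_names, feature_names, features, targets


sample_names, feature_names, X, y = load_table(io.StringIO(TOY_CSV))
print(feature_names)
print(sample_names, y)
```

Separating the target column from the feature matrix up front is what allows the later steps (scaling, feature selection, classification) to operate on the features alone.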
Figure 2:
Default ML pipeline for AutoML. The default ML pipeline can be represented as a flowchart that starts by splitting the input dataset into training and independent test sets, provided the user has activated the test set option; otherwise, the entire dataset is used for training. In the subsequent step, the training dataset is divided into k bins of equal size through stratified sampling. Of these bins, k – 1 are designated as the training set, while the remaining bin becomes the test set. In the preprocessing step, low-variance features are removed first, followed by data scaling and resampling. Subsequently, the SelectPercentile univariate feature selection method is applied to select important features, and 5 ML classification algorithms are trained. Model performance is assessed on the test set using 3 different methods, and multiple performance metrics are computed. This entire process is repeated for each unique bin in the k-fold cross-validation (CV) method. The pipeline outputs a zip file comprising the log .txt and results.pkl files. The user can examine the results by visualizing the contents of the pickle file using MLme.
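The stratified k-fold step in this caption can be sketched in plain Python. This is an illustrative approximation, not MLme's code: the function names `stratified_folds` and `cv_splits` are hypothetical, the toy labels are invented, and samples are dealt into bins in their original order, whereas a real implementation would usually shuffle within each class first.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k bins so each bin keeps similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    # Deal the samples of each class round-robin across the bins.
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


def cv_splits(labels, k):
    """Yield (train, test) index lists; each bin serves as the test set once."""
    folds = stratified_folds(labels, k)
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: 6 samples of class "a" and 3 of class "b".
labels = ["a"] * 6 + ["b"] * 3
splits = list(cv_splits(labels, k=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

In each iteration, the preprocessing, feature selection, and model training described in the caption would be fitted on the k – 1 training bins only and then evaluated on the held-out bin, which keeps information from the test bin out of the fitted steps.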
Figure 3:
Identification of potential markers for CD8+ naive, CD16+, and CD14+ cell populations in the PBMC dataset. (A) Heatmap visualization showing the expression patterns of 50 genes selected by MLme. (B–D) Expression levels of key markers specific to CD8+ naive, CD16+, and CD14+ cell populations, respectively, within each cell type.
