J Chem Inf Model. 2025 Feb 24;65(4):1826-1836.

doi: 10.1021/acs.jcim.4c02040. Epub 2025 Feb 5.

Toward Machine Learning Electrospray Ionization Sensitivity Prediction for Semiquantitative Lipidomics in Stem Cells

Alexandria Van Grouw¹, Markace A Rainey¹, Olivia K Reid¹, Molly M Ogle², Samuel G Moore², Johnna S Temenoff³, Facundo M Fernández¹,²

Affiliations

¹ School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlanta Drive, Atlanta, Georgia 30332, USA.
² Systems Mass Spectrometry Core, Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, 315 Ferst Drive NW, Atlanta, Georgia 30332, USA.
³ The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive NW, Atlanta, Georgia 30332, USA.

PMID: 39907635
PMCID: PMC11863365
DOI: 10.1021/acs.jcim.4c02040

Alexandria Van Grouw et al. J Chem Inf Model. 2025.

Abstract

Specificity, sensitivity, and high metabolite coverage make mass spectrometry (MS) one of the most valuable tools in metabolomics and lipidomics. However, translation of metabolomics MS methods to multiyear studies conducted across multiple batches is limited by variability in electrospray ionization response, which makes batch-to-batch comparisons challenging. This limitation creates an artificial divide between nontargeted discovery work, which is broad in scope but limited in absolute quantitation ability, and targeted work, which is highly accurate but limited in scope by the need for matched isotopically labeled standards. These issues are often observed in stem cell studies using metabolomic and lipidomic MS approaches, where patient recruitment can be a years-long process and samples become available in discrete batches every few months. To bridge this gap, we developed a machine learning model that predicts electrospray ionization sensitivity for lipid classes that have shown correlation with stem cell potency. Molecular descriptors derived from these lipids' chemical structures are used as model input to predict electrospray response, enabling quantitation by MS with moderate accuracy (semiquantitation). Model performance was evaluated via internal and external validation using cultured cells from various stem cell donors, achieving global percent errors of 40% and 20% for positive and negative electrospray ion modes, respectively. Although this accuracy is typically insufficient for traditional targeted lipidomics experiments, it is sufficient for semiquantitative estimation of lipid marker concentrations across batches without the need for specific chemical standards, which are often unavailable. Furthermore, the precision for model-predicted concentrations was 16.9% for the positive mode and 7.5% for the negative mode, indicating promise for data harmonization across batches. The set of molecular descriptors used by the models described here yielded higher accuracy than those previously published in the literature, showing high promise toward semiquantitative lipidomics.
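
The semiquantitation idea in the abstract can be illustrated with a minimal sketch. It assumes a linear, zero-intercept response (peak area equals sensitivity times concentration), reports accuracy as percent error against the spiked concentration, and reports precision as percent relative standard deviation across replicates. The function names and all numeric values are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean, stdev


def estimate_concentration(peak_area: float, predicted_sensitivity: float) -> float:
    """Convert a peak area to a concentration estimate.

    Assumption: area = sensitivity * concentration (linear, zero intercept).
    """
    if predicted_sensitivity <= 0:
        raise ValueError("predicted sensitivity must be positive")
    return peak_area / predicted_sensitivity


def percent_error(predicted: float, spiked: float) -> float:
    """Absolute percent error of a predicted concentration vs. the spiked value."""
    return abs(predicted - spiked) / spiked * 100.0


def percent_rsd(values: list[float]) -> float:
    """Precision as percent relative standard deviation (sample std / mean)."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical replicate peak areas and a hypothetical model-predicted sensitivity.
replicate_areas = [1900.0, 2000.0, 2100.0]
sensitivity = 1000.0
concentrations = [estimate_concentration(a, sensitivity) for a in replicate_areas]

accuracy = percent_error(mean(concentrations), spiked=2.5)
precision = percent_rsd(concentrations)
print(f"percent error: {accuracy:.1f}%, precision (RSD): {precision:.1f}%")
```

Keeping accuracy and precision as separate functions mirrors the way the abstract reports them as separate figures of merit.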

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Figure 1**
Collection of training and evaluation data sets for machine learning models. Two lipid standard mixes, the Avanti UltimateSPLASH ONE and Avanti SPLASH LIPIDOMIX, were spiked according to the scheme in (a) into cell pellet samples composed of MSCs from two donors. From these samples, several lipid data sets (b) were created for model training, testing, and target evaluation. UI27RMEWO3 Created in BioRender. Fernandez, F. (2025).

**Figure 2**
Model development workflow depicting (a) generation of lipid calibration curves for determination of experimental sensitivity values (m_R), (b) identifying optimal train/test split percentage, (c) optimization of data transformation and feature selection, (d) generation of 1,101 model iterations to select median error model, (e) model training with 5-fold cross validation, (f) model validation on test set lipids, and (g) external validation on separate lipids in separate samples. Full workflow results in two final models: one for each ionization mode, positive and negative. NA27RMF9PW Created in BioRender. Fernandez, F. (2025).
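
Two steps named in this workflow, the Box-Cox transformation and the selection of a median-error model from an odd number of iterations, can be sketched as follows. This is an illustration under stated assumptions: the transformation parameter `lam` is taken as given rather than fitted, the iterations are assumed to be identified by a label with one error value each, and the function names are hypothetical. It does not reproduce the authors' pipeline.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Standard Box-Cox transform; defined only for strictly positive y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inverse_box_cox(z: float, lam: float) -> float:
    """Map a transformed value back to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def median_error_model(errors_by_model: dict) -> object:
    """Return the label of the model whose error is the median.

    Assumption: an odd number of iterations (such as 1,101), so the median
    is the error of one actual model rather than an average of two.
    """
    if len(errors_by_model) % 2 == 0:
        raise ValueError("expected an odd number of model iterations")
    ranked = sorted(errors_by_model.items(), key=lambda item: item[1])
    return ranked[len(ranked) // 2][0]


# Hypothetical errors for three iterations labelled by an arbitrary key.
chosen = median_error_model({"iter_0": 0.30, "iter_1": 0.10, "iter_2": 0.20})
print(chosen)
```

Choosing the median rather than the best iteration avoids reporting an optimistically selected model, which is one plausible reading of why the caption specifies a median-error model.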

**Figure 3**
Electrospray ionization efficiency violin plots of observed lipid classes in (a) positive ion mode and (b) negative ion mode. The plot displays the ionization efficiencies of the lipid components of the Avanti Ultimate SplashOne mix spiked into donor 1 and donor 2 MSC samples. For each individual lipid, the ionization efficiencies measured were averaged across donor 1 and donor 2 samples. Ionization efficiency was defined as the slope of the concentration–response curve (m) using non-normalized peak areas.
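
The definition of ionization efficiency as the slope (m) of the concentration–response curve can be illustrated with an ordinary least-squares fit. The sketch assumes an unweighted linear fit with an intercept; whether the authors used weighting or a forced-zero intercept is not stated here, and the function name and data are hypothetical.

```python
def calibration_slope(concentrations: list[float], peak_areas: list[float]) -> float:
    """Ordinary least-squares slope of peak area vs. spiked concentration."""
    n = len(concentrations)
    if n < 2 or n != len(peak_areas):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(concentrations) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, peak_areas))
    return sxy / sxx


# Hypothetical calibration levels and non-normalized peak areas.
levels = [1.0, 2.0, 4.0, 8.0]
areas = [350.0, 650.0, 1250.0, 2450.0]
print(calibration_slope(levels, areas))
```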

**Figure 4**
Model calibration and internal validation performance. Plots depicting measured relative slopes (m_R) for UltimateSplash components in donor 1 samples against their predicted relative slopes for (a) positive ion mode model calibration, (b) negative ion mode calibration, (c) positive mode cross-validation, (d) negative mode cross-validation, (e) positive mode internal validation, and (f) negative mode internal validation. For positive and negative ion mode model iterations, the median was chosen for display. All models utilized Box-Cox transformation and recursive feature elimination (RFE). Dashed lines indicate the regression fit line, and solid lines indicate the x = y line.

**Figure 5**
Errors for internal and external validation of positive and negative ion mode models. Average percent errors for (a) positive and (b) negative mode concentration predictions during internal validation using 20% of the analytes in UltimateSplash Mix. Errors were averaged at each calibration level. Average percent errors for (c) positive and (d) negative mode concentration predictions during external validation using Splash Lipidomix in donor 1 and donor 2 samples. Errors were averaged across three high samples and three low samples. Errors were calculated by comparing predicted concentrations to spiked concentrations. Note: ceramide names abbreviated for space; full names can be found in Table S3.

**Figure 6**
Model accuracy compared to traditional nontargeted quantitation approaches. The accuracies of predicted and calculated concentrations for Splash Lipidomix standards in donor 1 and donor 2 samples (averaged across 12 samples). Concentrations were determined using the prediction workflow, one-point calibration, and surrogate lipid class curve calibration and are displayed as (a) a global average for positive and negative modes, (b) averages by lipid component in positive mode, and (c) averages by lipid component in negative mode.
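
The two comparison approaches named in this caption can be sketched in their generic textbook forms. One-point calibration scales the analyte's peak area by the area-to-concentration ratio of a single standard, while a surrogate class curve reads the analyte's concentration off the calibration line of another lipid from the same class. The exact implementations used in the paper are not given here, so the function names, signatures, and numbers below are assumptions for illustration only.

```python
def one_point_concentration(analyte_area: float,
                            standard_area: float,
                            standard_conc: float) -> float:
    """One-point calibration: assumes analyte and standard respond identically."""
    if standard_area <= 0:
        raise ValueError("standard peak area must be positive")
    return analyte_area / standard_area * standard_conc


def surrogate_curve_concentration(analyte_area: float,
                                  slope: float,
                                  intercept: float = 0.0) -> float:
    """Surrogate curve: invert the calibration line of a same-class lipid."""
    if slope == 0:
        raise ValueError("calibration slope must be non-zero")
    return (analyte_area - intercept) / slope


# Hypothetical values: one standard at known concentration, one surrogate curve.
print(one_point_concentration(analyte_area=500.0, standard_area=1000.0, standard_conc=2.0))
print(surrogate_curve_concentration(analyte_area=650.0, slope=300.0, intercept=50.0))
```

Both approaches borrow the response of a different compound, which is the limitation a per-lipid sensitivity prediction is meant to address.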

Cited by

Simultaneous Detection of Polar and Nonpolar Molecules by Nano-ESI MS with Plasma Ignited by an Ozone Generator Power Supply.
Tian Y, Meng Y, Zare RN. Molecules. 2025 Jun 11;30(12):2546. doi: 10.3390/molecules30122546. PMID: 40572511.

Grants and funding

R01 CA218664/CA/NCI NIH HHS/United States

