2024 Apr 24;108(1):308.
doi: 10.1007/s00253-024-13147-w.

Explainable AI for CHO cell culture media optimization and prediction of critical quality attribute

Neelesh Gangwar et al. Appl Microbiol Biotechnol.

Abstract

Cell culture media play a critical role in cell growth and propagation by providing a substrate; media components can also modulate the critical quality attributes (CQAs). However, the inherent complexity of the cell culture media makes unraveling the impact of the various media components on cell growth and CQAs non-trivial. In this study, we demonstrate an end-to-end machine learning framework for media component selection and prediction of CQAs. The preliminary dataset for feature selection was generated by performing CHO-GS (-/-) cell culture in media formulations with varying metal ion concentrations. Acidic and basic charge variant composition of the innovator product (24.97 ± 0.54% acidic and 11.41 ± 1.44% basic) was chosen as the target variable to evaluate the media formulations. Pearson's correlation coefficient and random forest-based techniques were used for feature ranking and feature selection for the prediction of acidic and basic charge variants. Furthermore, a global interpretation analysis using SHapley Additive exPlanations was utilized to select optimal features by evaluating the contributions of each feature in the extracted vectors. Finally, the medium combinations were predicted by employing fifteen different regression models and utilizing a grid search and random search cross-validation for hyperparameter optimization. Experimental results demonstrate that Fe and Zn significantly impact the charge variant profile. This study aims to offer insights that are pertinent to both innovators seeking to establish a complete pipeline for media development and optimization and biosimilar-based manufacturers who strive to demonstrate the analytical and functional biosimilarity of their products to the innovator.

KEY POINTS:
• Developed a framework for optimizing media components and prediction of CQA.
• SHAP enhances global interpretability, aiding informed decision-making.
• Fifteen regression models were employed to predict medium combinations.

Keywords: Biosimilar; Charge variants; Feature ranking; Feature selection; Machine learning; Media development.
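
The Pearson-based feature ranking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the data below are synthetic placeholders generated with a fixed seed, the feature name `other_ion` is hypothetical, and the helper names `pearson` and `rank_features` are assumptions introduced only for this example. It ranks candidate media components by the absolute value of their correlation with a charge variant response.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, PCC) pairs sorted by |PCC| with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Synthetic placeholder data, NOT measurements from the study.
rng = random.Random(0)
n_runs = 30
features = {
    "Fe": [rng.uniform(0.0, 1.0) for _ in range(n_runs)],
    "Zn": [rng.uniform(0.0, 1.0) for _ in range(n_runs)],
    "other_ion": [rng.uniform(0.0, 1.0) for _ in range(n_runs)],  # hypothetical
}
# Hypothetical response: depends on Fe and Zn plus noise; other_ion is
# irrelevant by construction.
acidic = [
    20.0 + 5.0 * fe - 3.0 * zn + rng.gauss(0.0, 0.3)
    for fe, zn in zip(features["Fe"], features["Zn"])
]

ranking = rank_features(features, acidic)
for name, r in ranking:
    print(f"{name}: PCC = {r:+.2f}")
```

A correlation ranking of this kind captures only linear, one-feature-at-a-time relationships, which is presumably why the study pairs it with random forest importance and SHAP values.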

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Proposed machine learning framework for prediction of critical quality attributes
Fig. 2. Comparison of charge variant profile of (A) acidic and (B) basic variants with respect to innovator molecule (N = 2) (*p = 0.05, **p = 0.01, ***p = 0.001)
Fig. 3. Feature ranking for acidic variants (N = 2). a Pearson’s correlation coefficient (PCC), b Gini feature ranking, c waterfall plot (random observation 1), d waterfall plot (random observation 2), e absolute mean SHAP value, and f bee swarm plot
Fig. 4. Feature selection for basic variants (N = 2). a Pearson’s correlation coefficient (PCC), b Gini feature ranking, c waterfall plot (random observation 1), d waterfall plot (random observation 2), e absolute mean SHAP value, and f bee swarm plot
Fig. 5. Spearman correlation scatter plots for acidic and basic charge variants, showing the linear regression fit (blue line), its confidence interval (blue area), the correlation coefficient (corr_coef), and the p-value
Fig. 6. Box plots comparing the performance of different machine learning techniques in terms of mean absolute error
Fig. 7. Prediction with extreme gradient boost regressor: (A) observed (y) vs. predicted (ŷ) (error) plot; (B) residual plot. Prediction with random forest: (C) observed (y) vs. predicted (ŷ) (error) plot; (D) residual plot
Fig. 8. Optimized medium cell culture and charge variant profile. A Acidic variants (%). B Basic variants (%). C Viability (upper) and VCC (lower). D Titer (mg/L). E Integral of viable cell density (IVCC). F Specific productivity (qP) (N = 2) (*p = 0.05, **p = 0.01, ***p = 0.001)
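
Fig. 6 compares models by mean absolute error, and the abstract mentions grid search with cross-validation for hyperparameter optimization. The sketch below shows how those two pieces fit together. It is an illustration under stated assumptions, not the study's pipeline: a toy one-feature k-nearest-neighbour regressor stands in for the fifteen regression models, the data are synthetic, and all function names (`knn_predict`, `cross_val_mae`, `grid_search`) are hypothetical.

```python
import random


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to query (1-D feature)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_absolute_error(y_true, y_pred):
    """Mean of |observed - predicted| over all samples."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def cross_val_mae(x, y, k_neighbors, n_folds=5):
    """Average MAE over n_folds interleaved folds (fold f holds samples f, f+n_folds, ...)."""
    n = len(x)
    fold_scores = []
    for fold in range(n_folds):
        test_idx = list(range(fold, n, n_folds))
        held_out = set(test_idx)
        train_x = [x[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        preds = [knn_predict(train_x, train_y, x[i], k_neighbors) for i in test_idx]
        fold_scores.append(mean_absolute_error([y[i] for i in test_idx], preds))
    return sum(fold_scores) / n_folds


def grid_search(x, y, grid, n_folds=5):
    """Evaluate every candidate in grid and return (best value, all CV scores)."""
    scores = {k: cross_val_mae(x, y, k, n_folds) for k in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic placeholder data, NOT measurements from the study.
rng = random.Random(1)
x = [rng.uniform(0.0, 1.0) for _ in range(40)]
y = [10.0 + 4.0 * v + rng.gauss(0.0, 0.5) for v in x]

best_k, scores = grid_search(x, y, grid=[1, 3, 5, 7])
for k, mae in scores.items():
    print(f"k = {k}: cross-validated MAE = {mae:.3f}")
print("selected k:", best_k)
```

A random search would follow the same pattern but sample candidate settings from a range instead of enumerating a fixed grid.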
