Synth Syst Biotechnol. 2025 Jul 22;10(4):1275-1283.
doi: 10.1016/j.synbio.2025.07.008. eCollection 2025 Dec.

Machine learning-assisted rational design and evolution of novel signal peptides in Yarrowia lipolytica

Zizhao Wu et al. Synth Syst Biotechnol.

Abstract

Microbial proteins hold great promise as sustainable alternatives for future protein sources, and the oleaginous yeast Yarrowia lipolytica has emerged as a recognized platform for heterologous protein expression and secretion. N-terminal signal peptides (SPs) are crucial for directing proteins to the secretion pathway, which offers advantages in both academic and industrial protein production. Although some of the innate SPs of Y. lipolytica have been reported, there is a growing need to expand the genetic toolkit of SPs to support the increasing use of Y. lipolytica as a cell factory for overproduction of various secretory proteins. In this study, we employed an efficient evolutionary approach to rapidly evolve the innate SP XPR2-pre by leveraging Gibson assembly with two synthetic overlapping oligos containing a high proportion of degenerate nucleotides. Using Nanoluc (Nluc) luciferase as a robust reporter, we characterized the intracellular and extracellular enzymatic activity of 447 SP mutants and identified previously undescribed SPs that outperformed XPR2-pre in Nluc luciferase secretion, with improvements of up to 2.91-fold in enzymatic activity in the supernatant. The generalizability of the top-performing SPs was evaluated using three additional heterologous enzymes (β-galactosidase, α-amylase, and PET hydrolase). Our results confirmed their versatility across different proteins, with protein-specific efficiency. Additionally, based on our screening, we evaluated the performance of different feature engineering strategies and machine learning models in the design and prediction of SP mutants. This study integrated rational design, directed evolution, and machine learning to identify novel SPs, expanding the repertoire of signal peptides and benefiting secretory protein overexpression in Y. lipolytica.

Keywords: Directed evolution; Machine learning; Protein secretion; Rational design; Signal peptides; Yarrowia lipolytica.
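
The abstract mentions oligos carrying degenerate nucleotides. As a rough illustration of how such positions determine the theoretical size of a library, the following Python sketch expands standard IUPAC degenerate codes. The example oligo is hypothetical and is not a sequence from the study; the function names are likewise illustrative only.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_size(oligo):
    """Number of distinct DNA sequences encoded by a degenerate oligo."""
    size = 1
    for base in oligo.upper():
        size *= len(IUPAC[base])
    return size


def expand(oligo):
    """Enumerate every concrete sequence (only practical for short oligos)."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical oligo: a start codon followed by one NNK codon (assumption,
# chosen only to keep the enumeration small).
example = "ATGNNK"
print(library_size(example))
print(expand(example)[:4])
```

Each fully degenerate position multiplies the theoretical diversity by four, so the number of possible variants quickly exceeds what can be screened, which is one reason a screened library samples only part of the sequence space.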

Conflict of interest statement

The authors declare that there are no conflicts of interest regarding the publication of this manuscript. The research was conducted independently, and no financial or personal relationships influenced the results and discussions presented in the paper.

Figures

Fig. 1
Evaluation of Nluc luciferase reporter system for characterizing the secretion capacity of SP. (a) Schematic representation of the Nluc expression cassettes. Positive control (PC): Nluc gene fused seamlessly downstream of the XPR2-pre. Negative control (NC): Nluc expression cassette lacking a signal peptide. (b, c) Specific luciferase activity (sLum, RLU/OD) in the supernatant (b) and cell pellets (c) of PC and NC strains over a 72-h time course. ∗ indicates p < 0.05; NS indicates p > 0.05.
Fig. 2
Construction and screening of signal peptide library. (a) Synthetic overlapping oligos were pre-annealed and assembled into the HindIII/BamHI digested pYLSC3′-preLib-Nluc vector backbone using Gibson assembly. The homology island is marked by the box and homologous arms to enable Gibson assembly fused to the vector backbone are shown in grey. (b) Statistical distribution of fold change of sLum in supernatant relative to XPR2-pre across the signal peptide library. (c) The top 10 mutant SPs were identified based on luciferase activity in supernatant. The sLum values of mutants in supernatant and pellets were normalized to those of XPR2-pre separately. S/P refers to the ratio of absolute value of sLum in supernatant to that in pellets. (d) Distribution of normalized luciferase activity in supernatant and pellets across the library.
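
The normalization described in this caption can be sketched as follows, assuming sLum values are available as plain numbers for each strain. The function name, dictionary keys, and all numeric values are invented for illustration and are not data from the study.

```python
def normalize_to_reference(mutant, reference):
    """Normalize a mutant's sLum values to a reference SP (e.g. XPR2-pre).

    Both arguments are dicts with "supernatant" and "pellet" sLum values
    (RLU/OD). Supernatant and pellet are normalized separately, and S/P is
    the ratio of the mutant's absolute supernatant value to its pellet value.
    """
    return {
        "supernatant_fold": mutant["supernatant"] / reference["supernatant"],
        "pellet_fold": mutant["pellet"] / reference["pellet"],
        "s_over_p": mutant["supernatant"] / mutant["pellet"],
    }


# Invented example values, not measurements from the paper.
reference = {"supernatant": 1500.0, "pellet": 800.0}
mutant = {"supernatant": 3000.0, "pellet": 400.0}

result = normalize_to_reference(mutant, reference)
print(result)
```

A mutant with a high supernatant fold change and a low pellet fold change would, under this reading, be secreting a larger share of the reporter than the reference.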
Fig. 3
Quantitative evaluation of SPs for secretory expression of different heterologous proteins in Y. lipolytica. (a) Secreted β-galactosidase activity of strains expressing lacZ with different SPs, measured using ONPG as a substrate. (b) Secreted α-amylase activity of strains expressing amy with different SPs, quantified based on starch degradation. (c) Secreted PET hydrolase activity of strains expressing PETase with different SPs, assessed by BHET hydrolysis. Signal peptides tested include the native XPR2-pre and four identified candidates (SP207, SP298, SP304, SP387). The empty vector pYLSC3 was used as a negative control (NC).
Fig. 4
Visualization of three feature engineering methods with the SP library dataset. (a) t-SNE visualization of physicochemical features. (b) t-SNE visualization of handcrafted features. (c) t-SNE visualization of embedded features.
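
The caption does not specify which physicochemical features were used. As a minimal sketch of what such features might look like for a signal peptide, the following Python code computes length, mean Kyte-Doolittle hydropathy, and a simple net charge. The feature set, function name, and example sequence are assumptions, not the authors' method.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def physicochemical_features(peptide):
    """Return a small feature dict for an amino acid sequence.

    Net charge is a crude count (+1 for K/R, -1 for D/E) that ignores
    histidine, termini, and pH.
    """
    peptide = peptide.upper()
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)
    charge = sum(aa in "KR" for aa in peptide) - sum(aa in "DE" for aa in peptide)
    return {
        "length": len(peptide),
        "mean_hydropathy": hydropathy,
        "net_charge": charge,
    }


# Hypothetical peptide, not a sequence from the study.
print(physicochemical_features("MKLLAVALLA"))
```

Feature vectors of this kind, one per SP variant, are the sort of input a dimensionality-reduction method such as t-SNE would project into two dimensions for plotting.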
Fig. 5
Nluc luciferase activity assay of the predicted SPs. (a) SP195 and two mutated sequences predicted to be eSPs. (b) SP273 and two mutated sequences predicted to be eSPs. (c) SP273 and two mutated sequences predicted to be eSPs. ∗ indicates p < 0.05; ∗∗ indicates p < 0.01; ∗∗∗ indicates p < 0.001; NS indicates p > 0.05.
