Front Bioeng Biotechnol. 2024 Apr 11:12:1379121.
doi: 10.3389/fbioe.2024.1379121. eCollection 2024.

Machine learning model of the catalytic efficiency and substrate specificity of acyl-ACP thioesterase variants generated from natural and in vitro directed evolution

Fuyuan Jing et al. Front Bioeng Biotechnol. 2024.

Abstract

Modulating the catalytic activity of acyl-ACP thioesterase (TE) is an important biotechnological target for effectively increasing flux and diversifying products of the fatty acid biosynthesis pathway. In this study, a directed evolution approach was developed to improve the fatty acid titer and fatty acid diversity produced by E. coli strains expressing variant acyl-ACP TEs. A single round of in vitro directed evolution, coupled with a high-throughput colorimetric screen, identified 26 novel acyl-ACP TE variants that convey up to a 10-fold increase in fatty acid titer, and generate altered fatty acid profiles when expressed in a bacterial host strain. These in vitro-generated variant acyl-ACP TEs, in combination with 31 previously characterized natural variants isolated from diverse phylogenetic origins, were analyzed with a random forest classifier machine learning tool. The resulting quantitative model identified 22 amino acid residues, which define important structural features that determine the catalytic efficiency and substrate specificity of acyl-ACP TE.

Keywords: Thioesterase; acyl-ACP; directed evolution; fatty acids; machine learning; random forest.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Figures

FIGURE 1
Efficacy of the Neutral Red plate screening assay. (A) Colonies expressing acyl-ACP TE variants were grown at 30°C for 3 days on Petri plates with media supplemented with Neutral Red dye. The colonies displaying a more intense red color are indicated by arrows. (B) Box-and-whisker plot of fatty acid titer of cultures that were inoculated from “dark-red” (n = 177) and “light-red” (n = 77) colonies. t-test p-value <0.01.
FIGURE 2
Fatty acid titers of six parental acyl-ACP TEs (green data bars) and representative acyl-ACP TE variants (blue data bars and red diamond data points). Titers are presented as µmol/L (data bars) and as mg/L (red diamond data points).
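Figure 2 reports the same titers on a molar scale and a mass scale. As a minimal sketch of how the two relate, the conversion multiplies each fatty acid's molar titer by its molar mass and sums the result. The function name and the example profile are hypothetical and are not taken from the paper; the molar masses are standard values for the free acids.

```python
# Molar masses in g/mol for a few saturated free fatty acids (standard values).
MOLAR_MASS_G_PER_MOL = {
    "C8:0": 144.21,   # octanoic acid
    "C12:0": 200.32,  # dodecanoic (lauric) acid
    "C14:0": 228.37,  # tetradecanoic (myristic) acid
    "C16:0": 256.42,  # hexadecanoic (palmitic) acid
}


def umol_per_l_to_mg_per_l(profile_umol_per_l):
    """Convert per-species molar titers (umol/L) to one total mass titer (mg/L)."""
    total_ug_per_l = sum(
        amount * MOLAR_MASS_G_PER_MOL[species]  # umol/L * g/mol = ug/L
        for species, amount in profile_umol_per_l.items()
    )
    return total_ug_per_l / 1000.0  # ug/L -> mg/L


# Hypothetical profile for illustration only; not data from the paper.
example_profile = {"C14:0": 200.0, "C16:0": 100.0}
total_mg_per_l = umol_per_l_to_mg_per_l(example_profile)
print(round(total_mg_per_l, 3))
```

Because molar mass grows with chain length, two variants with equal molar titers can differ in mass titer when their chain-length profiles differ, which is one reason to show both scales.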
FIGURE 3
Fatty acid titers and fatty acid specificity of evolved acyl-ACP TE variants. (A) Dendrogram representation of sequence similarities among acyl-ACP TE variants. The dendrogram was inferred using the Minimum Evolution method (Rzhetsky and Nei, 1993). The bootstrap consensus tree (bootstrap value identified at each node), which was inferred from 250 replicates, represents the evolutionary history of each acyl-ACP TE. (B) Fatty acid profiles of 26 unique acyl-ACP TE variants generated in this study and compared to the six parental acyl-ACP TEs used to constrain the directed evolution strategy. The intensity of the green shading of each cell is proportional to the mol% of each fatty acid.
ᵃ Among the 175 acyl-ACP TE variants recovered in this study, the TEGm2198, TEGm204 and TEGm162 variants recurred 2, 3, and 147 times, respectively.
ᵇ Acyl-ACP TEs can be classified into three groups based on their substrate specificity: Class I enzymes primarily hydrolyze acyl-ACPs of 14- and 16-carbon acyl-chains, Class II enzymes prefer 8- to 16-carbon acyl-chains, and Class III enzymes have a preference for 8-carbon acyl-chains.
FIGURE 4
Categorizing acyl-ACP TEs based on fatty acid profiles. (A) Enzyme cluster membership was determined by hierarchical clustering of fatty acid profiles produced when each acyl-ACP TE was expressed in E. coli. (B) PCA plot of the fatty acid profiles produced when each acyl-ACP TE was expressed in E. coli. PC1 and PC2 together explain 59% of the data variation and segregate the 57 enzymes into three clusters, delineated by 95% confidence ellipses. (C) The fatty acid profiles produced by the acyl-ACP TEs that belong to Cluster A (as defined in panels (A) and (B)). (D) The fatty acid profiles produced by the acyl-ACP TEs that belong to Cluster B (as defined in panels (A) and (B)). (E) The fatty acid profiles produced by the acyl-ACP TEs that belong to Cluster C (as defined in panels (A) and (B)).
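The clustering step in Figure 4A can be illustrated with a small agglomerative example. This is a sketch only: the caption does not state the distance metric or linkage rule, so Euclidean distance and average linkage are assumptions here, and the mol% profiles are hypothetical values, not data from the paper.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        euclidean(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical mol% profiles over (C8, C12, C14, C16); not data from the paper.
toy_profiles = [
    [80.0, 10.0, 5.0, 5.0],
    [75.0, 15.0, 5.0, 5.0],
    [5.0, 10.0, 45.0, 40.0],
    [5.0, 5.0, 50.0, 40.0],
    [10.0, 60.0, 20.0, 10.0],
]
clusters = agglomerate(toy_profiles, n_clusters=3)
print(clusters)
```

With these toy profiles, the two C8-rich enzymes group together, the two C14/C16-rich enzymes group together, and the C12-rich enzyme remains on its own.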
FIGURE 5
Identification of residue positions predicted to govern acyl-ACP TE substrate specificity. (A) The importance scores for each residue position were generated by the random forest model that uses all 350 positions and one random variable as the predictors. The most impactful positions that determine the substrate specificity of the enzyme (orange-colored data points) were identified via Incremental Feature Selection (IFS) and have q-values <0.001. Non-significant positions are in black. (B) IFS selects the most important predictor set by evaluating the predictive performance of the associated model, as measured by recall, specificity, and Matthews correlation coefficient (MCC). (C) A zoomed-in view of the predictive performance evaluated by IFS. MCC plateaus when the top 22 residue positions (highlighted by filled circles) are included in the model.
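The IFS procedure summarized in Figure 5 can be sketched generically: rank the predictors by importance, score nested top-k subsets, and keep the smallest subset whose score reaches the plateau. The sketch below is an illustration under stated assumptions, not the authors' implementation. It uses the binary form of MCC, whereas the paper classifies enzymes into three clusters and this page does not say how MCC was computed for that case. The plateau tolerance, the feature names, and the confusion counts are all hypothetical, and a lookup table stands in for training and evaluating a random forest on each subset.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def incremental_feature_selection(ranked_features, score_subset, tolerance=0.01):
    """Score nested top-k subsets; return the smallest k within tolerance of the best.

    ranked_features: feature names ordered from most to least important.
    score_subset: callable that evaluates a model built on a feature subset.
    tolerance: assumed definition of "plateau" (hypothetical, not from the paper).
    """
    scores = [
        score_subset(ranked_features[:k])
        for k in range(1, len(ranked_features) + 1)
    ]
    best = max(scores)
    for k, score in enumerate(scores, start=1):
        if score >= best - tolerance:
            return k, scores


# Hypothetical ranked residue positions and confusion counts (tp, tn, fp, fn).
ranked = ["pos_A", "pos_B", "pos_C", "pos_D"]
toy_counts = {1: (6, 5, 5, 4), 2: (8, 7, 3, 2), 3: (9, 9, 1, 1), 4: (9, 9, 1, 1)}


def toy_scorer(subset):
    # Stand-in for fitting and cross-validating a classifier on this subset.
    return mcc(*toy_counts[len(subset)])


best_k, scores = incremental_feature_selection(ranked, toy_scorer)
print(best_k, [round(s, 3) for s in scores])
```

In this toy run the score stops improving after the third feature, so the top three features are selected, which mirrors how the caption describes MCC leveling off at the top 22 residue positions.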
FIGURE 6
Residues that are significant in determining the substrate specificity of acyl-ACP TE. The top 22 residues selected by the random forest classifier (Figure 5) are shown as stick models. Residues shown in red have previously been experimentally verified to affect substrate specificity (Jing et al., 2018a; Jing et al., 2018b). Catalytic residues are shown in yellow (Mayer and Shanklin, 2005; Serrano-Vega et al., 2005; Feng et al., 2017; Jing et al., 2018a). The dotted ovals indicate the structural region where the substrate binding pocket is located.

References

    1. Adams B. L. (2016). The next generation of synthetic biology chassis: moving synthetic biology from the laboratory to the field. ACS Synth. Biol. 5, 1328–1330. doi: 10.1021/acssynbio.6b00256
    2. Andrew R. M. (2020). A comparison of estimates of global carbon dioxide emissions from fossil carbon sources. Earth Syst. Sci. Data 12, 1437–1465. doi: 10.5194/essd-12-1437-2020
    3. Banerjee D., Jindra M. A., Linot A. J., Pfleger B. F., Maranas C. D. (2022). EnZymClass: substrate specificity prediction tool of plant acyl-ACP thioesterases based on ensemble learning. Curr. Res. Biotechnol. 4, 1–9. doi: 10.1016/j.crbiot.2021.12.002
    4. Barnes S. J. (2019). Understanding plastics pollution: the role of economic development and technological research. Environ. Pollut. 249, 812–821. doi: 10.1016/j.envpol.2019.03.108
    5. Basu S., Soderquist F., Wallner B. (2017). Proteus: a random forest classifier to predict disorder-to-order transitioning binding regions in intrinsically disordered proteins. J. Comput. Aided Mol. Des. 31, 453–466. doi: 10.1007/s10822-017-0020-y
