ACS Catal. 2025 Sep 3;15(18):16070-16083.
doi: 10.1021/acscatal.5c03460. eCollection 2025 Sep 19.

Machine Learning-Guided Identification of PET Hydrolases from Natural Diversity

Brenna Norton-Baker et al. ACS Catal. 2025.

Abstract

The enzymatic depolymerization of poly(ethylene terephthalate) (PET) is emerging as a leading chemical recycling technology for waste polyester. As part of this endeavor, new candidate enzymes identified from natural diversity can serve as useful starting points for enzyme evolution and engineering. In this study, we improved upon HMM searches by applying an iterative machine learning strategy to identify 400 putative PET-degrading enzymes (PET hydrolases) from naturally occurring homologs. Using high-throughput (HTP) experimental techniques, we successfully expressed and purified >200 enzyme candidates and assayed them for PET hydrolysis activity as a function of pH, temperature, and substrate crystallinity. From this library, we discovered 91 previously unknown PET hydrolases, 35 of which retain activity at pH 4.5 on crystalline material, conditions relevant to developing more efficient commercial processes. Notably, four enzymes showed activity equal to or higher than that of LCC-ICCG, a benchmark PET hydrolase, under this challenging condition in our screening assay, and 11 have pH optima <7. Using these data, we identified regions of PETases statistically correlated with activity at lower pH. We additionally investigated the effect of condition-specific activity data on trained machine learning predictors and found a precision (putative hit rate) improvement of up to 30% compared to a Hidden Markov Model alone. Our findings show that by pointing enzyme discovery toward conditions of interest with multiple rounds of experimentation and machine learning, we can discover large sets of active enzymes and explore factors associated with activity at those conditions.

Keywords: PET hydrolase; biocatalysis; high-throughput assay; interfacial biocatalysis; machine learning.
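
The abstract treats precision as a putative hit rate. A minimal sketch of that comparison follows, assuming the baseline assays every HMM hit while the machine learning filter assays only the candidates it keeps. The function names and the eight labels are made up for illustration and are not data from the study.

```python
def hit_rate(is_active):
    """Fraction of assayed candidates that were experimentally active."""
    assayed = list(is_active)
    if not assayed:
        raise ValueError("no candidates assayed")
    return sum(assayed) / len(assayed)


def precision(predicted_active, is_active):
    """Of the candidates a filter kept, the fraction that were truly active."""
    kept = [truth for pred, truth in zip(predicted_active, is_active) if pred]
    if not kept:
        raise ValueError("filter kept no candidates")
    return sum(kept) / len(kept)


# Hypothetical labels for eight candidates (illustration only).
truth = [True, False, True, False, False, True, False, False]
ml_keeps = [True, False, True, True, False, True, False, False]

baseline = hit_rate(truth)             # assay everything the HMM returned
filtered = precision(ml_keeps, truth)  # assay only what the filter kept
print(f"baseline hit rate {baseline:.3f}, filtered precision {filtered:.3f}")
```

In this toy case the filter discards four candidates, one of the remaining four is a false positive, and the hit rate among assayed enzymes rises accordingly.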

Figures

Figure 1. (A) Overview of the process for each round of sequence mining and filtering. Candidates were mined from reference sequences using an HMM of active PET hydrolases. In Round 1, candidates were filtered via a PET hydrolysis activity predictor trained on scraped literature data. In Round 2, the PET activity predictor was improved, and thermal stability and acid tolerance predictors were also leveraged to filter candidates. In Round 3, high-quality, uniform experimental data from previous rounds were used to train predictors that we used to filter additional candidates. (B) Transformation, expression, and purification of the putative PET hydrolases, assisted by the OT-2 liquid-handling system, to achieve tagless purified enzymes via Ni-affinity separation and targeted proteolytic cleavages. (C) Histogram showing the purified yields from a single well of the enzymes studied. Those below 25 μg (beige bar) were below our detection threshold. (D) Histogram showing the thermostability of the enzymes measured by differential scanning fluorimetry (DSF). If multiple inflection points were observed, the highest melting temperature (Tm) is plotted. Portions of this figure were created using Biorender.com.
Figure 2. PET hydrolase activity in μmol aromatic products produced per mg of enzyme added after 48 h, at varying pH (color of bar), temperature (left vs right for each enzyme), and substrate crystallinity: amorphous film (aFilm, solid bars) and crystalline powder (cryPow, hatched bars). Only enzymes with activity above 25 μmol product/mg of enzyme in at least one condition are shown. (A) Enzymes with yields sufficient to test in 32 conditions: 4 pHs, 2 temperatures, 2 PET substrates, in duplicate. Error bars represent the range between biological duplicates. (B) Enzymes with yields sufficient to test in 8 conditions: 4 pHs, 2 temperatures, 1 PET substrate (cryPow), based on single measurements. (C) Enzymes with yields sufficient to test in 4 conditions: 4 pHs, 1 temperature (40 °C), 1 PET substrate (cryPow), based on single measurements. (D) Observed hit rate for active PET hydrolases (the ratio of active PET hydrolases to the total number assayed for activity) and median Tm across the three rounds of mining, filtering, and testing. Gray bars represent the hit rate for enzymes showing activity under any condition tested; however, not all enzymes were tested under every condition. Orange bars represent the hit rate for active enzymes on crystalline powder at pH 5.5 and 40 °C, a targeted condition where all enzymes were tested.
Figure 3. Sequence diversity of the PET hydrolases studied. (A) Minimum evolution phylogenetic tree computed using a multiple sequence alignment of all PET hydrolases with >50% gap columns removed. Extra domains were not included in the tree distance calculation due to their high gap content. Only candidates with conversion activity greater than 20 μmol product/mg of enzyme on crystalline powder at pH 5.5 and 40 °C are shown, along with DP043. Enzyme activity is represented by marker color. Select enzymes from other studies are shown. Squares are enzymes with a Family 13 carbohydrate binding module (CBM13) identified by dbCAN2, and the background highlight shows a branch of the tree with many enzymes exhibiting a CBM13. (B) 2D UMAP plot using BLOSUM62 scores from an alignment of PET hydrolases from this work (circles) overlaid on the space of known PET hydrolases (red kernel density). Actives with CBM13 are depicted with squares. High-performing or widely studied PET hydrolases are depicted as red dots.
Figure 4. Significant sequence and surface properties associated with low pH activity. All 3D structures show a trimer of PET in green and the three catalytic residues (His, Asp, Ser) in pink. AlphaFold3 was used to produce the structures. (A) The number of factors held by an enzyme as a function of its experimental activity on crystalline powder at pH = 4.5, T = 40 °C; enzymes with more observed factors tend to have higher activity. (B) Nine residues mapped to LCC-ICCG that were more conserved for low pH activity than they were for neutral activity. (C) Mean difference in surface Kyte-Doolittle hydrophobicity, again mapped to LCC-ICCG, between the low pH active group and the neutral activity group, colored only for positions where statistical significance was observed. Most of the surface had a statistically significant shift toward greater hydrophobicity for acid tolerant candidates, most strongly near the binding site. (D) Factors exhibited by ESM065, which had comparable activity to LCC-ICCG at pH 4.5 and a pH optimum <6. Left: counts of factors (conservation, pKa, electrostatics, hydrophobicity, circular variance, and stickiness) observed for ESM065 that were statistically significant when acid tolerant and neutral candidates were compared. Of all factors found to be significant when comparing these groups, those exhibited by ESM065 (if the ESM065 value is closer to the acid tolerant mean) are marked. Larger and brighter regions of the protein backbone indicate more factors (max 6). Middle: Residues that tended to be conserved for acid tolerance that we observed in ESM095 (6 out of 9). Right: hydrophobicity for ESM095, colored only for positions where significant differences were found. (E) Same as in (D), but for ESM091, another representative from the 11 candidates with pH optima <6.
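
The per-position hydrophobicity comparison in panel (C) can be illustrated with the published Kyte-Doolittle hydropathy scale. The sketch below makes simplifying assumptions: it averages over every aligned position rather than surface residues only, it omits the statistical significance test, and the function names and two-residue toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy_by_position(aligned_seqs):
    """Mean Kyte-Doolittle value per alignment column, ignoring gaps."""
    means = []
    for column in zip(*aligned_seqs):
        values = [KD[aa] for aa in column if aa in KD]
        means.append(sum(values) / len(values) if values else None)
    return means


def hydropathy_shift(group_a, group_b):
    """Per-column mean difference (group_a minus group_b); None if undefined."""
    mean_a = mean_hydropathy_by_position(group_a)
    mean_b = mean_hydropathy_by_position(group_b)
    return [
        None if a is None or b is None else a - b
        for a, b in zip(mean_a, mean_b)
    ]


# Toy aligned groups (illustration only, not sequences from the study).
acid_tolerant = ["AL", "VL"]
neutral = ["DL", "EL"]
shift = hydropathy_shift(acid_tolerant, neutral)
print(shift)  # positive values mean group_a is more hydrophobic there
```

A positive entry marks a position where the first group is on average more hydrophobic, which is the direction the caption reports for acid tolerant candidates.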
Figure 5. Residues in or near the active site with statistically significantly different predicted pKa values between examples with low pH activity and those with activity at higher pH, mapped to the structure of TEP109. All residue positions are given relative to the MSA. The difference in mean predicted pKa between the groups, where statistically significant, is given in the third row. These residues include the proton-transferring histidine (H384, teal), the catalytic aspartate (D352, pink), and a histidine (H300, orange) directly adjacent to the catalytic serine (S301, purple). Three other residues within 15 Å that are often or sometimes charged are highlighted (alignment positions: E224 (green), D383 (black), Q378 (yellow)). AlphaFold3 prediction was used to “dock” a trimer of PET (blue), and all distances (Å) shown use the TEP109 predicted structure. TEP109 is the top-performing candidate at pH 4.5 and one of a very small number to exhibit E224 and an uncharged amino acid at alignment position 378.
Figure 6. Performance of models across conditions. (A) Scores of supervised models compared in 5-fold cross-validation to the starting HMM-17, tuned HMM-17, and the supervised model trained on D1–513-Scraped used in Round 2 (“lit. supervised”). The numbers above each condition show the number active (top) and number tested (bottom). Conditions where the supervised models improved AUROC by >0.1 are highlighted in purple. (B) Prediction by the supervised model as a function of measured enzyme activity at conditions where high performance was achieved. The decision boundary was set at 0.5, with enzymes above classified as “active” and those below as “inactive”. Enzymes that were experimentally determined to be active are orange and those inactive are green.
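
The two quantities this caption relies on, AUROC and a 0.5 decision boundary, can be sketched in a few lines. This is a generic illustration rather than the authors' evaluation code: the scores and labels are invented, and the treatment of a score of exactly 0.5 as "inactive" is an assumption, since the caption does not say.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen active enzyme scores
    higher than a randomly chosen inactive one (ties count as half).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classify(scores, boundary=0.5):
    """Label scores strictly above the boundary as active (assumption)."""
    return [s > boundary for s in scores]


# Hypothetical predictor scores and measured activity labels.
scores = [0.9, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]

area = auroc(scores, labels)
calls = classify(scores)
print(f"AUROC = {area:.3f}, calls = {calls}")
```

AUROC is threshold-free, so it measures how well the scores rank actives above inactives, while the 0.5 boundary fixes one operating point and therefore one precision value.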

