Metab Eng Commun. 2024 Sep 5:19:e00248.
doi: 10.1016/j.mec.2024.e00248. eCollection 2024 Dec.

PEZy-miner: An artificial intelligence driven approach for the discovery of plastic-degrading enzyme candidates

Renjing Jiang et al. Metab Eng Commun. 2024.

Abstract

Plastic waste has caused a global environmental crisis. Enzyme-mediated biocatalytic depolymerization has emerged as an efficient and sustainable alternative for plastic treatment and recycling. However, discovering novel plastic-degrading enzymes with conventional cultivation-based or omics methods is challenging and time-consuming. There is growing interest in effective computational methods that identify new enzymes with desirable plastic-degradation functions by exploring the ever-increasing databases of protein sequences. In this study, we designed an innovative machine learning-based framework, named PEZy-Miner, to mine for enzymes with high potential to degrade plastics of interest. Two datasets covering eleven types of plastic substrates were created: one of experimentally verified enzymes and one of homologs with unknown plastic-degrading activity. Protein language models and binary classification models were developed to predict enzymatic degradation of plastics, along with confidence and uncertainty estimates. PEZy-Miner exhibited high prediction accuracy and stability when validated on experimentally verified enzymes. Furthermore, when the experimentally verified enzymes were masked and blended into the homolog dataset, PEZy-Miner concentrated the experimentally verified entries by 14 to 30 times while shortlisting promising plastic-degrading enzyme candidates. We applied PEZy-Miner to 0.1 million putative sequences, of which 27 new sequences were identified with high confidence. This study provides a new computational tool for mining and recommending promising new plastic-degrading enzymes.
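The concentration of verified entries reported in the abstract can be read as an enrichment factor. The abstract does not state the exact formula, so the following is a minimal sketch under one common definition: the fraction of verified entries in the shortlist divided by their fraction in the full pool. The function name `enrichment_factor` and the toy identifiers are hypothetical and are not taken from the paper.

```python
def enrichment_factor(shortlist, pool, verified):
    """Ratio of the verified fraction in the shortlist to that in the pool.

    Assumed definition (not confirmed by the abstract):
        (verified in shortlist / shortlist size) / (verified in pool / pool size)
    """
    shortlist, pool, verified = set(shortlist), set(pool), set(verified)
    if not shortlist or not pool:
        raise ValueError("shortlist and pool must be non-empty")
    base_rate = len(verified & pool) / len(pool)
    if base_rate == 0:
        raise ValueError("pool contains no verified entries")
    hit_rate = len(verified & shortlist) / len(shortlist)
    return hit_rate / base_rate


# Toy data: 1000 sequences, 10 of them verified and masked among homologs.
pool = [f"seq{i}" for i in range(1000)]
verified = pool[:10]
# A hypothetical shortlist of 50 sequences that recovers 8 verified entries.
shortlist = pool[:8] + pool[100:142]
print(round(enrichment_factor(shortlist, pool, verified), 2))  # 16.0
```

In this toy case the shortlist is 16 times richer in verified entries than the pool, which is the kind of figure the reported 14 to 30 times range would correspond to under this definition.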

Keywords: Confidence and uncertainty estimation; Enzyme discovery; Machine learning; Plastic degradation; Protein language model.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Overview of PEZy-Miner. The protein language model converted the input amino acid sequences into computer interpretable vectors. The classification module took as input the vectors, the biophysical features extracted from input sequences, and one-hot encoded plastic types for predicting the degradability of the input enzyme/plastic pairs. The confidence and uncertainty estimation module computed the confidence and uncertainty of the predictions. After five runs using different random seeds, the confidence and uncertainty estimation module integrated results to identify the top-ranked enzyme/plastic pairs common across the five runs.
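Two steps in the Fig. 1 caption can be sketched in a few lines: assembling the classifier input from the sequence vector, the biophysical features, and the one-hot plastic type, and keeping only the enzyme/plastic pairs that rank near the top in every run. This is an illustrative sketch, not the authors' implementation. The names `build_input` and `consensus_top`, the cutoff `top_k`, the scores, and the three plastic labels are all assumptions made for the example (the paper covers eleven plastic types that are not listed here).

```python
def build_input(embedding, biophysical, plastic, plastic_types):
    """Concatenate sequence vector, biophysical features, and one-hot plastic type."""
    if plastic not in plastic_types:
        raise ValueError(f"unknown plastic type: {plastic}")
    one_hot = [1.0 if p == plastic else 0.0 for p in plastic_types]
    return list(embedding) + list(biophysical) + one_hot


def consensus_top(runs, top_k):
    """Return the pairs that appear in the top_k of every run.

    runs: list of dicts mapping (enzyme, plastic) -> score, higher is better.
    """
    if not runs:
        raise ValueError("at least one run is required")
    top_sets = []
    for scores in runs:
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_sets.append(set(ranked[:top_k]))
    return set.intersection(*top_sets)


# Illustrative plastic labels only; not the paper's list of eleven types.
PLASTICS = ("PET", "PLA", "PU")
x = build_input([0.1, 0.2], [3.0], "PLA", PLASTICS)
print(x)  # [0.1, 0.2, 3.0, 0.0, 1.0, 0.0]

# Two toy runs standing in for the five random seeds.
runs = [
    {("e1", "PET"): 0.9, ("e2", "PET"): 0.8, ("e3", "PLA"): 0.1},
    {("e1", "PET"): 0.7, ("e2", "PET"): 0.2, ("e3", "PLA"): 0.6},
]
print(consensus_top(runs, top_k=2))  # {('e1', 'PET')}
```

Taking the intersection across runs favors pairs whose high ranking does not depend on a particular random seed, which matches the stability goal described in the caption.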
Fig. 2
Performance evaluation by accuracy for the ProtBERT_MLP (A), ProtBERT_proto (B), ESM_MLP (C), ESM_proto (D), RoBERTa_MLP (E), and RoBERTa_proto (F) models at different confidence and uncertainty thresholds using the experimental testing dataset. Each cell shows the accuracy on the enzyme/plastic pairs before (top-left cell of each subfigure) and after (all other cells of each subfigure) filtering by the specified confidence and uncertainty thresholds.
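The grid described in the Fig. 2 caption can be reproduced in outline by recomputing accuracy on the predictions that pass each pair of thresholds. The sketch below assumes each prediction carries a confidence value (higher is better) and an uncertainty value (lower is better). How PEZy-Miner computes these values is not described in this record, and the field names and the function `filtered_accuracy` are hypothetical.

```python
def filtered_accuracy(records, min_confidence=0.0, max_uncertainty=float("inf")):
    """Accuracy over the records that pass both thresholds.

    Returns (accuracy, number_kept); accuracy is None when nothing passes.
    """
    kept = [
        r for r in records
        if r["confidence"] >= min_confidence and r["uncertainty"] <= max_uncertainty
    ]
    if not kept:
        return None, 0
    correct = sum(r["predicted"] == r["label"] for r in kept)
    return correct / len(kept), len(kept)


# Toy predictions; values are made up for illustration.
records = [
    {"predicted": 1, "label": 1, "confidence": 0.95, "uncertainty": 0.05},
    {"predicted": 0, "label": 0, "confidence": 0.85, "uncertainty": 0.10},
    {"predicted": 1, "label": 0, "confidence": 0.60, "uncertainty": 0.40},
    {"predicted": 0, "label": 0, "confidence": 0.55, "uncertainty": 0.30},
]

# One row per confidence threshold, one column per uncertainty threshold.
for conf in (0.0, 0.8):
    row = [filtered_accuracy(records, conf, unc) for unc in (float("inf"), 0.2)]
    print(conf, row)
```

The unfiltered cell (confidence 0.0, no uncertainty limit) corresponds to the top-left cell of each subfigure. Reporting the number of retained pairs next to each accuracy makes it clear how much data each stricter threshold discards.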
Fig. 3
Rankings of the enzyme/plastic pairs in the experimental testing dataset by ProtBERT_MLP (A), ProtBERT_proto (B), ESM_MLP (C), ESM_proto (D), RoBERTa_MLP (E), and RoBERTa_proto (F), with five random seeds for each model. The horizontal axis shows the 71 enzyme/plastic pairs in the experimental testing dataset, each assigned an index for identification. The vertical axis shows the five runs for each model. The color gradient represents the ranking obtained in each run, with light to dark color indicating the best to worst ranking of the enzyme/plastic pair in that run. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Fig. 4
Machine learning predictions on the experimental testing dataset by ProtBERT_MLP (A) and ProtBERT_proto (B). Each marker represents an enzyme/plastic pair in the experimental testing dataset. Marker color indicates the plastic type. Incorrect predictions are denoted with an 'x'. The top 20% of enzyme/plastic pairs are shown in the red box. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Fig. 5
Biological insights into enzyme candidates in the top list. (A) Distribution of enzymes involved in degradation of different plastic types in the top list, identified by the combination of ProtBERT_MLP and ProtBERT_proto models. (B) Distribution of pairwise sequence similarities between the top list and the experimental dataset. (C) Illustration of the associated plants and living environments of the source organisms of enzyme candidates.
