A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features
- PMID: 32432088
- PMCID: PMC7214540
- DOI: 10.3389/fbioe.2020.00285
Abstract
Protein thermostability is a key consideration in enzyme engineering, so a method that distinguishes thermophilic from non-thermophilic proteins would aid enzyme design. In this study, we established a novel method that combines mixed features with machine learning for this recognition task. First, an amino acid reduction scheme was used to recode each amino acid sequence. Then, physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated into a mixed feature set, which was processed with correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods were trained and evaluated on a dataset of 1,000 proteins: 500 randomly sampled from 915 thermophilic proteins and 500 randomly sampled from 793 non-thermophilic proteins. With 10-fold cross-validation, 98.2% of thermophilic and non-thermophilic proteins were correctly identified. Moreover, analysis of the retained and removed features revealed which elements are crucial, unimportant, or insensitive, which also provides essential information for enzyme design.
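The abstract does not specify the reduction scheme, the physicochemical properties, or the lags used for ACC, so the following Python sketch only illustrates two of the feature types under stated assumptions. It uses a hypothetical six-group reduced alphabet (not the authors' scheme) to compute reduced dipeptide composition, and computes only the auto covariance part of ACC for a single property, Kyte-Doolittle hydrophobicity, chosen purely for illustration. Cross covariance between different properties, the other physicochemical features, and the correlation analysis, feature selection, and PCA steps are omitted. The names `REDUCTION`, `reduce_sequence`, `reduced_dipeptide_composition`, `auto_covariance`, and `mixed_features` are hypothetical.

```python
from itertools import product

# Hypothetical 6-group reduction scheme for illustration only; the paper's
# actual reduction scheme is not given in the abstract.
REDUCTION = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "d": "DE",      # negatively charged
    "g": "GP",      # conformationally special
    "k": "KR",      # positively charged
    "p": "STNQ",    # polar, uncharged
    "r": "FWYH",    # aromatic (H placed here as an illustrative choice)
}
AA_TO_GROUP = {aa: grp for grp, members in REDUCTION.items() for aa in members}
ALPHABET = sorted(REDUCTION)
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]

# Kyte-Doolittle hydrophobicity, used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def reduce_sequence(seq):
    """Recode a protein sequence into the reduced alphabet.

    Non-standard residues (e.g. X, U) are dropped, so their neighbours become adjacent.
    """
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def reduced_dipeptide_composition(seq):
    """Relative frequency of each ordered pair of adjacent reduced symbols."""
    reduced = reduce_sequence(seq)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(reduced) - 1):
        counts[reduced[i:i + 2]] += 1
    total = max(len(reduced) - 1, 1)  # avoid division by zero for very short sequences
    return [counts[p] / total for p in PAIRS]


def auto_covariance(seq, lag, scale=KYTE_DOOLITTLE):
    """Auto covariance of one property between residues `lag` positions apart."""
    values = [scale[aa] for aa in seq.upper() if aa in scale]
    n = len(values)
    if n <= lag:
        raise ValueError("sequence must be longer than the lag")
    mean = sum(values) / n
    return sum((values[i] - mean) * (values[i + lag] - mean)
               for i in range(n - lag)) / (n - lag)


def mixed_features(seq, max_lag=3):
    """Concatenate reduced dipeptide composition with auto covariance for lags 1..max_lag."""
    return reduced_dipeptide_composition(seq) + [
        auto_covariance(seq, lag) for lag in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    vec = mixed_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
    print(len(vec))  # 36 dipeptide features + 3 auto covariance values
```

With six groups, the dipeptide part contributes 36 features instead of the 400 of standard dipeptide composition, which is the main appeal of reducing the alphabet before counting adjacent pairs.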
Keywords: machine learning methods; mixed features; non-thermophilic protein; reduced amino acids; thermophilic protein.
Copyright © 2020 Feng, Ma, Yang, Li, Zhang and Li.