Protein remote homology detection by combining Chou's distance-pair pseudo amino acid composition and principal component analysis
- PMID: 25896721
- DOI: 10.1007/s00438-015-1044-4
Abstract
Protein remote homology detection is a key task in computational proteomics, with relevance to both basic research and practical applications. Currently, SVM-based discriminative methods show superior performance. However, existing feature vectors still do not represent protein sequences adequately, and the resulting models are often hard to interpret in terms of characteristic features. Previous studies showed that sequence-order effects and physicochemical properties are important for representing protein sequences, but how to use this information to construct predictors remains a challenging problem. In this study, to incorporate sequence-order information and physicochemical properties into the prediction, a method called disPseAAC is proposed, in which the feature vector is constructed by combining the occurrences of amino acid pairs within Chou's pseudo amino acid composition (PseAAC) approach. Predictive performance and computational cost are further improved by applying principal component analysis. Various experiments are conducted on a benchmark dataset. The results show that disPseAAC achieves an ROC score of 0.922, outperforming some existing state-of-the-art methods. Furthermore, the learnt model can easily be analyzed in terms of discriminative features, and the computational cost of the proposed method is much lower than that of other profile-based methods.
Keywords: Principal component analysis; Protein remote homology; Pseudo amino acid composition; Support vector machine.
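The pipeline summarized in the abstract (distance-pair occurrence features followed by principal component analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact feature definition, so the layout used here (20 composition values plus 400 pair frequencies per distance), the `max_distance` parameter, and the function names `distance_pair_features` and `pca_reduce` are all assumptions made for illustration. It requires NumPy.

```python
import numpy as np

# The 20 standard amino acids; any other symbol in a sequence is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def distance_pair_features(seq, max_distance=3):
    """Hypothetical distance-pair feature vector (assumed layout).

    Returns 20 amino acid frequencies followed by, for each distance
    d = 1..max_distance, the 400 frequencies of ordered residue pairs
    found d positions apart.
    """
    residues = [INDEX[aa] for aa in seq.upper() if aa in INDEX]
    n = len(residues)

    composition = np.zeros(20)
    for r in residues:
        composition[r] += 1
    if n > 0:
        composition /= n

    blocks = [composition]
    for d in range(1, max_distance + 1):
        pairs = np.zeros((20, 20))
        for i in range(n - d):
            pairs[residues[i], residues[i + d]] += 1
        if n - d > 0:
            pairs /= n - d  # normalize by the number of pairs at distance d
        blocks.append(pairs.ravel())
    return np.concatenate(blocks)


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, components, mean


# Toy usage with made-up sequences (not from the benchmark dataset).
sequences = ["ACDEFGHIKL", "MNPQRSTVWY", "ACACACACAC", "LKIHGFEDCA"]
X = np.vstack([distance_pair_features(s, max_distance=2) for s in sequences])
X_reduced, components, mean = pca_reduce(X, n_components=2)
```

The reduced vectors would then be passed to an SVM classifier, which is omitted here. Because each principal component is a weighted combination of the original pair features, its weights in `components` indicate which amino acid pairs contribute most, which is one plausible route to the feature analysis the abstract mentions.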
Similar articles
- Protein Remote Homology Detection by Combining Chou's Pseudo Amino Acid Composition and Profile-Based Protein Representation. Mol Inform. 2013 Oct;32(9-10):775-82. doi: 10.1002/minf.201300084. Epub 2013 Jul 24. PMID: 27480230
- Using amino acid physicochemical distance transformation for fast protein remote homology detection. PLoS One. 2012;7(9):e46633. doi: 10.1371/journal.pone.0046633. Epub 2012 Sep 28. PMID: 23029559 Free PMC article.
- PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation. Mol Inform. 2015 Jan;34(1):8-17. doi: 10.1002/minf.201400025. Epub 2014 Sep 26. PMID: 27490858
- Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC. J Theor Biol. 2018 Jan 21;437:239-250. doi: 10.1016/j.jtbi.2017.10.030. Epub 2017 Oct 31. PMID: 29100918
- Research progress of reduced amino acid alphabets in protein analysis and prediction. Comput Struct Biotechnol J. 2022 Jul 4;20:3503-3510. doi: 10.1016/j.csbj.2022.07.001. eCollection 2022. PMID: 35860409 Free PMC article. Review.
Cited by
- Protein remote homology detection based on bidirectional long short-term memory. BMC Bioinformatics. 2017 Oct 10;18(1):443. doi: 10.1186/s12859-017-1842-2. PMID: 29017445 Free PMC article.
- iRSpot-DACC: a computational predictor for recombination hot/cold spots identification based on dinucleotide-based auto-cross covariance. Sci Rep. 2016 Sep 19;6:33483. doi: 10.1038/srep33483. PMID: 27641752 Free PMC article.
- Prediction of phosphothreonine sites in human proteins by fusing different features. Sci Rep. 2016 Oct 4;6:34817. doi: 10.1038/srep34817. PMID: 27698459 Free PMC article.
- Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods. Front Plant Sci. 2019 Jan 10;9:1961. doi: 10.3389/fpls.2018.01961. eCollection 2018. PMID: 30687359 Free PMC article.
- iDPF-PseRAAAC: A Web-Server for Identifying the Defensin Peptide Family and Subfamily Using Pseudo Reduced Amino Acid Alphabet Composition. PLoS One. 2015 Dec 29;10(12):e0145541. doi: 10.1371/journal.pone.0145541. eCollection 2015. PMID: 26713618 Free PMC article.