Mol Genet Genomics. 2015 Oct;290(5):1919-31.
doi: 10.1007/s00438-015-1044-4. Epub 2015 Apr 21.

Protein remote homology detection by combining Chou's distance-pair pseudo amino acid composition and principal component analysis

Bin Liu et al. Mol Genet Genomics. 2015 Oct.

Abstract

Protein remote homology detection is an important task in computational proteomics, with relevance to both basic research and practical applications. SVM-based discriminative methods currently show superior performance on this task. However, existing feature vectors still do not represent protein sequences adequately, and they often lack an interpretable model for analyzing characteristic features. Previous studies showed that sequence-order effects and physicochemical properties are important for representing protein sequences, but how to use this information when constructing predictors remains a challenging problem. To incorporate sequence-order information and physicochemical properties into the prediction, this study proposes a method called disPseAAC, in which the feature vector is constructed by combining the occurrences of amino acid pairs within Chou's pseudo amino acid composition (PseAAC) approach. Principal component analysis is then applied to further improve predictive performance and reduce computational cost. Various experiments are conducted on a benchmark dataset. The results show that disPseAAC achieves an ROC score of 0.922, outperforming some existing state-of-the-art methods. Furthermore, the learnt model can easily be analyzed in terms of discriminative features, and the computational cost of the proposed method is much lower than that of other profile-based methods.

Keywords: Principal component analysis; Protein remote homology; Pseudo amino acid composition; Support vector machine.
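
The abstract does not give the exact feature definition, so the following is only an illustrative sketch of what a distance-pair occurrence feature vector might look like. It assumes the vector is the amino acid composition (20 values) followed by the normalized frequencies of residue pairs separated by each distance d = 1..max_distance (400 values per distance). The function name `distance_pair_features`, the parameter `max_distance`, and the normalization are hypothetical choices, not taken from the paper, and the physicochemical terms of PseAAC, the PCA step, and the SVM classifier are not shown.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Map each ordered residue pair (e.g. "AC") to a position in 0..399.
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def distance_pair_features(seq, max_distance=3):
    """Return composition plus distance-pair frequencies for one sequence.

    Hypothetical layout (an assumption, not the paper's definition):
    20 composition values, then 400 pair frequencies per distance.
    """
    seq = seq.upper()
    # Composition: frequency of each standard residue among standard residues.
    residue_counts = [seq.count(a) for a in AMINO_ACIDS]
    n_standard = sum(residue_counts)
    features = [c / n_standard if n_standard else 0.0 for c in residue_counts]

    for d in range(1, max_distance + 1):
        counts = [0] * len(PAIR_INDEX)
        total = 0
        # Count residue pairs whose positions are exactly d apart.
        for i in range(len(seq) - d):
            idx = PAIR_INDEX.get(seq[i] + seq[i + d])
            if idx is not None:  # skip pairs containing non-standard residues
                counts[idx] += 1
                total += 1
        features.extend(c / total if total else 0.0 for c in counts)
    return features


vec = distance_pair_features("ACAC", max_distance=2)
print(len(vec))  # 20 + 2 * 400 = 820
```

Under these assumptions the vector grows by 400 dimensions per distance, which is one reason a dimensionality reduction step such as principal component analysis could help before training a classifier.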
