Amino Acids. 2010 Aug;39(3):713-26.
doi: 10.1007/s00726-010-0506-6. Epub 2010 Feb 18.

DomSVR: domain boundary prediction with support vector regression from sequence information alone

Peng Chen et al. Amino Acids. 2010 Aug.

Abstract

Protein domains are the structural and fundamental functional units of proteins. Knowledge of protein domain boundaries helps in understanding the evolution, structures and functions of proteins, and plays an important role in protein classification. In this paper, we propose a support vector regression-based method for identifying protein domain boundaries, using novel input profiles extracted from the AAindex database. Our method achieves an average sensitivity of approximately 36.5% and an average specificity of approximately 81% for multi-domain protein chains, which is better overall than the performance of published approaches to domain boundary identification. Because our method uses sequence information alone, it is also simpler and faster.
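The sensitivity and specificity figures quoted above can be illustrated with a minimal sketch. It assumes the standard per-residue definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) with label 1 for boundary residues and 0 otherwise; the abstract does not state the exact evaluation protocol, and the function name `sensitivity_specificity` is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    true_labels: Sequence[int], predicted_labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary per-residue labels.

    Assumption: 1 marks a domain-boundary residue, 0 a non-boundary residue.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # boundary, predicted boundary
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # boundary, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-boundary, correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-boundary, wrongly flagged

    # Guard against chains with no positives or no negatives.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels for an 8-residue chain, not data from the paper.
    truth = [0, 0, 1, 1, 1, 0, 0, 0]
    predicted = [0, 1, 1, 1, 0, 0, 0, 0]
    print(sensitivity_specificity(truth, predicted))
```

Because boundary residues are a small minority of any chain, a predictor can combine modest sensitivity with high specificity, which is the pattern the reported averages show.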

Figures

Fig. 1
Distribution of sequence positions of residues at the center of domain boundaries. Blue dots denote two-domain chains; red dots denote protein chains containing more than two domains
Fig. 2
Chain length distributions as observed in the CATH representative set used in this study. Intervals were calculated with a width of 100 residues. The domain frequencies were used to calculate probabilities of predicted domain sizes
Fig. 3
Comparison of raw and smoothed outputs from the SVR model for protein chain 1qu6A. The protein chain has 179 residues and contains two domains linked by a domain boundary, whose center is at residue 94. Both types of output are normalized to the range [0, 1]. The two square curves denote the two kinds of residue labels: the true labels, which describe each residue’s state (boundary or not boundary), and the predicted labels
Fig. 4
ROC analysis for mainly alpha proteins with respect to threshold
Fig. 5
ROC analysis for mainly beta proteins with respect to threshold
Fig. 6
ROC analysis for alpha–beta proteins with respect to threshold
Fig. 7
ROC analysis for proteins with few secondary structures with respect to threshold
Fig. 8
ROC analysis for other proteins with respect to threshold
Fig. 9
Performance comparison based on the CASP7 dataset. No left-diagonal striped bars are shown in the right graph for the template-based, ab initio, and DomSVR predictors, since their prediction accuracies for three-domain chains are zero
Fig. 10
Comparison of natural versus predicted domain boundaries for protein chain 1qu6A. The domain boundary (true or predicted) is shown as space-filling grey spheres. a True domain boundary for protein chain 1qu6A, b Predicted domain boundary for protein chain 1qu6A
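
The post-processing described in the captions for Fig. 3 (smoothed outputs normalized to [0, 1]) and Figs. 4 to 8 (performance as a function of a threshold) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' procedure: this page does not say which smoothing method or window size the paper uses, so a centered moving average is swapped in, and the function names, the default window of 5, and the default threshold of 0.5 are all hypothetical.

```python
from typing import List, Sequence


def moving_average(scores: Sequence[float], window: int = 5) -> List[float]:
    """Smooth per-residue scores with a centered moving average.

    Assumption: the window is odd and is truncated at the chain ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        segment = scores[max(0, i - half): min(len(scores), i + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A flat profile carries no boundary signal; map it to all zeros.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def call_boundaries(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label residues whose normalized score reaches the threshold as boundary (1)."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    # Toy regression outputs for a 5-residue stretch, not data from the paper.
    raw = [0.0, 0.0, 3.0, 0.0, 0.0]
    smoothed = min_max_normalize(moving_average(raw, window=3))
    print(smoothed)
    print(call_boundaries(smoothed, threshold=0.5))
```

Sweeping `threshold` from 0 to 1 and recomputing sensitivity and specificity at each value is one way to trace out ROC curves of the kind shown for each protein class.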
