Identification of putative domain linkers by a neural network - application to a large sequence database
- PMID: 16800897
- PMCID: PMC1538634
- DOI: 10.1186/1471-2105-7-323
Abstract
Background: The reliable dissection of large proteins into structural domains is an important issue for structural genomics/proteomics projects. To provide a practical approach to this issue, we tested the ability of a neural network to identify domain linkers in the SWISSPROT database (101,602 sequences).
Results: Our search detected 3009 putative domain linkers adjacent to or overlapping with domains, as defined by sequence similarity to either Protein Data Bank (PDB) or Conserved Domain Database (CDD) sequences. Among these putative linkers, 75% were "correctly" located within 20 residues of a domain terminus; the remaining 25% fell in the middle of a domain and probably represent failed predictions. Moreover, our neural network predicted 5124 putative domain linkers in structurally un-annotated regions without sequence similarity to PDB or CDD sequences, which suggests the possible existence of novel structural domains. For comparison, we performed the same analysis by identifying low-complexity regions (LCRs), which are known to encode unstructured polypeptide segments, and observed that the fraction of LCRs that coincide with domain termini is similar to the corresponding fraction for domain linkers. However, domain linkers and LCRs appeared to identify different types of domain boundary regions, as only 32% of the putative domain linkers overlapped with LCRs.
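The bookkeeping described in the Results can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the `Segment` type, the function names, and the rule that a linker lying outside every domain and more than 20 residues from every terminus counts as "unannotated" are all assumptions, since the abstract does not say exactly how "adjacent" was defined.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical representation: a 1-based, inclusive residue interval."""
    start: int
    end: int

    def overlaps(self, other: Segment) -> bool:
        return self.start <= other.end and other.start <= self.end

    def distance_to(self, position: int) -> int:
        """Residues between this segment and one position (0 if the position lies inside)."""
        if self.start <= position <= self.end:
            return 0
        return min(abs(self.start - position), abs(self.end - position))


def classify_linker(linker: Segment, domains: list[Segment], window: int = 20) -> str:
    """Label one predicted linker relative to similarity-defined domains (assumed rules)."""
    termini = [t for d in domains for t in (d.start, d.end)]
    if termini and min(linker.distance_to(t) for t in termini) <= window:
        return "terminus"      # within `window` residues of a domain N- or C-terminus
    if any(linker.overlaps(d) for d in domains):
        return "mid_domain"    # inside a domain, far from its ends: likely a failed prediction
    return "unannotated"       # no PDB/CDD-defined domain nearby: candidate novel boundary


def summarize(linkers: list[Segment], domains: list[Segment],
              lcrs: list[Segment], window: int = 20) -> dict:
    """Count labels, the fraction of annotated linkers at a terminus, and LCR overlap."""
    labels = Counter(classify_linker(lk, domains, window) for lk in linkers)
    annotated = labels["terminus"] + labels["mid_domain"]
    with_lcr = sum(any(lk.overlaps(r) for r in lcrs) for lk in linkers)
    return {
        "labels": dict(labels),
        "fraction_at_terminus": labels["terminus"] / annotated if annotated else 0.0,
        "fraction_overlapping_lcr": with_lcr / len(linkers) if linkers else 0.0,
    }


# Toy data (invented for illustration): two domains, three predicted linkers, one LCR.
domains = [Segment(1, 120), Segment(150, 300)]
linkers = [Segment(118, 130), Segment(200, 210), Segment(400, 410)]
lcrs = [Segment(125, 140)]
summary = summarize(linkers, domains, lcrs)
print(summary)
```

On this toy input, one linker straddles a terminus, one sits deep inside the second domain, and one lies in an un-annotated region; only the first overlaps the LCR.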
Conclusion: Overall, our study indicates that the two methods detect independent and complementary regions, and that combining them can substantially improve the sensitivity of domain boundary prediction. This finding should enable the identification of novel structural domains, yielding new targets for large-scale protein analyses.
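A rough sketch of why two complementary detectors raise sensitivity: count a known domain boundary as detected if it lies within 20 residues of any predicted segment, then compare each method alone with their union. The boundary positions and segments below are invented toy values, and this sensitivity definition is an assumption rather than the paper's exact metric.

```python
from __future__ import annotations


def detects(boundary: int, segments: list[tuple[int, int]], window: int = 20) -> bool:
    """True if a boundary residue lies within `window` residues of any (start, end) segment."""
    return any(start - window <= boundary <= end + window for start, end in segments)


def sensitivity(boundaries: list[int], segments: list[tuple[int, int]], window: int = 20) -> float:
    """Assumed metric: fraction of known boundaries detected by at least one segment."""
    if not boundaries:
        return 0.0
    return sum(detects(b, segments, window) for b in boundaries) / len(boundaries)


# Toy data (hypothetical): known boundaries and predictions from two methods.
boundaries = [120, 300, 500, 700]
nn_linkers = [(118, 130), (480, 490)]   # stand-in for neural-network linker predictions
lcrs = [(295, 310), (505, 520)]         # stand-in for low-complexity regions

sens_nn = sensitivity(boundaries, nn_linkers)
sens_lcr = sensitivity(boundaries, lcrs)
sens_combined = sensitivity(boundaries, nn_linkers + lcrs)  # union of both predictions
print(sens_nn, sens_lcr, sens_combined)  # 0.5 0.5 0.75
```

Because each method catches a boundary the other misses, the union reaches 0.75 on this toy set while each method alone reaches 0.5. The sketch does not model the cost of the union, namely that false positives from both methods also accumulate.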
Similar articles
- Domain-based small molecule binding site annotation. BMC Bioinformatics. 2006 Mar 17;7:152. doi: 10.1186/1471-2105-7-152. PMID: 16545112. Free PMC article.
- Inferring boundary information of discontinuous-domain proteins. IEEE Trans Nanobioscience. 2008 Sep;7(3):200-5. doi: 10.1109/TNB.2008.2002283. PMID: 18779100.
- Blast sampling for structural and functional analyses. BMC Bioinformatics. 2007 Feb 23;8:62. doi: 10.1186/1471-2105-8-62. PMID: 17319945. Free PMC article.
- Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book. Nat Methods. 2004 Dec;1(3):195-202. doi: 10.1038/nmeth725. PMID: 15789030. Review.
- Automatic annotation of protein function. Curr Opin Struct Biol. 2005 Jun;15(3):267-74. doi: 10.1016/j.sbi.2005.05.010. PMID: 15922590. Review.
Cited by
- IS-Dom: a dataset of independent structural domains automatically delineated from protein structures. J Comput Aided Mol Des. 2013 May;27(5):419-26. doi: 10.1007/s10822-013-9654-6. Epub 2013 May 29. PMID: 23715893.
- H-DROP: an SVM based helical domain linker predictor trained with features optimized by combining random forest and stepwise selection. J Comput Aided Mol Des. 2014 Aug;28(8):831-9. doi: 10.1007/s10822-014-9763-x. Epub 2014 Jun 26. PMID: 24965847.
- Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers. J Comput Aided Mol Des. 2017 Feb;31(2):237-244. doi: 10.1007/s10822-016-9999-8. Epub 2016 Dec 27. PMID: 28028736.
- Identifying foldable regions in protein sequence from the hydrophobic signal. Nucleic Acids Res. 2008 Feb;36(2):578-88. doi: 10.1093/nar/gkm1070. Epub 2007 Dec 1. PMID: 18056079. Free PMC article.
- Folding by numbers: primary sequence statistics and their use in studying protein folding. Int J Mol Sci. 2009 Apr 8;10(4):1567-1589. doi: 10.3390/ijms10041567. PMID: 19468326. Free PMC article. Review.