A conditional neural fields model for protein threading
- PMID: 22689779
- PMCID: PMC3371845
- DOI: 10.1093/bioinformatics/bts213
Abstract
Motivation: Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%).
Results: We present a novel protein threading method, CNFpred, which achieves much more accurate sequence-template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF), which aligns one protein sequence to its remote template using a non-linear scoring function. This scoring function accounts for correlation among a variety of protein sequence and structure features, makes use of information in the neighborhood of the two residues to be aligned, and is thus much more sensitive than the widely used linear or profile-based scoring functions. To train this CNF threading model, we employ a novel quality-sensitive method, instead of the standard maximum-likelihood method, to directly maximize the expected alignment quality on the training set. Experimental results show that CNFpred generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks as well as on our own large dataset. CNFpred outperforms the other methods regardless of the lengths or classes of proteins, and works particularly well for proteins with sparse sequence profiles because it makes effective use of structure information. Our methodology can also be adapted to protein sequence alignment.
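To make the idea of a neighborhood-aware, non-linear scoring function concrete, the following is a minimal illustrative sketch and not CNFpred's actual implementation. It assumes a one-hidden-layer network with random (untrained) weights, toy per-residue feature vectors, and a plain global alignment with a linear gap penalty standing in for the CNF's alignment states. All names (`window`, `make_scorer`, `align`) and parameters are hypothetical.

```python
import math
import random


def window(features, center, half_width):
    """Concatenate feature vectors around `center`, zero-padding past the ends."""
    dim = len(features[0])
    out = []
    for k in range(center - half_width, center + half_width + 1):
        out.extend(features[k] if 0 <= k < len(features) else [0.0] * dim)
    return out


def make_scorer(input_dim, hidden_units=4, seed=0):
    """Build a toy non-linear scorer: tanh hidden layer, then a linear output.

    Weights are random here (an assumption); a real model would learn them.
    """
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(input_dim)]
                for _ in range(hidden_units)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden_units)]

    def score(x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in w_hidden]
        return sum(w * h for w, h in zip(w_out, hidden))

    return score


def align(seq_feats, tmpl_feats, score, half_width=1, gap=-0.5):
    """Global alignment by dynamic programming over the non-linear match score."""
    n, m = len(seq_feats), len(tmpl_feats)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0], move[i][0] = i * gap, "skip_seq"
    for j in range(1, m + 1):
        best[0][j], move[0][j] = j * gap, "skip_tmpl"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The match score sees the neighborhoods of both residues.
            x = (window(seq_feats, i - 1, half_width)
                 + window(tmpl_feats, j - 1, half_width))
            best[i][j], move[i][j] = max(
                (best[i - 1][j - 1] + score(x), "match"),
                (best[i - 1][j] + gap, "skip_seq"),
                (best[i][j - 1] + gap, "skip_tmpl"),
            )
    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if move[i][j] == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "skip_seq":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


# Toy usage: 2 features per residue on each side, window of 3 positions.
seq_feats = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.3, 0.7]]
tmpl_feats = [[0.2, 0.8], [0.7, 0.1], [0.4, 0.6]]
scorer = make_scorer(input_dim=3 * (2 + 2))
total, pairs = align(seq_feats, tmpl_feats, scorer)
```

Because the score for aligning residue `i` to template position `j` is computed from windows around both positions, it can respond to feature combinations that a per-position linear or profile score would miss. The quality-sensitive training described above is not shown here.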