Bioinformatics. 2012 Jun 15;28(12):i59-66.

doi: 10.1093/bioinformatics/bts213.

A conditional neural fields model for protein threading

Jianzhu Ma¹, Jian Peng, Sheng Wang, Jinbo Xu

PMID: 22689779
PMCID: PMC3371845
DOI: 10.1093/bioinformatics/bts213

Jianzhu Ma et al. Bioinformatics. 2012.

Affiliation

¹ Toyota Technological Institute at Chicago, IL 60637, USA.

Abstract

Motivation: Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%).

Results: We present a novel protein threading method, CNFpred, which achieves much more accurate sequence-template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF). The CNF aligns a protein sequence to its remote template using a non-linear scoring function. This scoring function accounts for correlations among a variety of protein sequence and structure features, makes use of information in the neighborhood of the two residues being aligned, and is thus much more sensitive than the widely used linear or profile-based scoring functions. To train this CNF threading model, we employ a novel quality-sensitive method, instead of the standard maximum-likelihood method, to directly maximize the expected alignment quality over the training set. Experimental results show that CNFpred generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks as well as on our own large dataset. CNFpred outperforms others regardless of protein length or class, and works particularly well for proteins with sparse sequence profiles owing to its effective use of structure information. Our methodology can also be adapted to protein sequence alignment.
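The abstract contrasts maximum-likelihood training with a quality-sensitive objective. A toy sketch of the difference, assuming an explicit list of three candidate alignments with hypothetical model scores and quality values in [0, 1]: maximum likelihood rewards only the log-probability of the reference alignment, whereas the expected-quality objective averages the quality of every candidate under the model's distribution. In CNFpred the sum would range over all alignments (computed by dynamic programming rather than enumeration), and the quality values here are placeholders, not the paper's measure.

```python
import math


def log_softmax(scores):
    """Normalise model scores into log-probabilities (stable log-sum-exp)."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_z for s in scores]


def log_likelihood_objective(scores, reference_index):
    """Maximum-likelihood objective: log-probability of the reference alignment only."""
    return log_softmax(scores)[reference_index]


def expected_quality_objective(scores, qualities):
    """Quality-sensitive objective: sum over candidates of P(a) * Q(a)."""
    if len(scores) != len(qualities):
        raise ValueError("need one quality value per candidate alignment")
    return sum(math.exp(lp) * q for lp, q in zip(log_softmax(scores), qualities))


scores = [2.0, 1.0, 0.5]     # hypothetical scores: reference, near-miss, poor
qualities = [1.0, 0.8, 0.1]  # hypothetical alignment qualities
print(log_likelihood_objective(scores, 0))
print(expected_quality_objective(scores, qualities))
```

Here the near-correct candidate (quality 0.8) still contributes to the expected-quality objective, while the log-likelihood objective gives it no credit; that partial credit is the intuition behind a quality-sensitive training method.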

Figures

**Fig. 1.**
An example of a sequence–template alignment and its alignment path. (A) One alignment and its state representation. (B) Each path corresponds to one alignment, whose probability is estimated by our CNF model.
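Fig. 1 pairs each alignment with a path of states and assigns each path a probability. The sketch below is a minimal illustration, not CNFpred's implementation: it converts two gapped rows into a state path (panel A) and computes a path's probability as exp(score - log Z), where log Z is accumulated over all paths by a forward-style dynamic program (panel B). The state names `M`, `Is`, `It`, which gap each insertion state denotes, the unrestricted transitions between insertion states, and the hypothetical `toy_edge_score` are all assumptions; in the CNF the log-potentials would come from learned feature functions such as the neural network of Fig. 2.

```python
import math

# Assumed labels: "M" aligns a target residue to a template residue; "Is"/"It"
# are insertion states. Which gap each one denotes is our assumption.
STATES = ("M", "Is", "It")
# (target residues consumed, template residues consumed) for each state.
STEP = {"M": (1, 1), "Is": (1, 0), "It": (0, 1)}


def alignment_to_states(target_row, template_row, gap="-"):
    """Turn two gapped, equal-length alignment rows into a state path."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    states = []
    for a, b in zip(target_row, template_row):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if a != gap and b != gap:
            states.append("M")
        elif a != gap:
            states.append("Is")  # target residue opposite a template gap
        else:
            states.append("It")  # template residue opposite a target gap
    return states


def logsumexp(values):
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    top = max(finite)
    return top + math.log(sum(math.exp(v - top) for v in finite))


def path_score(path, edge_score):
    """Sum of transition log-potentials along one alignment path."""
    i = j = 0
    prev, total = "START", 0.0
    for state in path:
        di, dj = STEP[state]
        i, j = i + di, j + dj
        total += edge_score(prev, state, i, j)
        prev = state
    return total


def log_partition(m, n, edge_score):
    """Forward algorithm: log-sum of exp(path score) over every path that
    aligns an m-residue target to an n-residue template."""
    # F[(state, i, j)]: log-sum over partial paths ending in `state`
    # after consuming i target and j template residues.
    F = {("START", 0, 0): 0.0}
    for i in range(m + 1):
        for j in range(n + 1):
            for state in STATES:
                di, dj = STEP[state]
                pi, pj = i - di, j - dj
                terms = [F[(prev, pi, pj)] + edge_score(prev, state, i, j)
                         for prev in ("START",) + STATES
                         if (prev, pi, pj) in F]
                if terms:
                    F[(state, i, j)] = logsumexp(terms)
    return logsumexp([F.get((s, m, n), -math.inf) for s in STATES])


def path_probability(path, m, n, edge_score):
    return math.exp(path_score(path, edge_score) - log_partition(m, n, edge_score))


def toy_edge_score(prev, state, i, j):
    # Hypothetical log-potential standing in for the learned CNF scores.
    if state == "M":
        return 1.0 if (i + j) % 2 == 0 else 0.2
    return -0.5 if prev == state else -1.0


path = alignment_to_states("AC-DE", "A-GDE")  # ['M', 'Is', 'It', 'M', 'M']
print(path_probability(path, 4, 4, toy_edge_score))
```

Summing in log space keeps the partition function stable for long proteins, where the number of alignment paths grows exponentially with length.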

**Fig. 2.**
An example of the edge feature function φ, which is a neural network with one hidden layer. The function takes both template and target protein features as input and yields one log-likelihood score for the state transition M to I_s. H1, H2 and H3 are hidden neurons that apply a non-linear transformation to the input features.
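Fig. 2 describes φ as a one-hidden-layer network that maps template and target features to one log-likelihood score for a single transition. A minimal sketch under stated assumptions: the features are concatenated into one input vector, the hidden neurons (H1, H2, H3 in the figure) use a logistic sigmoid, and the output is a bias-free weighted sum of the hidden activations. All weights and feature values below are made up; CNFpred learns its parameters and uses its own feature set.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function (assumed hidden activation).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def edge_feature(template_feats, target_feats, hidden_w, hidden_b, out_w):
    """One-hidden-layer network giving one log-potential for a single state
    transition (e.g. M -> I_s) from template and target features."""
    x = list(template_feats) + list(target_feats)
    if len(hidden_w) != len(hidden_b):
        raise ValueError("need one bias per hidden neuron")
    hidden = []
    for row, bias in zip(hidden_w, hidden_b):
        if len(row) != len(x):
            raise ValueError("weight row length must match feature count")
        hidden.append(sigmoid(sum(w * v for w, v in zip(row, x)) + bias))
    if len(out_w) != len(hidden):
        raise ValueError("need one output weight per hidden neuron")
    # Output: weighted sum of hidden activations (no output bias assumed).
    return sum(u * h for u, h in zip(out_w, hidden))


score = edge_feature(
    template_feats=[0.8, 0.1],  # hypothetical template features
    target_feats=[0.6, 0.3],    # hypothetical target features
    hidden_w=[[0.5, -0.2, 0.4, 0.1],   # H1
              [-0.3, 0.7, 0.2, -0.5],  # H2
              [0.1, 0.1, -0.6, 0.9]],  # H3
    hidden_b=[0.0, 0.1, -0.1],
    out_w=[1.2, -0.7, 0.4],
)
print(score)
```

Because the sigmoid sits between input and output, doubling the inputs does not double the score; this is what distinguishes the scoring function from a linear or profile-based one.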

**Fig. 3.**
Reference-independent alignment accuracy with respect to the sparsity of a sequence profile (i.e. NEFF). (A) NEFF is divided into nine bins. (B) NEFF is divided into two bins at a threshold of 6.
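Fig. 3 groups targets by profile sparsity (NEFF). A minimal sketch, assuming NEFF is the exponential of the mean per-column Shannon entropy of the profile (an entropy-based definition in the spirit of HH-suite's Neff; the exact formula behind Fig. 3 may differ) and that per-target accuracies are supplied as plain numbers. The two-bin split mirrors panel B's threshold of 6; placing NEFF = 6 in the upper bin and the bin names `sparse`/`dense` are assumptions.

```python
import math
from collections import defaultdict


def neff(profile):
    """Assumed NEFF: exp of the mean per-column Shannon entropy (natural log)
    of a profile given as per-position probability vectors over amino acids.
    Ranges from 1 (fully conserved) to 20 (uniform over 20 amino acids)."""
    if not profile:
        raise ValueError("profile must contain at least one column")
    entropies = [-sum(p * math.log(p) for p in column if p > 0) for column in profile]
    return math.exp(sum(entropies) / len(entropies))


def mean_accuracy_by_neff(records, threshold=6.0):
    """Average accuracy in two bins split at `threshold`; a NEFF equal to the
    threshold goes to the upper ("dense") bin, which is our assumption."""
    bins = defaultdict(list)
    for value, accuracy in records:
        bins["sparse" if value < threshold else "dense"].append(accuracy)
    return {name: sum(accs) / len(accs) for name, accs in bins.items()}


records = [(2.1, 0.42), (4.8, 0.51), (7.3, 0.63), (9.9, 0.66)]  # hypothetical (NEFF, accuracy)
print(mean_accuracy_by_neff(records))
```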

**Fig. 4.**
Reference-independent alignment accuracy with respect to (A) protein class and (B) protein length. A protein with <150 amino acids is treated as small; otherwise, it is treated as large.

**Fig. 5.**
(A) TM-scores of the CNFpred and HHpred models for the 1000 targets from PDB25. Each point represents two models for the same target, one generated by CNFpred and one by HHpred. (B) Distribution of the TM-score difference between the two 3D models for the same target. Each blue (red) column shows the number of targets for which CNFpred (HHpred) is better by a given margin.
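Fig. 5B summarizes paired TM-scores as a histogram of signed differences. A minimal sketch of that tally, assuming one CNFpred and one HHpred TM-score per target; the bin width of 0.05, the rounding step, and the separate `tie` count are our choices, not the paper's.

```python
import math
from collections import Counter


def tm_margin_histogram(cnf_tm, hh_tm, width=0.05):
    """Count targets by which method's model has the higher TM-score and by
    how much: key (winner, k) covers differences in [k*width, (k+1)*width)."""
    if len(cnf_tm) != len(hh_tm):
        raise ValueError("need one CNFpred and one HHpred score per target")
    counts = Counter()
    for a, b in zip(cnf_tm, hh_tm):
        diff = round(a - b, 6)  # damp floating-point noise near bin edges
        if diff == 0:
            counts["tie"] += 1
            continue
        winner = "CNFpred" if diff > 0 else "HHpred"  # blue vs red columns
        counts[(winner, math.floor(abs(diff) / width))] += 1
    return counts


cnf = [0.83, 0.55, 0.40, 0.72]  # hypothetical TM-scores, one per target
hh = [0.60, 0.57, 0.40, 0.69]
print(tm_margin_histogram(cnf, hh))
```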
