PLoS Comput Biol. 2009 Mar;5(3):e1000307.

doi: 10.1371/journal.pcbi.1000307. Epub 2009 Mar 13.

Probabilistic interaction network of evidence algorithm and its application to complete labeling of peak lists from protein NMR spectroscopy

Arash Bahrami¹, Amir H Assadi, John L Markley, Hamid R Eghbalnia

Affiliation

¹ Biochemistry Department, National Magnetic Resonance Facility at Madison, University of Wisconsin Madison, Madison, Wisconsin, United States of America. arash@nmrfam.wisc.edu

PMID: 19282963
PMCID: PMC2645676
DOI: 10.1371/journal.pcbi.1000307

Arash Bahrami et al. PLoS Comput Biol. 2009 Mar.

Abstract

The process of assigning a finite set of tags or labels to a collection of observations, subject to side conditions, is notable for its computational complexity. This labeling paradigm is of theoretical and practical relevance to a wide range of biological applications, including the analysis of data from DNA microarrays, metabolomics experiments, and biomolecular nuclear magnetic resonance (NMR) spectroscopy. We present a novel algorithm, called Probabilistic Interaction Network of Evidence (PINE), that achieves robust, unsupervised probabilistic labeling of data. The computational core of PINE uses estimates of evidence derived from empirical distributions of previously observed data, along with consistency measures, to drive a fictitious system M with Hamiltonian H to a quasi-stationary state that produces probabilistic label assignments for relevant subsets of the data. We demonstrate the successful application of PINE to a key task in protein NMR spectroscopy: that of converting peak lists extracted from various NMR experiments into assignments associated with probabilities for their correctness. This application, called PINE-NMR, is available from a freely accessible computer server (http://pine.nmrfam.wisc.edu). The PINE-NMR server accepts as input the sequence of the protein plus user-specified combinations of data corresponding to an extensive list of NMR experiments; it provides as output a probabilistic assignment of NMR signals (chemical shifts) to sequence-specific backbone and aliphatic side chain atoms plus a probabilistic determination of the protein secondary structure. PINE-NMR can accommodate prior information about assignments or stable isotope labeling schemes. As part of the analysis, PINE-NMR identifies, verifies, and rectifies problems related to chemical shift referencing or erroneous input data. PINE-NMR achieves robust and consistent results that have been shown to be effective in subsequent steps of NMR structure determination.
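The abstract says PINE derives evidence from empirical distributions of previously observed data. As a minimal sketch of one such evidence term, amino acid typing from Cα/Cβ chemical shifts, the code below scores residue types with independent Gaussian likelihoods and a uniform prior. The Gaussian model, the `SHIFT_STATS` values (rough, illustrative numbers), and the function names are assumptions for illustration; they are not PINE-NMR's statistics or code.

```python
import math

# Hypothetical, rough (mean, sd) values in ppm for Cα and Cβ, for illustration
# only; they are not the statistics used by PINE-NMR. Glycine has no Cβ.
SHIFT_STATS = {
    "ALA": ((53.1, 2.0), (19.0, 1.8)),
    "GLY": ((45.4, 1.3), None),
    "SER": ((58.7, 2.1), (63.8, 1.5)),
    "THR": ((62.2, 2.6), (69.7, 1.7)),
}


def gaussian_pdf(x, mean, sd):
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def residue_type_probabilities(ca, cb=None):
    """Posterior over residue types from an observed Cα (and optional Cβ) shift.

    Assumes a uniform prior and independent Gaussian likelihoods per nucleus.
    """
    scores = {}
    for residue, (ca_stats, cb_stats) in SHIFT_STATS.items():
        if cb is not None and cb_stats is None:
            continue  # an observed Cβ rules out glycine in this simple model
        score = gaussian_pdf(ca, *ca_stats)
        if cb is not None:
            score *= gaussian_pdf(cb, *cb_stats)
        scores[residue] = score
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("shifts are far from every reference distribution")
    return {residue: s / total for residue, s in scores.items()}


# Hypothetical observation: Cα = 62.0 ppm, Cβ = 69.5 ppm.
probs = residue_type_probabilities(62.0, 69.5)
```

Returning a normalized distribution rather than a single best type keeps the ambiguity explicit, which is what a probabilistic labeling scheme needs as input.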

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Conventional stages in protein structure determination by NMR.**
After the data have been collected, the challenging “front-end” process leads to sequence-specific amino acid labeling. The “back-end” process then leads to the three-dimensional structure.

**Figure 2. Conventional process of resonance assignments for a protein labeled with stable isotopes (¹³C and ¹⁵N).**
Peaks observed in multidimensional spectra are matched to search for common frequencies. Some common frequencies identify atoms within a residue; others identify atoms in neighboring residues. The common visual aid in this process is a series of paired strip plots from complementary NMR experiments. Strips from CBCA(CO)NH (a and c) and HNCACB (b and d) experiments can be used here to assign the tripeptide Thr-Tyr-His. Starting with Cα (CA) and Cβ (CB) frequencies assumed to belong to Thr⁶⁶ (strip a), a horizontal trace (line), arising from the common frequency of NH nuclei, is used to locate Cα and Cβ of Tyr⁶⁷ in strip b. To continue the process, the same peaks are located in strip c, and the peaks are traced to strip d. In strip d, given the accepted tolerances across spectra (shown by boxes around the selected peaks), several alternative assignments are plausible for His⁶⁸. These additional peaks may be artifacts (false peaks) or peaks from other nuclei with similar frequencies. Depending on the starting point of the assignment process, the choice of experiments, the amount of conflicting information, or other factors, the number of alternative assignments can grow exponentially, rendering a computational solution intractable. This difficulty has proved to be a major drawback for NMR structure determination, particularly for larger proteins.
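The tolerance-based tracing in Figure 2 can be sketched as a search over spin systems. In this hypothetical model, each spin system carries Cα/Cβ shifts for its own residue i (as read from HNCACB) and for residue i-1 (as read from CBCA(CO)NH); a successor of spin system X is any spin system whose i-1 shifts match X's i shifts within tolerance. The class, function names, tolerances, and shift values are assumptions for illustration. The example reproduces the kind of ambiguity shown in strip d: two candidates survive for the third position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpinSystem:
    name: str
    ca: float                 # Cα of residue i (from HNCACB)
    cb: Optional[float]       # Cβ of residue i (None for glycine)
    ca_prev: float            # Cα of residue i-1 (from CBCA(CO)NH)
    cb_prev: Optional[float]  # Cβ of residue i-1 (None if i-1 is glycine)


def within(a: Optional[float], b: Optional[float], tol: float) -> bool:
    """True if both shifts are absent, or both present and within tol ppm."""
    if a is None or b is None:
        return a is None and b is None
    return abs(a - b) <= tol


def successor_candidates(current: SpinSystem, systems: List[SpinSystem],
                         tol_ca: float = 0.2, tol_cb: float = 0.4) -> List[SpinSystem]:
    """Spin systems whose i-1 shifts match the current system's i shifts."""
    return [s for s in systems
            if s is not current
            and within(s.ca_prev, current.ca, tol_ca)
            and within(s.cb_prev, current.cb, tol_cb)]


# Hypothetical spin systems (ppm): sys_1 plays Thr, sys_2 plays Tyr,
# and sys_3 / sys_4 are competing candidates for His.
thr = SpinSystem("sys_1", 62.0, 69.8, 55.0, 30.0)
tyr = SpinSystem("sys_2", 58.1, 38.5, 62.05, 69.7)
his_a = SpinSystem("sys_3", 56.0, 30.2, 58.0, 38.7)
his_b = SpinSystem("sys_4", 55.5, 29.8, 58.2, 38.3)
systems = [thr, tyr, his_a, his_b]

step1 = successor_candidates(thr, systems)  # a unique match
step2 = successor_candidates(tyr, systems)  # two matches: the ambiguity of strip d
```

Each ambiguous step multiplies the number of partial chains to track, which is the combinatorial growth the caption describes.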

**Figure 3. Illustration of the system of neighborhoods built around each data value in PINE.**
Each input data point (S) is linked to a set of labels (L) with associated weights. Similarity measures and constraints are utilized to construct each neighborhood system or topology (as denoted by the arrows).
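Figure 3 describes each data point S linked to candidate labels L with weights, shaped by similarity measures and constraints. One generic way to impose a soft one-to-one constraint (each data point takes one unit of probability, and each label is used by one unit in total) is Sinkhorn balancing: alternately rescaling the rows and columns of a positive weight matrix. This is a swapped-in, well-known technique shown only to illustrate the weighted data-to-label structure; the source does not say PINE uses it, and the function name and weights are hypothetical.

```python
def sinkhorn_balance(weights, iterations=200):
    """Soft one-to-one assignment via Sinkhorn balancing (illustrative).

    weights[i][j] > 0 is the evidence linking data point i to label j.
    Returns a matrix whose rows and columns each sum (approximately) to 1.
    """
    n = len(weights)
    if any(len(row) != n for row in weights):
        raise ValueError("expected a square matrix")
    m = [[float(w) for w in row] for row in weights]
    for _ in range(iterations):
        # Row step: each data point distributes one unit of probability.
        for row in m:
            total = sum(row)
            for j in range(n):
                row[j] /= total
        # Column step: each label receives one unit of probability in total.
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= total
    return m


# Hypothetical evidence: rows are data points S1..S3, columns are labels L1..L3.
evidence = [[5.0, 1.0, 1.0],
            [1.0, 4.0, 2.0],
            [1.0, 1.0, 3.0]]
probs = sinkhorn_balance(evidence)
```

For a strictly positive matrix the alternating normalization converges, so a fixed iteration count is adequate for a small illustration; a production version would test for convergence instead.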

**Figure 4. Global network of relationships in PINE-NMR.**
A set of probabilistic influence sub-networks is combined into a larger influence network. The iterative probabilistic inference on the complex network ensures globally consistent labeling.

**Figure 5. Spin system generation network in PINE-NMR.**
The peaks in the most sensitive experiments in the data are used initially as reference peaks. Aligning the peaks along the common dimensions and registering them with respect to reference peaks enables us to define a common putative object called the spin system. Spin systems are then assembled to derive triplet spin systems.
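The registration step in Figure 5 can be sketched with a hard nearest-match rule. This sketch assumes a 2D (¹H, ¹⁵N) reference peak list, for example from an HSQC-type experiment, and a 3D peak list with (¹H, ¹⁵N, ¹³C) coordinates; each 3D peak is attached to the closest reference peak within per-dimension tolerances. The tolerance values and function name are hypothetical, and the source describes spin system generation as part of a probabilistic network, so this deterministic rule is a simplification.

```python
def group_into_spin_systems(reference_peaks, peaks_3d, tol_h=0.03, tol_n=0.3):
    """Attach each 3D peak (h, n, c) to the closest reference (h, n) peak.

    Closeness is measured in tolerance-scaled units so that the 1H and 15N
    dimensions are comparable. Peaks outside every tolerance box are returned
    separately as unassigned (possible artifacts or missing reference peaks).
    """
    systems = {i: [] for i in range(len(reference_peaks))}
    unassigned = []
    for h, n, c in peaks_3d:
        best, best_dist = None, None
        for i, (rh, rn) in enumerate(reference_peaks):
            dh, dn = abs(h - rh), abs(n - rn)
            if dh <= tol_h and dn <= tol_n:
                dist = (dh / tol_h) ** 2 + (dn / tol_n) ** 2
                if best_dist is None or dist < best_dist:
                    best, best_dist = i, dist
        if best is None:
            unassigned.append((h, n, c))
        else:
            systems[best].append(c)  # collect 13C shifts per spin system
    return systems, unassigned


# Hypothetical peak lists (ppm).
refs = [(8.20, 120.5), (7.95, 115.2)]
peaks = [(8.21, 120.4, 56.3), (8.19, 120.6, 30.1),
         (7.96, 115.3, 62.0), (9.10, 130.0, 50.0)]
spin_systems, leftover = group_into_spin_systems(refs, peaks)
```

Scaling each distance by its tolerance matters because ¹⁵N shifts spread over a much wider ppm range than ¹H shifts.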

**Figure 6. Graphical network for backbone chemical shift assignments.**
Overlapping tripeptides (triplet residues) are evaluated. The weights on the edges are derived from amino acid typing, secondary structures, connectivity experiments, and possible outlier assignments. According to the statistical physics model described in the text, application of the belief propagation algorithm yields the marginal probabilities for backbone assignments.
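Figure 6 states that belief propagation yields marginal probabilities for backbone assignments. On the simplest possible graph, a chain with one node per residue position whose states are candidate spin systems, belief propagation reduces to the exact forward-backward (sum-product) recursion sketched below. This is a generic illustration, not PINE-NMR's network, which couples many more evidence sources; `unary` (for example, typing evidence) and `pairwise` (for example, connectivity evidence) are hypothetical inputs, and at least one complete path is assumed to have positive weight.

```python
def chain_marginals(unary, pairwise):
    """Exact marginals on a chain via forward-backward (sum-product) messages.

    unary[i][s]       nonnegative evidence for state s at position i
    pairwise[i][s][t] compatibility of state s at position i with state t at i + 1
    Returns marginals[i][s], the probability of state s at position i.
    """
    n, k = len(unary), len(unary[0])

    # Forward pass: alpha[i][t] sums the weights of all left partial paths ending in t.
    alpha = [[0.0] * k for _ in range(n)]
    alpha[0] = list(unary[0])
    for i in range(1, n):
        for t in range(k):
            incoming = sum(alpha[i - 1][s] * pairwise[i - 1][s][t] for s in range(k))
            alpha[i][t] = unary[i][t] * incoming
        z = sum(alpha[i])
        alpha[i] = [a / z for a in alpha[i]]  # rescale to avoid underflow

    # Backward pass: beta[i][s] sums the weights of all right partial paths after s.
    beta = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for s in range(k):
            beta[i][s] = sum(pairwise[i][s][t] * unary[i + 1][t] * beta[i + 1][t]
                             for t in range(k))
        z = sum(beta[i])
        beta[i] = [b / z for b in beta[i]]

    # Combine: the marginal at position i is proportional to alpha[i][s] * beta[i][s].
    marginals = []
    for i in range(n):
        p = [alpha[i][s] * beta[i][s] for s in range(k)]
        z = sum(p)
        marginals.append([x / z for x in p])
    return marginals
```

Per-position rescaling does not change the marginals, because it multiplies every state at that position by the same constant. On graphs with loops, such as a network that combines several evidence sources, the same message updates become iterative and approximate rather than exact.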

Grants and funding

P41 RR02301/RR/NCRR NIH HHS/United States
