BMC Bioinformatics. 2014 Apr 27;15:118.

doi: 10.1186/1471-2105-15-118.

H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments

Jan-Oliver Janda, Ajmal Popal, Jochen Bauer, Markus Busch, Michael Klocke, Wolfgang Spitzer, Jörg Keller, Rainer Merkl¹

Affiliation

¹ Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany. rainer.merkl@ur.de.

PMID: 24766829
PMCID: PMC4021312
DOI: 10.1186/1471-2105-15-118

Abstract

Background: The identification of functionally important residue positions is a central task of computational biology. Methods of correlation analysis allow for the identification of pairs of residue positions whose occupancy is mutually dependent due to constraints imposed by protein structure or function. A common measure for assessing these dependencies is the mutual information, which is based on Shannon's information theory and therefore relies on probabilities only. Consequently, such approaches do not consider the similarity of residue pairs, which may degrade an algorithm's performance. One representative algorithm is H2r, which characterizes each residue position k by its conn(k)-value, the number of significantly correlated pairs to which k belongs.
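For readers less familiar with the Shannon-based baseline, the following is a minimal Python sketch of the two quantities named in the Background: the mutual information of two MSA columns, estimated from raw symbol frequencies, and the conn(k)-value as a count of significant pairs. The function names are hypothetical, and H2r's own significance test, gap handling and pseudocounts are not reproduced, so the set of significant pairs is simply passed in.

```python
import math
from collections import Counter

def column_entropy(col):
    """Shannon entropy (natural log) of one MSA column given as a string."""
    n = len(col)
    return -sum((c / n) * math.log(c / n) for c in Counter(col).values())

def mutual_information(col_k, col_l):
    """MI(k, l) = H(k) + H(l) - H(k, l), estimated from raw frequencies."""
    joint = list(zip(col_k, col_l))
    n = len(joint)
    h_joint = -sum((c / n) * math.log(c / n) for c in Counter(joint).values())
    return column_entropy(col_k) + column_entropy(col_l) - h_joint

def conn_values(n_positions, significant_pairs):
    """conn(k): number of significantly correlated pairs containing position k."""
    conn = [0] * n_positions
    for k, l in significant_pairs:
        conn[k] += 1
        conn[l] += 1
    return conn
```

Because `mutual_information` looks only at symbol identities, it treats every pair of distinct residue types as equally dissimilar; an A/C exchange scores exactly like an A/W exchange. This is the limitation the Background attributes to purely probability-based measures.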

Results: To improve the specificity of H2r, we developed a revised algorithm, named H2rs, which is based on the von Neumann entropy (vNE). Computing the corresponding mutual information requires a matrix A that assesses the similarity of residue pairs. We determined A by deducing substitution frequencies from contacting residue pairs observed in the homologs of 35 809 proteins of known structure. In analogy to H2r, the enhanced algorithm computes a normalized conn(k)-value. Within the framework of H2rs, only statistically significant vNE values were considered. To decide on significance, the algorithm calculates a p-value by performing a randomization test for each individual pair of residue positions. The analysis of a large in silico testbed demonstrated that specificity and precision were higher for H2rs than for H2r and two other methods of correlation analysis. The gain in prediction quality was further confirmed by a detailed assessment of five well-studied enzymes. The outcomes of H2rs and of PSICOV, a method that predicts contacting residue positions, overlapped only marginally. H2rs can be downloaded from http://www-bioinf.uni-regensburg.de.
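The abstract does not spell out how the vNE is computed. The sketch below assumes one common construction: the symbol frequencies p of a column (or of a column pair) are combined with a symmetric, positive semidefinite similarity matrix A into the density matrix ρ = D^(1/2) A D^(1/2) / tr(D^(1/2) A D^(1/2)), with D = diag(p), and S(ρ) = -tr(ρ ln ρ) is obtained from the eigenvalues of ρ. This construction and the function name are illustrative assumptions, not necessarily the exact H2rs formulation.

```python
import numpy as np

def von_neumann_entropy(freqs, similarity):
    """S(rho) = -tr(rho ln rho), computed from the eigenvalues of rho.

    Assumed construction: rho = D^(1/2) A D^(1/2), scaled to unit trace,
    where D = diag(freqs) and A = similarity (symmetric, PSD).
    """
    p = np.asarray(freqs, dtype=float)
    a = np.asarray(similarity, dtype=float)
    root = np.sqrt(p)
    # Element-wise: rho_ij = sqrt(p_i) * A_ij * sqrt(p_j)
    rho = root[:, None] * a * root[None, :]
    rho /= np.trace(rho)
    eig = np.linalg.eigvalsh(rho)   # rho is symmetric, so eigvalsh applies
    eig = eig[eig > 1e-12]          # drop zero and round-off eigenvalues
    return float(-np.sum(eig * np.log(eig)))
```

Two limiting cases show the role of the similarity matrix: with A equal to the identity, ρ is diagonal and S(ρ) reduces to the Shannon entropy of p; with every entry of A equal to 1 (all symbols fully similar), ρ has rank one and S(ρ) = 0. By analogy with the Shannon case, a vNE-based mutual information could be formed as S(k) + S(l) - S(k, l), where the joint term uses the 400 pair frequencies and the 400 × 400 matrix A; this decomposition, and the similarity matrix used for the single-column terms, are assumptions.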

Conclusions: Considering substitution frequencies for residue pairs by means of the von Neumann entropy and a p-value improved the success rate in identifying important residue positions. The integration of proven statistical concepts and normalization facilitates the comparison of results obtained with different proteins. A comparison of the outcomes of the local method H2rs and the global method PSICOV indicates that such methods complement each other and have different scopes of application.

Figures

**Figure 1**
**Computation of a pairwise similarity matrix A. (A)** For each residue (k, blue) of our dataset, all neighboring residues within 5 Å, measured between the centers of heavy atoms, were determined. In this example, there is one such residue, l, marked red. **(B)** Residue positions *k, l* were linked with the corresponding columns of the MSA, and transition frequencies were deduced from a comparison of the residue pairs. **(C)** In this illustrative example, we observe one transition from AA to AC, two transitions from AA to CA and one transition from AA to CC. These transition frequencies were used to construct the 400 × 400 matrix A of substitution frequencies for residue pairs.
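The counting step of panels (B) and (C) can be sketched in plain Python. The sketch assumes that transitions are counted from the residue pair of the query protein (taken to be sequence 0 of the MSA) to the pair found in every other sequence, which reproduces the counts of panel (C). Gaps are skipped, each transition is recorded in both directions to keep the matrix symmetric, and all names are hypothetical; any further processing H2rs applies to A (pseudocounts, scaling) is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def pair_index(a, b):
    """Map the residue pair (a, b) to a row/column of the 400 x 400 matrix."""
    return 20 * INDEX[a] + INDEX[b]

def add_pair_transitions(counts, col_k, col_l):
    """Count transitions from the query pair (sequence 0, assumed gap-free)
    to the pair in every other sequence, for one contacting pair k, l."""
    query = pair_index(col_k[0], col_l[0])
    for a, b in zip(col_k[1:], col_l[1:]):
        if a not in INDEX or b not in INDEX:
            continue                      # skip gaps and non-standard residues
        target = pair_index(a, b)
        counts[query][target] += 1
        counts[target][query] += 1        # keep the matrix symmetric
    return counts

def to_frequencies(counts):
    """Normalize raw transition counts to substitution frequencies."""
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

counts = [[0] * 400 for _ in range(400)]
# Sequence 0 carries the pair AA; the others carry AC, CA, CA and CC (panel C).
add_pair_transitions(counts, "AACCC", "ACAAC")
freqs = to_frequencies(counts)
```

In practice, `add_pair_transitions` would be called once for every contacting pair of every protein in the dataset before normalizing, so that A pools the substitution behavior of all contacts.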

**Figure 2**
**Distribution of U_vNE values for one pair of residue positions.** The histogram (blue) shows the distribution of the U_vNE(k*, l*) values of the first two residue positions of ssTrpC resulting from shuffling the contents of columns k and l of the MSA. A normality test on this distribution failed (P = 0.991), which indicates that the distribution is not Gaussian. The corresponding cumulative distribution is shown in black. The cumulative *Gumbel* distribution with parameters μ and β deduced from 25 randomization tests is shown in green. The red line depicts the actual U_vNE value of this pair of residue positions. The orange line shows the U_vNE value this pair would need to reach a p-value of 0.01.
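The caption describes the null model as repeated shuffling of columns k and l followed by a Gumbel fit. A minimal sketch of that procedure follows. The Gumbel parameters are estimated here by the method of moments (β = s·√6/π and μ = mean - γβ, with γ the Euler-Mascheroni constant), which may differ from the fitting procedure used by H2rs; the scoring function, the default number of shuffles and all names are placeholders.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def shuffled_scores(col_k, col_l, score, n_shuffles=1000, seed=0):
    """Null distribution of score(k, l) after shuffling both MSA columns,
    which destroys any dependence between them."""
    rng = random.Random(seed)
    k, l = list(col_k), list(col_l)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(k)
        rng.shuffle(l)
        null.append(score(k, l))
    return null

def fit_gumbel(values):
    """Method-of-moments estimates of the Gumbel location mu and scale beta.
    Requires at least two values that are not all identical."""
    beta = statistics.stdev(values) * math.sqrt(6) / math.pi
    mu = statistics.mean(values) - EULER_GAMMA * beta
    return mu, beta

def gumbel_p_value(x, mu, beta):
    """Upper-tail probability P(X >= x) = 1 - exp(-exp(-(x - mu) / beta))."""
    return -math.expm1(-math.exp(-(x - mu) / beta))

def gumbel_threshold(p, mu, beta):
    """Score a pair must reach so that its p-value equals p."""
    return mu - beta * math.log(-math.log1p(-p))
```

`gumbel_threshold(0.01, mu, beta)` corresponds to the orange line of the figure. `math.expm1` and `math.log1p` are used so that very small tail probabilities are not lost to rounding when evaluating 1 - exp(-y).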

**Figure 3**
**Residues of the stTrpA/stTrpB complex with the highest** ***conz***(k)-**values.** For stTrpA (light blue) and stTrpB (gold), residues with *conz*(k)-values ≥ 2.0 and p-values ≤ 10^-11 are plotted as red sticks. H2rs predicted 2 important residue positions for stTrpA and 13 for stTrpB. The ligands indole-3-glycerol phosphate and pyridoxal phosphate are plotted as green sticks, and the sodium ion is shown as a green ball.

**Figure 4**
**Residues of ssTrpC with the highest** ***conz***(k)-**values.** For ssTrpC, H2rs identified 7 residues with *conz*(k)-values ≥ 2.0 and p-values ≤ 10^-11, which are shown as red sticks. The ligand indole-3-glycerol phosphate is shown as green sticks.

**Figure 5**
**ecDHFR residues with the highest** ***conz***(k)-**values.** For ecDHFR, H2rs predicted 6 residues with *conz*(k)-values ≥ 2.0 and p-values ≤ 10^-11, which are shown as red sticks. The ligands folic acid and NADP are shown as green sticks.

**Figure 6**
**smHK residues with the highest** ***conz***(k)-**values.** For smHK, H2rs predicted 10 residues with *conz*(k)-values ≥ 2.0 and p-values ≤ 10^-11, which are shown as red sticks. The ligand GLC is shown as green sticks and the SO₄ ion in the catalytic cleft as green balls.
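Figures 3 to 6 select residues with conz(k) ≥ 2.0 and p ≤ 10^-11. The abstract only states that H2rs computes a normalized conn(k)-value, so the sketch below assumes, for illustration, that conz(k) is the z-score of conn(k) over all positions of the protein, and it takes the per-position p-values as given input. Both the normalization and the function names are assumptions.

```python
import statistics

def conz_values(conn):
    """Hypothetical normalization: z-score of conn(k) over all positions."""
    mean = statistics.mean(conn)
    sd = statistics.pstdev(conn)
    if sd == 0:
        return [0.0] * len(conn)   # no spread: no position stands out
    return [(c - mean) / sd for c in conn]

def select_positions(conz, p_values, z_min=2.0, p_max=1e-11):
    """Positions meeting both cutoffs quoted in the figure captions."""
    return [k for k, (z, p) in enumerate(zip(conz, p_values))
            if z >= z_min and p <= p_max]
```

A z-score makes the cutoff of 2.0 comparable across proteins of different length and alignment depth, which matches the stated goal of easier comparison between proteins; whether H2rs normalizes exactly this way is not stated in the abstract.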

Cited by

Molecular dynamics and structure function analysis show that substrate binding and specificity are major forces in the functional diversification of Eqolisins.
Stocchi N, Revuelta MV, Castronuovo PAL, Vera DMA, Ten Have A. BMC Bioinformatics. 2018 Sep 24;19(1):338. doi: 10.1186/s12859-018-2348-2. PMID: 30249179. Free PMC article.
Inferring joint sequence-structural determinants of protein functional specificity.
Neuwald AF, Aravind L, Altschul SF. Elife. 2018 Jan 16;7:e29880. doi: 10.7554/eLife.29880. PMID: 29336305. Free PMC article.
A Single Mutation Increases the Thermostability and Activity of Aspergillus terreus Amine Transaminase.
Zhu WL, Hu S, Lv CJ, Zhao WR, Wang HP, Mei JQ, Mei LH, Huang J. Molecules. 2019 Mar 27;24(7):1194. doi: 10.3390/molecules24071194. PMID: 30934681. Free PMC article.
Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity.
Tondnevis F, Dudenhausen EE, Miller AM, McKenna R, Altschul SF, Bloom LB, Neuwald AF. Sci Rep. 2020 Feb 3;10(1):1691. doi: 10.1038/s41598-019-55118-6. PMID: 32015389. Free PMC article.
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.
Neuwald AF, Altschul SF. PLoS Comput Biol. 2016 Dec 21;12(12):e1005294. doi: 10.1371/journal.pcbi.1005294. eCollection 2016 Dec. PMID: 28002465. Free PMC article.
