PLoS One. 2011;6(9):e24382.
doi: 10.1371/journal.pone.0024382. Epub 2011 Sep 12.

Determinants, discriminants, conserved residues--a heuristic approach to detection of functional divergence in protein families

Kavitha Bharatham et al. PLoS One. 2011.

Abstract

In this work, which belongs to the field of comparative analysis of protein sequences, we focus on detecting functional specialization at the residue level. As input, we take a set of sequences divided into groups of orthologues, each group known to be responsible for a different function. This provides two independent pieces of information: within-group conservation and overlap in amino acid type across groups. We build our discussion around a set of scoring functions that keep the two separate, so that the signal is easy to trace back to its source.

We propose a heuristic description of functional divergence that includes residue type exchangeability in both the conservation and the overlap measures, and that makes no assumptions about the rate of evolution in any group other than the one under consideration. Residue types acceptable at a given position within an orthologous group are described as a distribution that evolves in time, starting from a single ancestral type, and is subject to constraints that can be inferred only indirectly. To estimate the strength of these constraints, we compare the observed degrees of conservation and overlap with those expected in the hypothetical case of a freely evolving distribution.

Our description matches the experimental data well, but we also conclude that any attempt to capture the evolutionary behavior of specificity-determining residues with a scalar function will be tentative, because no single model can cover the variety of evolutionary behavior such residues exhibit. In particular, models that expect the same type of evolutionary behavior across functionally divergent groups tend to miss part of the information that the conservation and overlap measures they use could otherwise retrieve.
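The two pieces of information named in the abstract, within-group conservation and overlap in amino acid type across groups, can both be computed per alignment column. The sketch below is a minimal illustration only, assuming Shannon entropy as the conservation measure and cosine similarity of normalized residue-type frequencies as the overlap measure. These choices, the helper names, and the gap handling are assumptions, and the sketch leaves out the residue type exchangeability and expected-value corrections the paper describes.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(residues):
    """Normalized residue-type frequencies for one group's alignment column.

    Gaps and non-standard symbols are ignored (an assumption of this sketch).
    """
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def entropy(dist):
    """Shannon entropy in bits: 0 for a fully conserved column, higher when variable."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


def overlap(dist_a, dist_b):
    """Cosine overlap of two residue-type distributions: 1 identical, 0 disjoint."""
    norm_a = math.sqrt(sum(p * p for p in dist_a.values()))
    norm_b = math.sqrt(sum(q * q for q in dist_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(p * dist_b.get(aa, 0.0) for aa, p in dist_a.items())
    return dot / (norm_a * norm_b)


# Hypothetical column: conserved L in group A, mostly F with one Y in group B.
group_a = column_distribution("LLLL")
group_b = column_distribution("FFFY")
print(entropy(group_a), entropy(group_b), overlap(group_a, group_b))
```

In this toy column group A is fully conserved (entropy 0), group B is less conserved (about 0.81 bits), and the two groups share no residue type (overlap 0). Keeping the two numbers separate, rather than merging them immediately, preserves the traceability the abstract argues for.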

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The main components of the information available from comparative analysis of two groups of paralogous sequences.
The nomenclature we use in this paper for the three main types of behavior is also indicated.
Figure 2. ROC curves for small molecule binding cases.
y-axis: true positive rate, the fraction of experimentally determined specific residues above the threshold. x-axis: non-positive rate, the fraction of residues not tested in the experiment. The residues are ordered according to a specificity scoring method; moving the threshold down the list determines the values plotted in the graph. Inset: y-axis: true positive rate, the fraction of experimentally determined specific residues above the threshold. x-axis: false positive rate, the fraction of residues determined experimentally to be non-specific. The methods tested are indicated in the figure legend. For each case, the panel caption lists the families considered (contrasted) in the analysis, the taxonomic breadth of the source organisms, the number of sequences in each group, the function tested in the experiment, and the method used to infer it. The resulting numbers of true positives (specificity determinants) and true negatives, and the length of the target sequence, are also listed.
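The threshold sweep described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under several assumptions: residues are labeled "pos" (experimentally determined specific), "neg" (tested and found non-specific) or None (not tested); the main-panel x-axis is read literally as the fraction of untested residues above the threshold, and the inset x-axis as the fraction of tested non-specific residues; tied scores keep their input order; and the area is computed with the trapezoidal rule. The function names are hypothetical.

```python
def roc_points(scores, labels, x_class):
    """(x, y) points obtained by moving the threshold down the score-ranked list.

    labels[i] is "pos", "neg", or None (untested). y is the true positive rate;
    x is the fraction of residues labeled x_class ranked above the threshold
    (x_class=None for the main panel, "neg" for the inset).
    Residues that are neither positive nor x_class do not move the curve.
    """
    n_pos = sum(1 for lab in labels if lab == "pos")
    n_x = sum(1 for lab in labels if lab == x_class)
    if n_pos == 0 or n_x == 0:
        raise ValueError("need at least one positive and one x-axis residue")
    # Stable sort: with reverse=True, tied scores still keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fx = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == "pos":
            tp += 1
        elif labels[i] == x_class:
            fx += 1
        else:
            continue
        points.append((fx / n_x, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: five residues ranked by some specificity score.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = ["pos", None, "pos", "neg", None]
print(auc(roc_points(scores, labels, None)))   # main panel: 0.75
print(auc(roc_points(scores, labels, "neg")))  # inset: 1.0
```

The two panels differ only in which residues move the curve to the right, so one function with an `x_class` parameter covers both; the single tested negative here is ranked below both positives, which is why the inset area is 1.0.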
Figure 3. Combining various conservation and overlap scores into a single specificity scoring function for the LacI case.
Method identifiers (see the Methods section and Text S1). First character (conservation score): e, entropy; r, entropy modified by its expected value; j, Jensen-Shannon divergence from the stationary distribution; 0, no conservation score used. Second character (overlap score): o, overlap of normalized distributions; f, squared difference; r, o modified by its expected value; m, pairwise mutual information. Third character (combination): e, Euclidean distance; l, linear. Red: determinant model; green: discriminant model; pink: GroupSim; blue: mutual information. GroupSim uses the conservation of neighboring residues as an additional criterion. y-axis: area under the ROC curve for each method.
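Several of the options listed in the Figure 3 caption can be illustrated with a short sketch. It is not the paper's exact set of formulas. It assumes a uniform background as a hypothetical placeholder for the stationary distribution, reads "squared difference" as the summed squared difference of two groups' residue-type frequencies, computes mutual information between residue type and group label for one pair of groups, and uses two hypothetical combiners: a weighted linear sum and one minus the normalized Euclidean distance from the ideal point (1, 1). The expected-value corrections (the "r" variants) are omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical placeholder: uniform background instead of the paper's
# stationary distribution; substitute real background frequencies.
BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def freqs(column):
    """Residue-type frequencies over all 20 amino acids (gaps ignored)."""
    counts = Counter(r for r in column if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def kl(p, q):
    """Kullback-Leibler divergence in bits; assumes q[a] > 0 wherever p[a] > 0."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS if p[a] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, between 0 and 1."""
    m = {a: (p[a] + q[a]) / 2 for a in AMINO_ACIDS}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def squared_difference(p, q):
    """Summed squared frequency difference: 0 identical, 2 for two disjoint single types."""
    return sum((p[a] - q[a]) ** 2 for a in AMINO_ACIDS)


def mutual_information(col_a, col_b):
    """Mutual information (bits) between residue type and group label for two groups."""
    pairs = [(0, r) for r in col_a if r in AMINO_ACIDS]
    pairs += [(1, r) for r in col_b if r in AMINO_ACIDS]
    n = len(pairs)
    joint = Counter(pairs)
    group = Counter(g for g, _ in pairs)
    residue = Counter(r for _, r in pairs)
    # P(g, r) / (P(g) P(r)) simplifies to c * n / (group count * residue count).
    return sum(c / n * math.log2(c * n / (group[g] * residue[r]))
               for (g, r), c in joint.items())


def combine_linear(conservation, divergence, weight=0.5):
    """Hypothetical linear combiner of two scores scaled to [0, 1]."""
    return weight * conservation + (1 - weight) * divergence


def combine_euclidean(conservation, divergence):
    """Hypothetical combiner: 1 minus the normalized distance from the ideal point (1, 1)."""
    return 1 - math.sqrt(((1 - conservation) ** 2 + (1 - divergence) ** 2) / 2)


# Hypothetical column: L conserved in one group, F conserved in the other.
col_a, col_b = "LLLL", "FFFF"
pa, pb = freqs(col_a), freqs(col_b)
conservation = (js_divergence(pa, BACKGROUND) + js_divergence(pb, BACKGROUND)) / 2
divergence = js_divergence(pa, pb)
print(combine_linear(conservation, divergence), combine_euclidean(conservation, divergence))
print(squared_difference(pa, pb), mutual_information(col_a, col_b))
```

Both combiners reward positions that score high on both axes; the Euclidean form penalizes a weak score on either axis more sharply than the linear sum does, which is one reason the choice of combiner can change the ranking and hence the area under the ROC curve.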
Figure 4. The same as Fig. 2, for protein-protein interaction cases.
