Comparative Study
Nucleic Acids Res. 2006;34(22):6540-8.
doi: 10.1093/nar/gkl901. Epub 2006 Nov 27.

Sequence comparison by sequence harmony identifies subtype-specific functional sites

Walter Pirovano et al. Nucleic Acids Res. 2006.

Abstract

Multiple sequence alignments are often used to reveal functionally important residues within a protein family. They can be particularly useful for the identification of key residues that determine functional differences between protein subfamilies. We present a new entropy-based method, Sequence Harmony (SH), that accurately detects subfamily-specific positions from a multiple sequence alignment. The SH algorithm implements a novel formula that scores compositional differences between subfamilies in a simple manner and on an intuitive scale, without imposing conservation. We compare our method with the most important published methods, i.e. AMAS, TreeDet and SDP-pred, using three well-studied protein families: the receptor-binding domain (MH2) of the Smad family of transcription factors, the Ras superfamily of small GTPases and the MIP family of integral membrane transporters. We demonstrate that SH accurately selects known functional sites with higher coverage than the other methods for these test cases. This shows that compositional differences between protein subfamilies provide a sufficient basis for identification of functional sites. In addition, SH selects a number of sites of unknown function that could be interesting candidates for further experimental investigation.
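
The abstract does not state the SH formula, so the following is only an illustrative sketch of an entropy-based score for one alignment column, not the published method. It assumes a stand-in score equal to one minus the Jensen-Shannon divergence (log base 2) between the residue compositions of two subfamilies, which gives zero for non-overlapping compositions and one for identical compositions without requiring the column to be conserved. The function names (`frequencies`, `entropy`, `harmony_like_score`) and the treatment of every character, gaps included, as an ordinary symbol are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def frequencies(column: str) -> Dict[str, float]:
    """Relative frequency of each symbol in one alignment column of a subfamily."""
    if not column:
        raise ValueError("column must contain at least one residue")
    counts = Counter(column)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def harmony_like_score(column_a: str, column_b: str) -> float:
    """Stand-in score (assumption): 1 - Jensen-Shannon divergence of the two columns."""
    pa, pb = frequencies(column_a), frequencies(column_b)
    symbols = set(pa) | set(pb)
    # Equal-weight mixture of the two subfamily compositions.
    mix = {s: 0.5 * (pa.get(s, 0.0) + pb.get(s, 0.0)) for s in symbols}
    jsd = entropy(mix.values()) - 0.5 * (entropy(pa.values()) + entropy(pb.values()))
    return 1.0 - jsd


if __name__ == "__main__":
    print(harmony_like_score("AAAA", "GGGG"))  # disjoint compositions
    print(harmony_like_score("AGAG", "GAGA"))  # identical compositions, not conserved
```

Under this assumption a column can score one even when it is variable, as long as both subfamilies vary in the same way, which is the sense in which such a score does not impose conservation.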

Figures

Figure 1
SH for AR-Smads versus BR-Smads along the sequence of the MH2 domain of Smads. 40 sites have SH zero. Most sites have SH one, meaning these have the same composition in each subgroup. Relatively few sites have intermediate SH values.
Figure 2
ROC plots for AR-Smads versus BR-Smads using the different prediction methods. For SH and SDP-pred the ranked results were used. One point for each unique rank value is drawn. Coverage is calculated as TP/(TP + FN), and Error as FP/(TN + FP). Note that the error rate is shown up to 20%. Note also that no method reaches a higher coverage for higher error rates.
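
The coverage and error definitions in this caption can be sketched in a few lines. The example below is a minimal illustration, assuming that lower scores mean stronger predictions (as with low SH) and that one point is produced per unique score value; the function name `roc_points` and the toy site numbers are hypothetical and are not data from the paper.

```python
from typing import Dict, List, Set, Tuple


def roc_points(scores: Dict[int, float], positives: Set[int]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, coverage, error) for each unique score, lowest cutoff first.

    scores    maps alignment site -> score (assumption: lower = predicted specific)
    positives is the set of sites known to be functional
    """
    known = positives & set(scores)   # only scored sites can be evaluated
    negatives = set(scores) - known
    if not known or not negatives:
        raise ValueError("need at least one positive and one negative site")

    points = []
    for cutoff in sorted(set(scores.values())):
        selected = {site for site, score in scores.items() if score <= cutoff}
        tp = len(selected & known)
        fp = len(selected & negatives)
        fn = len(known) - tp
        tn = len(negatives) - fp
        coverage = tp / (tp + fn)     # Coverage = TP / (TP + FN)
        error = fp / (tn + fp)        # Error = FP / (TN + FP)
        points.append((cutoff, coverage, error))
    return points


if __name__ == "__main__":
    toy_scores = {1: 0.0, 2: 0.0, 3: 0.2, 4: 1.0, 5: 1.0}  # hypothetical sites
    toy_positives = {1, 3}
    for cutoff, coverage, error in roc_points(toy_scores, toy_positives):
        print(f"cutoff={cutoff:.1f} coverage={coverage:.2f} error={error:.2f}")
```

Sites that share a score enter the selection together, which is why each unique rank value contributes a single point to the plot.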
Figure 3
ROC plots for (A) Rab5/6, and (B) Ras/Ral-specific sites using SH, SDP-pred and TreeDet/MB (see text and Figure 2 caption for details). Validation of Rab5/6-specific sites was taken from Stenmark and co-workers (21,22), and for Ras/Ral specificity from Bauer et al. (6,20) and Del Sol Mesa et al. (6). Note that the error rate is shown up to 60% and 23% for (A) and (B), respectively.
Figure 4
ROC plots for MIP specificity using SH, TreeDet/MB and SDP-pred (see text and Figure 2 caption for details). MIP subfamily-specific sites were selected based on a minimum distance of 5 Å from the glycerol molecules bound in the pore channel in the glycerol uptake protein crystal structure 1FX8 (24).
Figure 5
SH in a representative crystal structure for each of the test-sets. Non-harmonious sites (SH zero) are red and low-harmony sites (SH ≤ 0.2) orange. Intermediate values go from white to light blue for maximum harmony (SH one). Residue numbers for the low-harmony sites (SH ≤ 0.2) are indicated. (A) AR-Smads versus BR-Smads colour-coded onto the crystal structure of the MH2 domain of Smad2 (1KHX) (29). The spatial clustering of low-harmony sites is indicated with dotted ellipses, and clusters are labelled with corresponding known functions. (B) SH for Rab5/6 using the crystal structure 5P21 and a representation and orientation similar to Figure 3a in Stenmark and co-workers (21,22). (C) id. for Ras/Ral using 5P21 and similar to Figure 4 in Del Sol Mesa et al. (6). (D) id. for MIP using 1FX8 and similar to Figure 5 in Kalinina et al. (7).

References

    1. Livingstone C.D., Barton G.J. Identification of functional residues and secondary structure from protein multiple sequence alignment. Methods Enzymol. 1996;266:497–512.
    2. Lichtarge O., Bourne H.R., Cohen F.E. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 1996;257:342–358.
    3. Kuipers W., Oliveira L., Vriend G., Ijzerman A.P. Identification of class-determining residues in G protein-coupled receptors by sequence analysis. Recept. Channels. 1997;5:159–174.
    4. Hannenhalli S.S., Russell R.B. Analysis and prediction of functional sub-types from protein sequence alignments. J. Mol. Biol. 2000;303:61–76.
    5. Mirny L.A., Gelfand M.S. Using orthologous and paralogous proteins to identify specificity-determining residues in bacterial transcription factors. J. Mol. Biol. 2002;321:7–20.
