Comparative Study
PLoS Comput Biol. 2005 Aug;1(3):e26.
doi: 10.1371/journal.pcbi.0010026. Epub 2005 Aug 12.

Comparative genomics and disorder prediction identify biologically relevant SH3 protein interactions

Pedro Beltrao et al. PLoS Comput Biol. 2005 Aug.

Abstract

Protein interaction networks are an important part of the post-genomic effort to integrate a part-list view of the cell into system-level understanding. Using a set of 11 yeast genomes we show that combining comparative genomics and secondary structure information greatly increases consensus-based prediction of SH3 targets. Benchmarking of our method against positive and negative standards gave 83% accuracy with 26% coverage. The concept of an optimal divergence time for effective comparative genomics studies was analyzed, demonstrating that genomes of species that diverged very recently from Saccharomyces cerevisiae (S. mikatae, S. bayanus, and S. paradoxus), or a long time ago (Neurospora crassa and Schizosaccharomyces pombe), contain less information for accurate prediction of SH3 targets than species within the optimal divergence time proposed. We also show here that intrinsically disordered SH3 domain targets are more probable sites of interaction than equivalent sites within ordered regions. Our findings highlight several novel S. cerevisiae SH3 protein interactions, the value of selection of optimal divergence times in comparative genomics studies, and the importance of intrinsic disorder for protein interactions. Based on our results we propose novel roles for the S. cerevisiae proteins Abp1p in endocytosis and Hse1p in endosome protein sorting.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Conservation Study of the SH3 Domains of S. cerevisiae in Ten Other Yeast Genomes
CD, conserved domain (the SH3-containing protein has an ortholog and the ortholog SH3 domain is possibly conserved, i.e., less than three conservative changes and no nonconservative changes in the binding positions); DD, divergent domain (SH3-containing protein has an ortholog in this genome but the domain is not on the same branch of the phylogenetic tree); NO, no ortholog (no ortholog found for SH3-containing protein in a particular genome); PD, possibly divergent (SH3-containing protein has an ortholog in this genome but the ortholog SH3 domain has at least one nonconservative change in the binding positions or more than two conservative changes in the binding positions).
Figure 2. Size of Probing Window When Looking for Conservation of the Consensus Sequence in Orthologs of the Putative Target Protein
We defined the conservation score as simply the number of species where the consensus sequence is conserved. With this information the accuracy and coverage were calculated, with the gold (A) and platinum (B) positive sets, for consensus sequence conserved in different numbers of species and for different sizes of the probing region.
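The conservation score described in this caption can be illustrated with a minimal sketch. The motif `P..P` (the generic PxxP core recognized by SH3 domains), the function name, and the window sequences below are illustrative assumptions, not the consensus patterns or data used in the study.

```python
import re

def conservation_score(motif_regex, ortholog_windows):
    """Count the species whose ortholog probing window contains the motif.

    ortholog_windows maps species name -> window sequence, or None when
    no ortholog was found in that genome (hypothetical data layout).
    """
    pattern = re.compile(motif_regex)
    return sum(
        1
        for window in ortholog_windows.values()
        if window is not None and pattern.search(window)
    )

# Hypothetical probing windows, invented for illustration only.
windows = {
    "C. glabrata": "AAPPLPPRKSS",
    "K. lactis": "AAPALPPRQSS",
    "C. albicans": "GGSTTNNQQAA",
    "D. hansenii": None,  # no ortholog found
}

score = conservation_score(r"P..P", windows)
print(score)  # 2
```

Widening or narrowing the probing window changes which ortholog residues are passed in as `window`, which is the parameter varied in the figure.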
Figure 3. Combining Conservation and Secondary Structure Prediction
We calculated, with the gold (A) and platinum (B) positive sets, the accuracy and coverage for target prediction when including or excluding secondary structure information. We used a probing region of 210 alignment positions in this analysis.
Figure 4. Optimal Divergence Time to Search for Conservation of Target Motif of SH3 Domains
We designated seven groups of species with an increasing average divergence time from S. cerevisiae and calculated for each group the highest accuracy obtained for restricted windows of coverage. We used the gold positive and the negative set to calculate the accuracy and coverage (see Materials and Methods). The seven groups of species are as follows: (1) S. bayanus, S. paradoxus, S. mikatae, and C. glabrata (average divergence of 112.5 My from S. cerevisiae); (2) S. paradoxus, S. mikatae, C. glabrata, and K. lactis (average divergence of 200 My from S. cerevisiae); (3) S. mikatae, C. glabrata, K. lactis, and C. albicans (average divergence of 387.5 My from S. cerevisiae); (4) C. glabrata, K. lactis, C. albicans, and D. hansenii (average divergence of 575 My from S. cerevisiae); (5) K. lactis, C. albicans, D. hansenii, and Y. lipolytica (average divergence of 725 My from S. cerevisiae); (6) C. albicans, D. hansenii, Y. lipolytica, and N. crassa (average divergence of 875 My from S. cerevisiae); and (7) D. hansenii, Y. lipolytica, N. crassa, and Sch. pombe (average divergence of 950 My from S. cerevisiae). The individual values for the divergence time from S. cerevisiae were taken from the literature [32,42,43]. Although we tried to create groups that would not have genomes of species with very different separation dates from S. cerevisiae, it should be noted that because of the small number of available genomes, the groups are not homogenous. Also, the values of the divergence time of each species were not always obtained with the same method. Therefore, this range of values should be viewed critically.
Figure 5. Most Informative Genomes in the Search for Conservation of Target Motif of SH3 Domains
We created all possible combinations of two or more genomes of our set of ten genomes. For each combination we calculated the highest accuracy obtained for 11 windows of coverage from 15% to 70% at intervals of 5%. We then calculated the average frequency, over all coverage windows, of each individual species in all groups of genomes, in the combinations of genomes scoring within the 20% highest accuracy values and in the combinations scoring in the lowest 20% values of accuracy. We then used a t-test to determine, for each species, whether the average frequencies within the highest and lowest combinations were significantly different from the frequency in all possible combinations. *, p < 0.05; **, p < 0.001.
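To give a sense of the size of this search, the following sketch enumerates every combination of two or more of the ten comparison genomes and builds the coverage windows. Reading "11 windows from 15% to 70% at intervals of 5%" as the bins 15-20%, 20-25%, ..., 65-70% is an assumption, and the variable names are hypothetical.

```python
from itertools import combinations

# The ten comparison genomes named in the abstract and figure captions.
genomes = [
    "S. bayanus", "S. paradoxus", "S. mikatae", "C. glabrata", "K. lactis",
    "C. albicans", "D. hansenii", "Y. lipolytica", "N. crassa", "Sch. pombe",
]

# All combinations of two or more genomes: 2**10 - 1 - 10 = 1013.
combos = [
    combo
    for size in range(2, len(genomes) + 1)
    for combo in combinations(genomes, size)
]

# Assumed reading of the coverage windows: eleven 5%-wide bins.
coverage_windows = [(low, low + 5) for low in range(15, 70, 5)]

# Baseline: how often one species appears across all combinations.
n_with_species = sum(1 for combo in combos if "K. lactis" in combo)

print(len(combos))            # 1013
print(len(coverage_windows))  # 11
print(n_with_species)         # 511
```

Each species appears in 511 of the 1013 combinations, so a species that is markedly over-represented among the top-scoring 20% of combinations stands out against a baseline frequency of roughly one half.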
Figure 6. Predictions of S. cerevisiae SH3 Interactions
We considered that a potential target consensus sequence, found by pattern matching, in an S. cerevisiae protein would be biologically relevant if it was within an unstructured region of the S. cerevisiae protein and also conserved in four of the seven comparison genomes used (C. glabrata, K. lactis, C. albicans, D. hansenii, Y. lipolytica, N. crassa, and Sch. pombe). Red lines indicate the interactions for which we found some experimental evidence in protein interaction databases [–61]; thin black lines indicate interactions between proteins that are labeled as locating to different compartments; thick black lines indicate interactions for which we found no evidence. There were two S. cerevisiae SH3 domains for which we could not predict any interaction because of the stringency applied. A complete list of the interactions with function, localization, and binding positions is given in Table S4.
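The selection rule in this caption can be written as a small filter. This is a sketch under stated assumptions: "four of the seven" is read as at least four, and the `CandidateSite` class, its field names, and the example proteins are hypothetical.

```python
from dataclasses import dataclass

# The seven comparison genomes listed in the caption.
COMPARISON_GENOMES = frozenset({
    "C. glabrata", "K. lactis", "C. albicans", "D. hansenii",
    "Y. lipolytica", "N. crassa", "Sch. pombe",
})
MIN_CONSERVED = 4  # assumption: "four of the seven" means at least four

@dataclass(frozen=True)
class CandidateSite:
    protein: str                 # S. cerevisiae protein carrying the motif
    in_disordered_region: bool   # motif lies in a predicted unstructured region
    conserved_in: frozenset      # genomes where the motif is conserved

def is_predicted_target(site):
    """Apply both criteria: disorder and conservation in enough genomes."""
    n_conserved = len(site.conserved_in & COMPARISON_GENOMES)
    return site.in_disordered_region and n_conserved >= MIN_CONSERVED

# Hypothetical candidates, invented for illustration only.
kept = CandidateSite(
    "ProteinA", True,
    frozenset({"C. glabrata", "K. lactis", "C. albicans", "D. hansenii"}),
)
dropped = CandidateSite(
    "ProteinB", False,
    frozenset(COMPARISON_GENOMES),
)

print(is_predicted_target(kept))     # True
print(is_predicted_target(dropped))  # False
```

Genomes outside the seven listed, such as the closely related Saccharomyces species, are ignored by the intersection, which mirrors the caption's restriction to the comparison set.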

References

    1. Enright AJ, Ouzounis CA. Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. 2001;2:RESEARCH0034.
    2. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, et al. Detecting protein function and protein–protein interactions from genome sequences. Science. 1999;285:751–753.
    3. Dandekar T, Snel B, Huynen M, Bork P. Conservation of gene order: A fingerprint of proteins that physically interact. Trends Biochem Sci. 1998;23:324–328.
    4. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc Natl Acad Sci U S A. 1999;96:4285–4288.
    5. Gaasterland T, Ragan MA. Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes. Microb Comp Genomics. 1998;3:199–217.
