Traffic. 2016 Nov;17(11):1214-1226.
doi: 10.1111/tra.12432. Epub 2016 Oct 9.

Using HHsearch to tackle proteins of unknown function: A pilot study with PH domains

David R Fidler et al. Traffic. 2016 Nov.

Abstract

Advances in membrane cell biology are hampered by the relatively high proportion of proteins with no known function. Such proteins are largely or entirely devoid of structurally significant domain annotations. Structural bioinformaticians have developed profile-profile tools such as HHsearch (online version called HHpred), which can detect remote homologies that are missed by the tools used to annotate databases. Here we have applied HHsearch to study a single structural fold in a single model organism as proof of principle. Across the entire clan of protein domains sharing the pleckstrin homology domain fold in yeast, systematic application of HHsearch accurately identified known PH-like domains. It also predicted 16 new domains in 13 yeast proteins, many of which are implicated in intracellular traffic. One of these was Vps13p, where we confirmed the functional importance of the predicted PH-like domain. Even though such predictions require considerable work to be corroborated, they are useful first steps. HHsearch should be applied more widely, particularly across entire proteomes of model organisms, to significantly improve database annotations.

Keywords: GRAM domains; Saccharomyces cerevisiae; TBC1D15; YJL016W; YJL181C; YJR030C; Gyp7p; Vid27p; Vps13p; pleckstrin homology (PH) domains; profile-profile search; secondary structure prediction; structural bioinformatics.

Figures

Figure 1
PH‐like domains that share no significant sequence similarity have highly similar folds. Ribbon diagrams of the Cα backbones of: A, classical PH domain in PEPP1 (1upq_A, residues 54–152); B, GRAM domain of MTMR2 (1lw3_A, residues 83–183); C, the 2 structures superimposed (root mean square distance = 1.9 Å across 86 residues) with views of both sides of the beta sandwich. Models colored by secondary structure: red = sheet, blue = helix; strong colors = classical PH, weak colors = GRAM. For more detail in A‐C, see Movies 1–3, respectively. Domains in pleckstrin were initially identified as a family of homologues of ~100‐120 aa [29], and several structures were solved soon after [15]. GRAM domains were identified as a family of sequences ~70 aa long [30], with the first solved structure identified as beta sheets 1–5 of a complete PH‐like domain [18].
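The root mean square distance quoted for panel C can be illustrated with a minimal sketch. It assumes the two Cα traces have already been superimposed and paired residue by residue; the function name and the toy coordinates are illustrative only and are not taken from the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between two equal-length lists of (x, y, z)
    coordinates, assumed to be already superimposed and paired in order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between each pair of matched atoms.
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Toy example (hypothetical coordinates in Å): two atoms, each displaced by 1 Å.
trace_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(trace_a, trace_b))
```

In practice the superposition itself (finding the rotation and translation that minimize this value) is a separate step that this sketch does not perform.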
Figure 2
Family grouping of PH domains in PDB and yeast. A, ~240 PH‐like domains in PDB were divided into 39 families by PSI‐BLAST. The size of the colored shapes and the extent of overlap correlate with the numbers of domains. Family names are standard abbreviations, and PDB identifiers are listed in Table S1. Weak homologies, shown by dashed lines, were deduced from the presence of non‐significant hits (P > .005) that occurred with no false positive hits above them in hit lists. They lead to weak family grouping indicated by dashed outlines. Classical PH domains (~50% of all domains) are made up of 3 overlapping groups, and they have some overlap with some phosphotyrosine binding (PTB) domains. A high degree of overlap is seen within the 2 PTB groups and in the 2 Ran‐binding domain (RanBD) groups (EVH1, WH1). Twenty independent families contain 1, 2 or 3 domains with no links to larger families. B, The same domains and families were analyzed by HHsearch. Domains that produce hits to each other with prob[SS] ≥ 85% are grouped together. Sixteen of the 20 unlinked small families are now included in larger groups. PTB are fused with FERM‐C domains; GRAM domains are also fused with 9 other families. Some domains produce hits to both classical PH domains and either PTB/FERM‐C domains or GRAM domains, leading to partial fusion of these 3 groups. Dashed lines indicate incomplete overlap (prob[SH] = 50‐80% varying color grey to black). C, The families arranged according to the HHsearch analysis from (B) were populated with 73 yeast PH‐like domains described in the literature (details in Table S2A). Empty families are shown in faint outline.
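The grouping rule in panel B (domains linked by hits with prob[SS] ≥ 85% fall into the same group) can be sketched as single‐linkage clustering. This is a minimal illustration under stated assumptions, not the authors' procedure: hits are assumed to be available as (query, target, probability) tuples, a hit in either direction is treated as a link, and the function and domain names are hypothetical.

```python
from collections import defaultdict


def group_domains(hits, threshold=85.0):
    """Group domains by single linkage: any hit with probability at or above
    `threshold` (in percent) joins query and target into one group.
    `hits` is an iterable of (query, target, probability) tuples (assumed format)."""
    parent = {}

    def find(name):
        # Union-find lookup with path halving.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for query, target, prob in hits:
        root_q, root_t = find(query), find(target)
        if prob >= threshold and root_q != root_t:
            parent[root_q] = root_t

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)
    # Largest groups first, then alphabetical, so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical hit list; the names and probabilities are invented for illustration.
example_hits = [
    ("PH_a", "PH_b", 97.0),
    ("PH_b", "GRAM_c", 88.0),
    ("PTB_d", "PH_a", 40.0),  # below threshold, so PTB_d stays on its own
]
print(group_domains(example_hits))
```

Single linkage means two domains can share a group without hitting each other directly, as long as a chain of strong hits connects them, which is consistent with small families being absorbed into larger groups.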
Figure 3
Properties of PH‐like and non‐PH‐like hits. A, Specificity of HHsearch at different levels of prob[SS]. Hit lists with prob[SS] ≥ 5% from 39 PDB‐to‐PDB searches were merged and scanned for the occurrence of non‐PH‐like domains. At each level of prob[SS], the rate of non‐PH‐like hits was less than predicted from the prob[SS] metric. No non‐PH‐like domain scored prob[SS] > 85%. B, The relationship between prob[SS] (≥50%) and the number of aligned residues (COLs, see Box 3), both for true positives (showing means for every prob[SS] centile; total n = 2200, median 22 hits per centile) and for non‐PH‐like hits (showing 90 individual occurrences). In both groups COLs increased with prob[SS]. Although COLs was generally lower for false positives, there were some exceptions. Non‐PH‐like hits similarly tended to have lower secondary structural similarity scores (data not shown). Three strong false positives (black squares) are described in detail in Figure S1. Asterisk indicates the position of Age1p, a strong false positive in yeast (Figure S1).
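The comparison in panel A (observed rate of non‐PH‐like hits versus the rate predicted from prob[SS]) can be sketched as follows. The sketch assumes that a probability of p% implies an expected false positive rate of (100 − p)%, and that each hit has been labeled as PH‐like or not; the function name, bin width and toy data are illustrative and not from the paper.

```python
def false_positive_rates(hits, bin_width=10):
    """Bin hits by probability and compare observed and expected false positive rates.

    `hits` is an iterable of (prob_percent, is_ph_like) pairs (assumed format).
    Returns a list of (bin_start, n_hits, observed_rate, expected_rate), where
    expected_rate is the mean of (1 - prob/100) over the hits in the bin.
    """
    bins = {}
    for prob, is_ph_like in hits:
        # Hits at exactly 100% are placed in the top bin.
        start = min(int(prob // bin_width) * bin_width, 100 - bin_width)
        n, false_n, expected_sum = bins.get(start, (0, 0, 0.0))
        bins[start] = (
            n + 1,
            false_n + (0 if is_ph_like else 1),
            expected_sum + (1 - prob / 100),
        )
    return [
        (start, n, false_n / n, expected_sum / n)
        for start, (n, false_n, expected_sum) in sorted(bins.items())
    ]


# Hypothetical labeled hits, invented for illustration only.
toy_hits = [(95, True), (92, True), (55, False), (52, True)]
for row in false_positive_rates(toy_hits):
    print(row)
```

A bin in which the observed rate is below the expected rate corresponds to the situation described in the caption, where the probability metric is conservative.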
Figure 4
New PH‐like domains identified in yeast. A, 16 strongly predicted new PH‐like domains in yeast, shown in the context of 9 full‐length proteins. Four paralogs that share highly similar domain patterns have been omitted. New domains (black outline) are shaded according to the prob[SS] of their strongest hits (blue–red graded scale). Main section shows yeast‐to‐PDB hits; right‐hand section shows PDB‐to‐yeast hits. Where paralogs are reported in the same line, both prob[SS] values are reported, but the diagram and shading belong to the hit with higher prob[SS]. For the 1 domain predicted by indirect alignment (yellow outline, Vid27p‐1), prob[SS] values are given for both searches. Light grey boxes (Gyp7p, Vid27p, Vps13p) indicate new PH‐like families present widely in eukaryotic evolution. Bud2p‐2, Pkh1/2p, Tph3p and Caf120p (domains with green surrounds) have homologues with PH‐like domains at the same position, so these new discoveries are to some extent expected. Accompanying domains include other, known PH‐like domains (in Lam1p/Sip3p and Spo71p, shaded black) as well as C2, RasGAP, DH, TBC, BAR, StART, and others as follows: domain of unknown function = DUF, transmembrane domain = TMD, protein kinase = PKinase, chorein‐N domain in VPS13 = chorN, WD40 = WD, glycine‐rich = GLY. The N‐terminus of Rbh1p (and Rbh2p) contains a helical domain of unknown function with homology to RhoGEFs. B, One PH‐like domain tentatively identified with low prob[SH]. This was first found with PDB‐to‐yeast searches using increased secondary structural weighting, hence the usual prob[SS] value scale does not apply. Details of all newly predicted PH‐like domains are in Table S2B.
Figure 5
Intracellular targeting by the PH‐like domain of Vps13p. A, GFP‐Vps13‐PH‐PH (dimer) weakly targets the bud neck, seen as linear targeting across the neck of small‐to‐medium buds (filled arrowheads), and as dots on either side of occasional larger buds (hollow arrowheads). The minor nuclear enrichment is nonspecific, being seen with all other PH monomers and dimers (data not shown). B, Vps13p 3028–3144 as query (Q, top 3 lines) aligned with a target (T) hit from the solved structure 3hsa_A, a bacterial protein (Shewanella amazonensis, bottom 5 lines). The fourth line indicates which residues align, where: “|” is a very good alignment, “+” is good, “.” is neutral, “‐” is bad, and “=” is a clash. For both Q and T, the secondary structure prediction (ss prediction) is in 3 states, E for sheet (blue), H for helix (red) and C for unstructured loop (black), with prediction confidence shown for the target. The target also has “ss_dssp” showing its solved structure. The box above shows statistics on the hit, including prob[SS] and COLs. L3125 and I3129 (highlighted in yellow) are partially conserved residues (lower case in consensus) that align with 3hsa_A (“+” and “|”, respectively). Alignment made by HHalign. C, GFP‐tagged dimeric Vps13‐PH(LIAA) (L3125A and I3129A) accumulates in cells to a much lesser extent than wild‐type. Scale bars 5 µm.
Figure 6
Intracellular targeting by PH‐like domain of Vps13p. A‐F, Vps13‐EGFP constructs, A/B: wildtype (WT), C/D: L3125A I3129A (LIAA) and E/F: 1–3028 (ΔPH) in Δvps13 cells in log phase (A/C/E) or early stationary phase (B/D/F). Sites of localization include intracellular puncta (filled arrowheads) in log phase, and the nucleus‐vacuole junction (NVJ, open arrowheads) in stationary phase. Scale bar (shown in A only) 5 µm. G, Overlay blots to detect secreted CPY. Controls: wild‐type cells and Δvps13 cells show no secretion and maximal secretion, respectively. Rescue of Δvps13 was tested for Vps13‐EGFP plasmids as in A‐F. Results are representative of 6 similar experiments. H, Residue conservation in the C‐terminal 1095 residues of Vps13p, calculated by ConSurf and scaled as described in Materials and Methods. The most conserved part of the Vps13 C‐terminus is a glycine‐rich domain (orange) of no known function [72], not the predicted PH domain (green).

References

    1. Pena‐Castillo L, Hughes TR. Why are there still over 1000 uncharacterized yeast genes? Genetics. 2007;176:7‐14.
    2. Brenner SE, Chothia C, Hubbard TJ. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Natl Acad Sci USA. 1998;95:6073‐6078.
    3. Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI‐BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389‐3402.
    4. Panchenko AR. Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res. 2003;31:683‐689.
    5. Yona G, Levitt M. Within the twilight zone: a sensitive profile‐profile comparison tool based on information theory. J Mol Biol. 2002;315:1257‐1275.
