Structural genomics plucks high-hanging membrane proteins

Edda Kloppmann¹, Marco Punta, Burkhard Rost

PMID: 22622032
PMCID: PMC3400333
DOI: 10.1016/j.sbi.2012.05.002

Review

Edda Kloppmann et al. Curr Opin Struct Biol. 2012 Jun;22(3):326-32. doi: 10.1016/j.sbi.2012.05.002. Epub 2012 May 21.

Affiliation

¹ Department of Bioinformatics and Computational Biology, Technical University Munich, Germany. kloppmann@rostlab.org

Abstract

Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful, both in the number of structures obtained and in the coverage of important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the setup of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in the absence of homology.

Figures

**Figure 1. Novel Pfam families covered by SG and non-SG alpha-IMP structures**
We consider all Pfam families that have at least one structural representative in the PDB. Non-structural genomics (non-SG) and structural genomics (SG) contributions are shown in green and blue, respectively. We considered 1,035 PDB IDs for polytopic alpha-IMP structures. These are all IMP structures found in the OPM (Orientations of Proteins in Membranes, [47]) or PDBTM (Protein Data Bank of Transmembrane Proteins, [7]) databases that have at least one representative in UniProt [48] and can be mapped to at least one Pfam family (Release 26.0, [17]). According to our definition, an IMP maps to a Pfam family when at least 12 transmembrane residues (annotation from PDBTM) align to the profile hidden Markov model of the family (using alignment coordinates). We found 107 such Pfam families. From this list, we excluded 6 Pfam families: two families that represent N- or C-terminal soluble extensions of a transmembrane domain, one case of a dubious Pfam match, one case where the classification, but not the annotation, in OPM is wrong, one case where the annotation in PDBTM is wrong, and one case where we considered a bitopic IMP chain of an IMP structure with one polytopic and two bitopic chains. Therefore, here we consider 101 “IMP” Pfam families that align to at least one structure of a polytopic IMP. Note: 159 IMPs of the initial set that could not be mapped to either a UniProt sequence or a Pfam family have been excluded from the analysis. Note also: the Pfam family ‘formate/nitrite transporter’ (PF01226) was covered by one SG and one non-SG structure in December 2009. Each structure was deposited in the PDB before the coordinates of the other were released, i.e. became publicly known. We therefore counted PF01226 as half a family for both SG and non-SG.
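
The mapping rule in the Figure 1 caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the representation of PDBTM transmembrane segments as inclusive residue ranges, and the example coordinates are all hypothetical. Only the 12-residue threshold and the family counts (107 found, 6 excluded) come from the caption.

```python
MIN_TM_RESIDUES = 12  # threshold stated in the Figure 1 caption


def tm_residues_in_alignment(tm_segments, ali_start, ali_end):
    """Count transmembrane residues that lie inside a Pfam alignment.

    tm_segments: list of (start, end) residue ranges, inclusive
                 (hypothetical format for PDBTM annotations)
    ali_start, ali_end: alignment coordinates of the Pfam match, inclusive
    """
    count = 0
    for seg_start, seg_end in tm_segments:
        overlap_start = max(seg_start, ali_start)
        overlap_end = min(seg_end, ali_end)
        if overlap_end >= overlap_start:
            count += overlap_end - overlap_start + 1
    return count


def maps_to_family(tm_segments, ali_start, ali_end):
    """Apply the caption's rule: at least 12 aligned transmembrane residues."""
    return tm_residues_in_alignment(tm_segments, ali_start, ali_end) >= MIN_TM_RESIDUES


# Invented example: two transmembrane helices and two candidate alignments.
segments = [(10, 30), (45, 65)]
hit_a = maps_to_family(segments, 25, 50)  # 6 + 6 = 12 residues overlap
hit_b = maps_to_family(segments, 26, 50)  # 5 + 6 = 11 residues overlap

# Family bookkeeping from the caption.
families_found = 107
families_excluded = 6
families_considered = families_found - families_excluded
print(hit_a, hit_b, families_considered)
```

With the invented coordinates, the first alignment just meets the threshold and the second falls one residue short, and the bookkeeping reproduces the 101 families analyzed.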

**Figure 2. Taxonomic distribution of IMP structures covering novel Pfam families**
We show the number of Pfam families covered by structures from Eukaryotes, Bacteria, Archaea and Viruses (in green, blue, light blue and red, respectively). The data are shown for three different time spans. For each combination of family and kingdom we consider the release date of the first structure solved for this family. For example, a family with several bacterial protein structures is counted once, in the time range during which the first of these structures was solved. On the other hand, Pfam families with protein structures from more than one kingdom are counted for each kingdom. For example, a Pfam family with a eukaryotic and a bacterial protein structure is counted twice, i.e. once for each kingdom. 17 of the 101 Pfam families are counted for two kingdoms and 8 families have eukaryotic, bacterial and archaeal protein structures. Mapping from PDB to Pfam was done as described in Figure 1.
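
The counting rule in the Figure 2 caption (one count per family and kingdom, dated by the first released structure) can be sketched as follows. The caption does not state the boundaries of the three time spans, so the two spans below, the family labels `FAM_A` and `FAM_B`, and all dates are invented for illustration; the function names are hypothetical as well.

```python
from collections import Counter
from datetime import date


def first_release_per_family_kingdom(structures):
    """Keep the earliest release date for each (family, kingdom) pair.

    structures: iterable of (family_id, kingdom, release_date) tuples
    """
    first = {}
    for family_id, kingdom, released in structures:
        key = (family_id, kingdom)
        if key not in first or released < first[key]:
            first[key] = released
    return first


def count_by_span(first, spans):
    """Count (family, kingdom) pairs per time span and kingdom.

    spans: dict mapping a label to an inclusive (start, end) date range
    """
    counts = Counter()
    for (_family_id, kingdom), released in first.items():
        for label, (start, end) in spans.items():
            if start <= released <= end:
                counts[(label, kingdom)] += 1
                break
    return counts


# Hypothetical time spans; the caption does not give the real boundaries.
spans = {
    "early": (date(1985, 1, 1), date(2004, 12, 31)),
    "late": (date(2005, 1, 1), date(2011, 12, 31)),
}

# Invented structures: FAM_A has two bacterial and one eukaryotic structure.
structures = [
    ("FAM_A", "Bacteria", date(2008, 3, 1)),
    ("FAM_A", "Bacteria", date(2001, 6, 1)),
    ("FAM_A", "Eukaryotes", date(2010, 2, 1)),
    ("FAM_B", "Archaea", date(2006, 9, 1)),
]

counts = count_by_span(first_release_per_family_kingdom(structures), spans)
for (label, kingdom), n in sorted(counts.items()):
    print(label, kingdom, n)
```

In this toy input, `FAM_A` contributes one bacterial count (dated by its earliest structure) and one eukaryotic count, which mirrors the double counting of multi-kingdom families described in the caption.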

**Figure 3. Human IMPs: Pfam families and PDB structures**
**A: Mapping human IMPs to Pfam families.** 3,305 polytopic alpha-IMPs were extracted from the 20,247 sequences of the SwissProt *Homo sapiens* proteome (UniProt release Feb 22, 2012) using PolyPhobius [49]. Proteins were assigned to Pfam families as described in Figure 1, using the transmembrane assignment of PolyPhobius. 3,063 IMPs can be mapped to a Pfam family (orange); 242 IMPs fall outside of the current Pfam collection of families (red). **B: Human IMP Pfam families covered by structure.** We show human IMP Pfam families with no structural representative (green) and with at least one structural representative (blue: representative is a human protein; light blue: representative is not a human protein).

**Figure 4. The TehA protein family**
**A: Homologous proteins behave differentially.** The NYCOMPS SG center pursued 35 homologous prokaryotic proteins belonging to the TehA family. Of these 35 proteins, 33 could be cloned, 8 and 5 were expressed (small and large scale, respectively), and only one yielded a diffracting crystal and finally a high-resolution structure. Note in particular the dramatic attrition between successfully cloned and successfully expressed proteins. Targets are cloned using ligation-free cloning and a C-terminal fusion expression vector. Expression and purification are assessed by Coomassie Blue stained SDS–PAGE gels, and stability in the DMM detergent is determined by size exclusion chromatography [35]. Structures of the TehA family representative from *H. influenzae* have been solved and deposited in the PDB [24]. **B: Structure of the SLAC1 homolog TehA.** The anion channel structure is shown as seen from the periplasm (PDB ID: 3m71, ribbon representation). The highly conserved Phe262 occluding the ion permeation pathway is shown explicitly. The figure was created using Chimera [50].
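
The attrition described for the TehA family can be made concrete by computing stage-to-stage success rates from the counts in the caption. This is a small sketch: the counts are taken from the caption, but treating small-scale and large-scale expression as consecutive steps is an assumption made here for illustration, and the stage labels and the function name `step_rates` are hypothetical.

```python
# Counts from the Figure 4 caption; the linear stage ordering is an assumption.
stages = [
    ("pursued", 35),
    ("cloned", 33),
    ("expressed, small scale", 8),
    ("expressed, large scale", 5),
    ("structure solved", 1),
]


def step_rates(stages):
    """Return (from_stage, to_stage, fraction surviving) for each step."""
    rates = []
    for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
        rates.append((prev_name, name, n / prev_n))
    return rates


rates = step_rates(stages)
for prev_name, name, rate in rates:
    print(f"{prev_name} -> {name}: {rate:.1%}")

# Fraction of pursued targets that ended in a high-resolution structure.
overall = stages[-1][1] / stages[0][1]
print(f"overall: {overall:.1%}")
```

The step from cloned to small-scale expression (8 of 33) is where most targets are lost, which is the drop the caption singles out.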

**Figure 5. Statistics for IMP structural genomics protein production pipelines**
Depicted is the number of IMPs that were successful at different stages in the experimental pipelines. Data were extracted from TargetDB [51] in January 2012 for nine membrane protein structural genomics consortia: CSMP, GPCR network, MPID, MPSbyNMR, MPSCB, NYCOMPS, TEMIPS, TMPC and TransportPDB. For the NMR consortium (MPSbyNMR) we do not report data for the steps following purification.

References

1. von Heijne G. The membrane protein universe: what’s out there and why bother? J Intern Med. 2007;261:543–557.
2. von Heijne G. Membrane-protein topology. Nat Rev Mol Cell Biol. 2006;7:909–918.
3. Liu J, Rost B. Comparing function and structure between entire proteomes. Protein Sci. 2001;10:1970–1979.
4. Fagerberg L, Jonasson K, von Heijne G, Uhlen M, Berglund L. Prediction of the human membrane proteome. Proteomics. 2010;10:1141–1149.
5. Overington JP, Al-Lazikani B, Hopkins AL. How many drug targets are there? Nat Rev Drug Discov. 2006;5:993–996.
