J Struct Funct Genomics. 2009 Dec;10(4):255-68.
doi: 10.1007/s10969-009-9071-1. Epub 2009 Oct 27.

Structural genomics target selection for the New York consortium on membrane protein structure

Marco Punta et al. J Struct Funct Genomics. 2009 Dec.

Abstract

The New York Consortium on Membrane Protein Structure (NYCOMPS), a part of the Protein Structure Initiative (PSI) in the USA, has as its mission to establish a high-throughput pipeline for determination of novel integral membrane protein structures. Here we describe our current target selection protocol, which applies structural genomics approaches informed by the collective experience of our team of investigators. We first extract all annotated proteins from our reagent genomes, i.e. the 96 fully sequenced prokaryotic genomes from which we clone DNA. We filter this initial pool of sequences and obtain a list of valid targets. NYCOMPS defines valid targets as those that, among other features, have at least two predicted transmembrane helices, no predicted long disordered regions and, except for community nominated targets, no significant sequence similarity in the predicted transmembrane region to any known protein structure. Proteins that feed our experimental pipeline are selected by defining a protein seed and searching the set of all valid targets for proteins that are likely to have a transmembrane region structurally similar to that of the seed. We require sequence similarity aligning at least half of the predicted transmembrane region of seed and target. Seeds are selected according to their feasibility and/or biological interest, and they include both centrally selected targets and community nominated targets. As of December 2008, over 6,000 targets have been selected and are currently being processed by the experimental pipeline. We discuss how our target list may impact structural coverage of the membrane protein space.

Figures

Fig. 1
Target selection protocol at NYCOMPS. (a) Building the NYCOMPS98 dataset of valid targets. We selected targets from 96 fully sequenced prokaryotic genomes. We used TMHMM2 [31] to predict TMHs in this set and retained only sequences with ≥2 TMHs. Finally, we applied a series of additional filters: we reduced redundancy at 98% using CD-HIT [34], we removed all sequences with 2 predicted TMHs for which the first TMH overlapped with a predicted signal peptide (using SignalP [35]) and we discarded sequences with at least 15 consecutive residues predicted to be disordered (using IUPred [36]). All sequences left constitute our set of valid targets, which we call NYCOMPS98. (b) Expanding a protein seed into a family of related proteins within NYCOMPS98. The seed is aligned against the whole NYCOMPS98 dataset using PSI-BLAST [39]. Retained sequences are those that satisfy our similarity criterion (Fig. 2). From this list we eliminate: sequences that are significantly similar to PDB proteins (filter is not applied to nominated targets), sequences known to constitute individual subunits of hetero-oligomeric complexes and sequences that differ significantly from the seed with respect to sequence length and number of predicted TMHs. We also discard proteins that align well with the family N-terminus consensus sequence (if any such consensus can be identified) but add some extra N-terminal residues, i.e. are possibly mis-annotated (Fig. S3). All remaining sequences are finally sent to cloning.
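The per-sequence filters in panel (a) can be sketched in a few lines of Python. This is only an illustration, not the consortium's code: the `Candidate` record, its field names and the overlap test (first TMH starts at or before the last signal-peptide residue) are assumptions, the predictions are taken as already computed by TMHMM2, SignalP and IUPred, and the 98% redundancy reduction with CD-HIT is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical record holding precomputed predictions for one sequence."""
    name: str
    tmh_spans: List[Tuple[int, int]]      # predicted TMHs, 1-based inclusive
    signal_peptide_end: Optional[int]     # last signal-peptide residue, or None
    disorder: List[bool]                  # per-residue disorder prediction


def longest_run(flags: List[bool]) -> int:
    """Length of the longest stretch of consecutive True values."""
    best = cur = 0
    for flag in flags:
        cur = cur + 1 if flag else 0
        best = max(best, cur)
    return best


def is_valid_target(c: Candidate, min_tmh: int = 2, disorder_cutoff: int = 15) -> bool:
    # Keep only sequences with at least two predicted TMHs.
    if len(c.tmh_spans) < min_tmh:
        return False
    # For sequences with exactly two TMHs, drop those whose first TMH
    # overlaps the predicted signal peptide (overlap test is an assumption).
    if len(c.tmh_spans) == 2 and c.signal_peptide_end is not None:
        first_start, _ = min(c.tmh_spans)
        if first_start <= c.signal_peptide_end:
            return False
    # Discard sequences with >= 15 consecutive residues predicted disordered.
    if longest_run(c.disorder) >= disorder_cutoff:
        return False
    return True
```

Applying `is_valid_target` to every sequence with ≥2 predicted TMHs, and then clustering the survivors at 98% identity, would give a set analogous to NYCOMPS98 under these assumptions.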
Fig. 2
αIMPs similarity criterion. We align sequence α to sequence β (both αIMPs) using PSI-BLAST [39]. If the alignment has E value <10⁻³ and it extends over ≥50% of the predicted TM regions of both proteins, then we consider β similar to α. This criterion is used throughout the paper to establish similarity between αIMPs, e.g. similarity between a seed protein and proteins in the NYCOMPS98 dataset.
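As a rough sketch, the criterion amounts to an E-value test plus a coverage test on each protein. The function names and the inputs below are hypothetical, and two simplifications are assumed: the "predicted TM region" is taken to be the set of residues inside predicted TMHs, and coverage is measured from the aligned start and end positions without accounting for gaps. The paper's own definitions may differ.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # 1-based inclusive residue range


def tm_coverage(tmh_spans: List[Span], aln_start: int, aln_end: int) -> float:
    """Fraction of predicted TMH residues that fall inside the aligned range."""
    total = sum(end - start + 1 for start, end in tmh_spans)
    if total == 0:
        return 0.0
    covered = sum(
        max(0, min(end, aln_end) - max(start, aln_start) + 1)
        for start, end in tmh_spans
    )
    return covered / total


def is_similar(
    evalue: float,
    alpha_tmhs: List[Span], alpha_aln: Span,
    beta_tmhs: List[Span], beta_aln: Span,
    max_evalue: float = 1e-3,
    min_coverage: float = 0.5,
) -> bool:
    """True if the alignment passes both the E-value and the coverage test."""
    return (
        evalue < max_evalue
        and tm_coverage(alpha_tmhs, *alpha_aln) >= min_coverage
        and tm_coverage(beta_tmhs, *beta_aln) >= min_coverage
    )
```

Requiring the coverage threshold on both proteins keeps a short hit to a single shared helix from linking two otherwise unrelated multi-helix proteins.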
Fig. 3
Diversity of NYCOMPS targets. (a) Distribution of sequence lengths. x-axis tick labels represent ranges, e.g. 100 means between 0 and 100 residues. The last bin (1,100) includes all proteins longer than 1,000 residues. (b) Distribution of number of TMHs predicted by TMHMM2 in all selected targets.
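The binning convention of panel (a) can be written out explicitly. The sketch below assumes each bin includes its upper bound (so a 100-residue protein is labeled 100), which the caption does not state; `length_bin` and `length_histogram` are hypothetical names.

```python
import math
from collections import Counter
from typing import Iterable


def length_bin(length: int, width: int = 100, overflow_label: int = 1100) -> int:
    """Return the tick label of the bin holding a sequence of this length.

    Labels are bin upper bounds: 100 covers 1-100, 200 covers 101-200, and
    so on. Everything longer than 1,000 residues goes to the last bin (1,100).
    """
    if length <= 0:
        raise ValueError("sequence length must be positive")
    return min(math.ceil(length / width) * width, overflow_label)


def length_histogram(lengths: Iterable[int]) -> Counter:
    """Count sequences per bin label."""
    return Counter(length_bin(n) for n in lengths)
```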
Fig. 4
Potential novel αIMP leverage provided by NYCOMPS targets. (a) The x-axis gives the number of seed families for which we hypothetically determine a structure (corresponding to 10 seed families or to 25–100% of all seed families; e.g. 25% corresponds to 43 seed families and 100% to 174 seed families); the y-axis reports the number of predicted αIMPs with more than 2 TMHs for which more than 50% of the residues in the TM region could be modeled using the NYCOMPS targets on the x-axis as templates (leverage). Numbers on the y-axis are for proteins in: UniProtKB-TMH (i.e. all predicted αIMPs in UniProtKB with more than 2 TMHs, see “Methods”; blank circles and continuous line), Swiss-Prot-TMH (filled diamonds and long-dash line) and UniProtKB-TMH-Human (i.e. human proteins in UniProtKB-TMH, crossed squares and short-dash line). Error bars are obtained by bootstrapping [48] (“Methods”). (b) Comparison between NYCOMPS target and PDB protein leverage. On the y-axis we report the ratio between the respective leverage values. Notations are as in (a). See “Methods” for the way UniProtKB-TMH leverage by PDB proteins is calculated.
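The exact bootstrap procedure is given in the paper's Methods, which are not reproduced on this page. As a generic illustration of how bootstrapped error bars can be obtained, the sketch below resamples hypothetical per-family leverage counts with replacement and reports the mean and standard deviation of the resampled totals. Summing per-family counts assumes that no protein is modeled by more than one family, which is a simplification.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_total(
    per_family_counts: List[int],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Bootstrap the total leverage over a set of seed families.

    Each resample draws len(per_family_counts) families with replacement
    and sums their counts. Returns (mean, standard deviation) of the
    resampled totals; the standard deviation can serve as an error bar.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(per_family_counts)
    totals = [
        sum(rng.choices(per_family_counts, k=n))
        for _ in range(n_resamples)
    ]
    return statistics.mean(totals), statistics.stdev(totals)
```

With identical counts for every family the resampled totals never vary, so the error bar collapses to zero; spread in the per-family counts is what produces a nonzero bar.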

References

    1. Burley SK, Joachimiak A, Montelione GT, Wilson IA. Contributions to the NIH-NIGMS protein structure initiative from the PSI production centers. Structure. 2008;16:5–11. doi: 10.1016/j.str.2007.12.002.
    2. Norvell JC, Berg JM. Update on the protein structure initiative. Structure. 2007;15:1519–1522. doi: 10.1016/j.str.2007.11.004.
    3. Norvell JC, Machalek AZ. Structural genomics programs at the US national institute of general medical sciences. Nat Struct Biol. 2000;7 Suppl:931. doi: 10.1038/80694.
    4. Stroud RM, Choe S, Holton J, Kaback HR, Kwiatkowski W, Minor DL, Riek R, Sali A, Stahlberg H, Harries W. 2007 Annual progress report synopsis of the center for structures of membrane proteins. J Struct Funct Genomics. 2009;10:193–208.
    5. Punta M, Forrest LR, Bigelow H, Kernytsky A, Liu J, Rost B. Membrane protein prediction methods. Methods. 2007;41:460–474. doi: 10.1016/j.ymeth.2006.07.026.
