Protein Sci. 2006 Mar;15(3):509-21.

doi: 10.1110/ps.051745906. Epub 2006 Feb 1.

A general model of G protein-coupled receptor sequences and its application to detect remote homologs

Markus Wistrand¹, Lukas Käll, Erik L L Sonnhammer

PMID: 16452613
PMCID: PMC2249772
DOI: 10.1110/ps.051745906

Markus Wistrand et al. Protein Sci. 2006 Mar.

Affiliation

¹ Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden.

Abstract

G protein-coupled receptors (GPCRs) constitute a large superfamily involved in various types of signal transduction pathways triggered by hormones, odorants, peptides, proteins, and other types of ligands. The superfamily is so diverse that many members lack sequence similarity, although they all span the cell membrane seven times with an extracellular N terminus and a cytosolic C terminus. We analyzed a divergent set of GPCRs and found distinct loop length patterns and differences in amino acid composition between cytosolic loops, extracellular loops, and membrane regions. We configured GPCRHMM, a hidden Markov model, to fit those features and trained it on a large dataset representing the entire superfamily. GPCRHMM was benchmarked against profile HMMs and generic transmembrane detectors on sets of known GPCRs and non-GPCRs. In a cross-validation procedure, profile HMMs produced an error rate nearly twice as high as that of GPCRHMM. In a sensitivity-selectivity test, GPCRHMM's sensitivity was about 15% higher than that of the best transmembrane predictors at comparable false positive rates. We used GPCRHMM to search for novel members of the GPCR superfamily in five proteomes. In total, we detected 120 sequences that lacked annotation and are potentially novel GPCRs. Of those, 102 were found in Caenorhabditis elegans, four in human, and seven in mouse. Many predictions (65) belonged to Pfam domains of unknown function. GPCRHMM strongly rejected a family of arthropod-specific odorant receptors believed to be GPCRs. A detailed analysis showed that these sequences are indeed very different from other GPCRs. GPCRHMM is available at http://gpcrhmm.cgb.ki.se.

Figures

**Figure 1.**
Similarity-based relationship tree of 13 confirmed or putative GPCR families and two families that are known not to be GPCRs: the bacteriorhodopsin (a proton pump) and the protein kinase families. Distances between families were obtained as follows. The Pfam HMM (“glocal” model) representing each family was used to score the full-length Pfam sequences of all other families. The logarithm of the lower of the two median E-values from each reciprocal search was used as the distance measure in the UPGMA algorithm. The database size for the HMM searches was set to 10⁶ sequences. To avoid negative distances, a constant was added to all values in the distance matrix, but this was compensated for on the X-axis scale. The tree places bacteriorhodopsin between the confirmed GPCR families and Mlo and 7tm_6, suggesting that the latter are not GPCRs.
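The distance construction described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout `median_e`, the function names, the base-10 logarithm, and the choice of the smallest constant that removes negative values are all assumptions, and the E-values are made up. The resulting matrix could then be passed to any average-linkage (UPGMA) clustering routine.

```python
import math

def reciprocal_distance(median_e_ab, median_e_ba):
    """Log of the lower of the two reciprocal median E-values (base 10 is an assumption)."""
    return math.log10(min(median_e_ab, median_e_ba))

def build_distance_matrix(families, median_e):
    """Build a symmetric, non-negative distance matrix.

    median_e[(a, b)] is the median E-value obtained when the HMM of family a
    scores the sequences of family b (hypothetical input layout).
    """
    raw = {}
    for i, a in enumerate(families):
        for b in families[i + 1:]:
            raw[(a, b)] = reciprocal_distance(median_e[(a, b)], median_e[(b, a)])

    # Add a constant so that no distance is negative (assumed: the minimal shift).
    lowest = min(raw.values())
    shift = -lowest if lowest < 0 else 0.0

    index = {family: k for k, family in enumerate(families)}
    n = len(families)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), d in raw.items():
        matrix[index[a]][index[b]] = d + shift
        matrix[index[b]][index[a]] = d + shift
    return matrix, shift

# Made-up E-values for three hypothetical families.
families = ["famA", "famB", "famC"]
median_e = {
    ("famA", "famB"): 1e-30, ("famB", "famA"): 1e-20,
    ("famA", "famC"): 1e-2,  ("famC", "famA"): 10.0,
    ("famB", "famC"): 1e-5,  ("famC", "famB"): 1e-3,
}
matrix, shift = build_distance_matrix(families, median_e)
```

Closely related families have very small E-values, so their shifted log distance is near zero, while unrelated families end up far apart.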

**Figure 2.**
Loop length distributions of the training set sequences (bars) and modeled length distributions (dots). Observed lengths: most notable is the conserved and short length of the first cytosolic loop. The second cytosolic loop also has a narrow length distribution. In contrast, the first extracellular loop includes a number of long examples. The second extracellular and the third cytosolic loops have wide length distributions and long median lengths. The third extracellular loop is often short but has a wide length distribution. Modeled length distributions: the data were fitted to binomial distributions (cytosolic loops 1 and 2) or to negative binomial distributions (the remaining loops). The estimated distributions follow the observed data reasonably well given the trade-off between modeling quality and the risk of overtraining on imperfect data.
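To make the two families of length distributions concrete, the sketch below evaluates a binomial model (bounded lengths) and a negative binomial model (unbounded lengths) for a loop. The parameterization, the function names, and all numeric values are illustrative assumptions; the fitted parameters are not given in this caption.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(K = k) for a binomial with n trials and success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def negative_binomial_pmf(k, r, p):
    """P(K = k) failures before the r-th success (integer r, success probability p)."""
    if k < 0:
        return 0.0
    return comb(k + r - 1, k) * (1 - p) ** k * p**r

def bounded_loop_pmf(length, min_len, max_len, p):
    """Binomial length model on [min_len, max_len], as for a short conserved loop."""
    return binomial_pmf(length - min_len, max_len - min_len, p)

def unbounded_loop_pmf(length, min_len, r, p):
    """Negative binomial length model on [min_len, infinity), as for a variable loop."""
    return negative_binomial_pmf(length - min_len, r, p)

# Hypothetical parameters, chosen only for illustration.
short_loop = [bounded_loop_pmf(L, 5, 12, 0.3) for L in range(0, 20)]
long_loop = [unbounded_loop_pmf(L, 8, 3, 0.2) for L in range(0, 2000)]
```

The binomial model assigns zero probability outside its interval, which matches a narrow, conserved loop, while the negative binomial model keeps a long tail for loops with occasional very long examples.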

**Figure 3.**
An amino acid composition-based relationship tree of the different topological regions in the training set GPCRs. A distance measure based on relative entropy was used (see Materials and Methods). Terminology: the numbering is from the N terminus to the C terminus. “1-extracellular” is the first extracellular loop, “1-cytosolic” is the first cytosolic loop, and so forth. “N/C-terminal near” corresponds to the 15 residues closest to the membrane in the N/C-terminal soluble regions, while “N/C-terminal glob” represents the remaining residues to the respective termini.
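A composition-based distance of this kind can be sketched as below. The exact measure is defined in the paper's Materials and Methods and is not reproduced here, so this example assumes a symmetrized relative entropy (Kullback-Leibler divergence) with pseudocounts; the function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequences, pseudocount=1.0):
    """Amino acid frequencies of a region, with pseudocounts to avoid zeros."""
    counts = Counter()
    for seq in sequences:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}

def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(p[a] * log2(p[a] / q[a]) for a in AMINO_ACIDS)

def symmetric_distance(p, q):
    """Average of D(p || q) and D(q || p); the symmetrization is an assumption."""
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))

# Toy regions: a hydrophobic membrane-like stretch and a charged loop-like stretch.
membrane = composition(["LLVVAIIFLGLAVF", "IVLFAGLLIVAL"])
loop = composition(["KRRSKKDEQN", "RKDESTKR"])
distance = symmetric_distance(membrane, loop)
```

Computing this distance for every pair of topological regions gives a matrix that a tree-building method can cluster.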

**Figure 4.**
(A) Overview of the GPCRHMM architecture. Each model compartment is represented by a box in which the possible length interval is indicated. To model different types of sequence length data, we used three connectivity layouts that correspond to different distributions. See Materials and Methods for a description of the signal peptide (SP) compartment. (B) In this connectivity layout the emitting states are accompanied by “silent” states that do not emit amino acids. This generates a distribution with a limited maximum length, and was used to model the first and second cytosolic loops. (C) Here, the states have a self-transition and a transition to the next state. All self-transitions are given the same probability. This generates a length distribution with unlimited maximum length, which was used for the remaining loops. The notation x + y → ∞ means that the compartment has a fixed length region of x states followed by a region of y states allowing lengths of y → ∞. (D) This layout of forward connected emitting states was used to model the core of a TM helix.
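The length distribution produced by layout (C) can be computed exactly with a small dynamic program. The sketch below assumes that each state emits one residue per visit, that every state shares the same self-transition probability, and that the last state exits the compartment with the complementary probability; the function names and parameter values are hypothetical. Under these assumptions the result is a negative binomial distribution shifted by the number of states, which the closed form confirms.

```python
from math import comb

def chain_length_distribution(n_states, self_prob, max_len):
    """P(length = L) for L = 0..max_len in a chain of self-looping emitting states."""
    dist = [0.0] * (max_len + 1)
    occupancy = [0.0] * n_states
    occupancy[0] = 1.0  # the first residue is emitted by the first state
    for length in range(1, max_len + 1):
        # Leaving the last state ends the compartment after `length` residues.
        dist[length] = occupancy[-1] * (1.0 - self_prob)
        nxt = [0.0] * n_states
        for j in range(n_states):
            nxt[j] = occupancy[j] * self_prob  # stay via the self-transition
            if j > 0:
                nxt[j] += occupancy[j - 1] * (1.0 - self_prob)  # advance
        occupancy = nxt
    return dist

def closed_form(length, n_states, self_prob):
    """Shifted negative binomial that the chain above should reproduce."""
    if length < n_states:
        return 0.0
    return (comb(length - 1, n_states - 1)
            * self_prob ** (length - n_states)
            * (1.0 - self_prob) ** n_states)

def compartment_length_distribution(x, y, self_prob, max_len):
    """Length distribution for the 'x + y -> infinity' notation (requires max_len >= x)."""
    return [0.0] * x + chain_length_distribution(y, self_prob, max_len - x)

# Hypothetical compartment: 5 fixed states followed by 3 self-looping states.
example = compartment_length_distribution(5, 3, 0.8, 400)
```

The shortest possible length is x + y, and a higher self-transition probability shifts probability mass toward longer loops.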

**Figure 5.**
Large-scale testing of GPCRHMM. Shown is a histogram of GPCRHMM scores for GPCRDB (redundant sequences removed) and a large negative dataset (the Swiss-Prot database minus all sequences with >20% sequence identity to any protein in GPCRDB). Some high-scoring false positives occur; to address this, a local scoring procedure was devised (see Fig. 6). The majority of low-scoring GPCRDB sequences are fly odorant receptors (7tm_6).

**Figure 6.**
GPCRHMM’s discrimination can be improved by applying a “local score.” Global and local scores are plotted for the sequences in GPCRDB and a large negative dataset as in Figure 5. The sequences from Figure 5 with a global score above 0 were rescored with the local score (see Materials and Methods). This improves the separation between true and false hits. We noted that a number of the high-scoring negative sequences were actually putative GPCRs not part of GPCRDB (e.g., serpentine receptors). GPCRHMM’s default cutoffs are global score >0 and local score >0.
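The two-stage decision rule implied by these default cutoffs can be written as a short function. This is only a sketch: the function name and signature are hypothetical, the scores are assumed to have been computed elsewhere, and the scoring itself (see Materials and Methods) is not implemented here.

```python
def is_predicted_gpcr(global_score, local_score=None,
                      global_cutoff=0.0, local_cutoff=0.0):
    """Apply the default cutoffs: global score > 0 and local score > 0."""
    if global_score <= global_cutoff:
        # Sequences failing the global cutoff are never rescored locally.
        return False
    if local_score is None:
        raise ValueError("a local score is required once the global cutoff is passed")
    return local_score > local_cutoff

# Made-up scores for three hypothetical sequences.
accepted = is_predicted_gpcr(12.3, 4.1)
rejected_globally = is_predicted_gpcr(-1.0)
rejected_locally = is_predicted_gpcr(5.0, -0.2)
```

Requiring both cutoffs means the local score can only remove hits that the global score accepted, never add new ones.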

Cited by

Prediction and expression analysis of G protein-coupled receptors in the laboratory stick insect, Carausius morosus.
Duan Şahbaz B, Birgül Iyison N. Turk J Biol. 2019 Feb 7;43(1):77-88. doi: 10.3906/biy-1809-27. eCollection 2019. PMID: 30930638. Free PMC article.
Whole proteome identification of plant candidate G-protein coupled receptors in Arabidopsis, rice, and poplar: computational prediction and in-vivo protein coupling.
Gookin TE, Kim J, Assmann SM. Genome Biol. 2008;9(7):R120. doi: 10.1186/gb-2008-9-7-r120. Epub 2008 Jul 31. PMID: 18671868. Free PMC article.
No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta.
Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. PLoS One. 2016 Nov 9;11(11):e0166060. doi: 10.1371/journal.pone.0166060. eCollection 2016. PMID: 27829053. Free PMC article.
Controversy and consensus: noncanonical signaling mechanisms in the insect olfactory system.
Nakagawa T, Vosshall LB. Curr Opin Neurobiol. 2009 Jun;19(3):284-92. doi: 10.1016/j.conb.2009.07.015. Epub 2009 Aug 5. PMID: 19660933. Free PMC article. Review.
Expression and evolutionary divergence of the non-conventional olfactory receptor in four species of fig wasp associated with one species of fig.
Lu B, Wang N, Xiao J, Xu Y, Murphy RW, Huang D. BMC Evol Biol. 2009 Feb 20;9:43. doi: 10.1186/1471-2148-9-43. PMID: 19232102. Free PMC article.
