BMC Bioinformatics. 2010 Aug 9;11:420.
doi: 10.1186/1471-2105-11-420.

An improved classification of G-protein-coupled receptors using sequence-derived features

Zhen-Ling Peng et al. BMC Bioinformatics. 2010.

Abstract

Background: G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of marketed drugs. The 3D structures of GPCRs are largely unavailable, whereas a large number of GPCR primary sequences are known. To facilitate the identification and characterization of novel receptors, a computational method that accurately predicts GPCRs from protein primary sequences is therefore very valuable.

Results: We propose a new method, PCA-GPCR, to predict GPCRs using a comprehensive set of 1497 sequence-derived features. Principal component analysis is first employed to reduce the dimension of the feature space to 32. The resulting 32-dimensional feature vectors are then fed into a simple yet powerful classification algorithm, called intimate sorting, to predict GPCRs at five levels. The first level determines whether a protein is a GPCR or a non-GPCR. A protein predicted to be a GPCR is then assigned to a family, subfamily, sub-subfamily, and subtype by the classifiers at the second, third, fourth, and fifth levels, respectively. To train the classifiers at the five levels, a non-redundant dataset was carefully constructed, containing 3178, 1589, 4772, 4924, and 2741 protein sequences at the respective levels. Jackknife tests on this training dataset show that the overall accuracies of PCA-GPCR at the five levels (from the first to the fifth) reach 99.5%, 88.8%, 80.47%, 80.3%, and 92.34%, respectively. We further performed predictions on a dataset of 1238 GPCRs at the second level, and on two other datasets of 167 and 566 GPCRs at the fourth level. The overall prediction accuracies of our method are consistently higher than those of the existing methods it was compared with.
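The pipeline in the Results (dimension reduction, classification, jackknife testing) can be illustrated with a minimal sketch. It assumes NumPy, uses random toy data in place of the 1497 sequence-derived features, and substitutes a plain 1-nearest-neighbour rule for intimate sorting, whose details are not given in this abstract. The function names `pca_project` and `jackknife_accuracy` are hypothetical, and fitting the PCA once on all samples before the leave-one-out loop is a simplifying assumption, not necessarily the authors' protocol.

```python
import numpy as np


def pca_project(X, m):
    """Project the rows of X onto the first m principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T


def jackknife_accuracy(Z, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour stand-in classifier."""
    # Pairwise squared Euclidean distances between all samples.
    sq = (Z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(d2, np.inf)  # the held-out sample cannot match itself
    nearest = d2.argmin(axis=1)
    return float((y[nearest] == y).mean())


# Toy data: two well-separated classes in 50 dimensions (assumption).
rng = np.random.default_rng(0)
n_per_class, n_features = 30, 50
X = np.vstack([
    rng.normal(0.0, 1.0, (n_per_class, n_features)),
    rng.normal(3.0, 1.0, (n_per_class, n_features)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)

Z = pca_project(X, m=5)
acc = jackknife_accuracy(Z, y)
print(Z.shape, acc)
```

With the real data, `X` would have 1497 columns and `m` would be 32. Repeating the call over a range of `m` values and keeping the one with the highest jackknife accuracy mirrors the selection shown in Figure 3.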

Conclusions: The comprehensive set of 1497 features is believed to capture information about amino acid composition, sequence order, and various physicochemical properties of proteins. Therefore, our proposed method achieves high accuracies when predicting GPCRs at all five levels.
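As an illustration of what sequence-derived features look like, the sketch below computes two standard descriptors: amino acid composition (20 values) and dipeptide composition (400 values). It assumes that the AAC and DC subsets named in the Figure 4 caption carry these usual meanings. The exact definitions behind the full set of 1497 features are not given in this abstract, and the function names and example peptide are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence: 20 features."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair: 400 features."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs in which both residues are standard.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    n = len(pairs)
    return [pairs.count(a + b) / n if n else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


example = "MKTAYIAKQR"  # made-up peptide, not a real GPCR sequence
aac = amino_acid_composition(example)
dc = dipeptide_composition(example)
print(len(aac), len(dc))
```

Both vectors sum to 1 for any sequence with at least two standard residues. Unlike AAC, the dipeptide counts retain some local sequence-order information.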

Figures

Figure 1
The hierarchical structure for GPCRs. The organization of GPCR sequences in the GPCRDB database does not include the first level in this figure. We add it in this study because we performed prediction at this level.
Figure 2
The structure of PCA-GPCR. For the names of the families, subfamilies, sub-subfamilies, and subtypes, see Additional file 1. The fourth and fifth levels apply only to some subfamilies and subtypes, which are also listed in Additional file 1.
Figure 3
Selection of m. The overall prediction accuracies of GPCR families for the D365 dataset, obtained by varying the number m of principal components. The highest overall accuracy is achieved when m = 32, which is marked by the dotted lines.
Figure 4
Contribution of features. The meanings of the notations AAC, DC, AD, GD, SD, and PseAAC can be found in Table 7. The divisions of these six subsets are marked by vertical blue lines.
Figure 5
Contribution of features in the AD subset. The divisions of these eight groups are marked by vertical blue lines. The divisions within the groups P3, P5 and P6 are marked by vertical red lines.
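The hierarchical structure described in the captions of Figures 1 and 2 amounts to a top-down cascade: each level's classifier is chosen by the labels predicted so far, and prediction stops when a protein is rejected as a non-GPCR or when a branch has no deeper level. The sketch below shows one way this could be organized. The classifier table, the threshold rules, and the labels such as `family-A` are hypothetical placeholders, not the authors' classifiers or the GPCRDB names.

```python
LEVELS = ["GPCR/non-GPCR", "family", "subfamily", "sub-subfamily", "subtype"]


def predict_hierarchy(features, classifiers):
    """Walk the levels top-down and return the predicted label path.

    `classifiers` maps a label path (a tuple) to a callable that predicts
    the next label from `features`. A missing entry means that no deeper
    level applies to that branch.
    """
    path = ()
    while len(path) < len(LEVELS):
        clf = classifiers.get(path)
        if clf is None:  # no deeper level for this branch
            break
        path += (clf(features),)
        if path == ("non-GPCR",):  # level 1 rejected the protein
            break
    return path


# Toy classifiers: simple threshold rules on a two-value feature vector.
classifiers = {
    (): lambda x: "GPCR" if x[0] > 0.5 else "non-GPCR",
    ("GPCR",): lambda x: "family-A" if x[1] > 0.5 else "family-B",
    ("GPCR", "family-A"): lambda x: "subfamily-A1",
    # No entry for ("GPCR", "family-B"): that branch ends at the family level.
}

deep = predict_hierarchy([0.9, 0.8], classifiers)
shallow = predict_hierarchy([0.9, 0.1], classifiers)
rejected = predict_hierarchy([0.1, 0.8], classifiers)
print(deep, shallow, rejected)
```

Keying the table by the label path makes it straightforward to leave out the fourth and fifth levels for branches where they do not apply.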
