Review
Acta Pharmacol Sin. 2014 Jan;35(1):11-23.
doi: 10.1038/aps.2013.142. Epub 2013 Nov 18.

Deorphanizing the human transmembrane genome: A landscape of uncharacterized membrane proteins

Joseph J Babcock et al. Acta Pharmacol Sin. 2014 Jan.

Abstract

The sequencing of the human genome has fueled the last decade of work to functionally characterize genome content. An important subset of genes encodes membrane proteins, which are the targets of many drugs. They reside in lipid bilayers, restricting their endogenous activity to a relatively specialized biochemical environment. Without a reference phenotype, the application of systematic screens to profile candidate membrane proteins is not immediately possible. Bioinformatics has begun to show its effectiveness in focusing the functional characterization of orphan proteins of a particular functional class, such as channels or receptors. Here we discuss integration of experimental and bioinformatics approaches for characterizing the orphan membrane proteome. By analyzing the human genome, a landscape reference for the human transmembrane genome is provided.

Figures

Figure 1
Deorphanization strategies. Left (Blue): In silico analyses of genomic sequences, topological prediction, and functional prediction. Right (Red): Phenotype of interest followed by genomic screen, bioinformatics evaluation of candidate list topology, and experimental validation.
Figure 2
Algorithms for topological and functional prediction. Primary amino acid sequence (top left) is employed to predict secondary structure topology motifs (transmembrane helices, cytosolic loops, signal peptides) (top right), while secondary descriptors describing composition or substitution patterns of amino acids (bottom) are used for functional prediction for membrane proteins.
Figure 3
Estimating the number of uncharacterized human membrane proteins. (A) Human RefSeq protein sequences are collapsed to unique genes. Predictions from three topology algorithms are averaged to generate a list of predicted membrane proteins, which is merged with membrane proteins derived from GenBank transmembrane helix annotations to yield a combined population of estimated membrane proteins. Previous functional annotations are evaluated using GeneRIF fields and Gene Ontology (GO) records, which are merged to yield a combined population of estimated uncharacterized proteins. The intersection of the membrane and uncharacterized populations represents uncharacterized membrane proteins. (B) Distribution of top GO molecular function (MF) categories for all membrane proteins.
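The set logic in panel (A) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the gene names, the threshold of one averaged transmembrane helix, and the rule that a gene counts as uncharacterized only when it lacks both a GeneRIF and a GO record are all assumptions made for illustration.

```python
from statistics import mean

def predicted_membrane(tm_counts, threshold=1.0):
    """Genes whose mean predicted TM-helix count across predictors
    reaches the (assumed) threshold."""
    return {gene for gene, counts in tm_counts.items() if mean(counts) >= threshold}

def uncharacterized_membrane(tm_counts, annotated_membrane, has_generif, has_go):
    """Intersect the membrane population with the uncharacterized population."""
    # Membrane population: averaged predictions merged with annotated genes.
    membrane = predicted_membrane(tm_counts) | set(annotated_membrane)
    # Assumption: a gene is uncharacterized if it has neither a GeneRIF nor a GO record.
    all_genes = set(tm_counts) | set(annotated_membrane)
    uncharacterized = all_genes - (set(has_generif) | set(has_go))
    return membrane & uncharacterized

# Hypothetical toy data: helix counts from three predictors per gene.
tm_counts = {
    "GENE_A": (7, 7, 6),
    "GENE_B": (0, 0, 0),
    "GENE_C": (1, 0, 2),
    "GENE_D": (0, 1, 0),
}
annotated_membrane = {"GENE_D"}   # stands in for transmembrane helix annotations
has_generif = {"GENE_A"}
has_go = {"GENE_B"}

result = uncharacterized_membrane(tm_counts, annotated_membrane, has_generif, has_go)
print(sorted(result))
```

In this toy case GENE_D enters the membrane population only through the annotation set, which shows why the merge with annotated genes matters.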
Figure 4
Subcellular localization of predicted membrane and soluble proteins in S. cerevisiae. GFP fusions of individual yeast proteins have been expressed, localized and annotated. (A) Analytical pipeline for prediction of yeast membrane proteins, beginning with 5909 RefSeq entries that are filtered to yield 4973 unique gene names. Topology algorithms applied to the unique genes yield 920 putative membrane proteins and 4053 putative soluble proteins. Fractions of both groups possess experimentally determined subcellular locations. (B) The distribution of experimentally determined localization(s) for predicted membrane and soluble proteins in (A) among 22 cellular sites. Bar lengths are normalized to the total number of subcellular location sites available for predicted membrane and soluble proteins from (A).
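One plausible reading of the normalization in panel (B) is that each bar is the share of a group's localization annotations that fall at a given site. The Python sketch below illustrates that reading only; the gene names and sites are hypothetical, and the original analysis may have normalized differently.

```python
from collections import Counter

def normalized_localization(localizations):
    """Map each site to its fraction of all localization annotations in one group.

    `localizations` maps a gene name to the list of sites where it was observed;
    a protein seen at several sites contributes one annotation per site.
    """
    counts = Counter(site for sites in localizations.values() for site in sites)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n / total for site, n in counts.items()}

# Hypothetical annotations for a small group of predicted membrane proteins.
membrane_group = {
    "YFG1": ["ER", "vacuole"],
    "YFG2": ["ER"],
    "YFG3": ["nucleus"],
}
fractions = normalized_localization(membrane_group)
print(fractions)
```

Normalizing within each group keeps the membrane and soluble distributions comparable even though the groups differ in size.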
Figure 5
Landscape of human membrane protein diversity. Two-dimensional embedded coordinates are generated from vectors counting the number of each of the twenty amino acids in the whole protein sequence, transmembrane segments, and cytosolic segments for 4991 estimated human membrane proteins, using the t-distributed stochastic neighbor embedding (t-SNE) algorithm. Colors represent groups identified by applying affinity propagation clustering to the embedded coordinates. (A) Embedded coordinates and cluster identity of the subset of human membrane proteins with previous functional annotation. (B) As in (A), for uncharacterized membrane proteins. (C) As in (A), for TMEM proteins. (D) As in (A), for sequences with seven transmembrane segments as denoted by RefSeq annotations or averaged predictions of three topology algorithms.
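The input vectors described in this caption can be sketched in Python as three concatenated 20-element amino acid counts, giving 60 features per protein. The sequence, the segment boundaries, and the half-open 0-based index convention below are assumptions for illustration, not data or conventions from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard residues

def count_vector(fragment):
    """Count occurrences of each standard amino acid in a sequence fragment."""
    return [fragment.count(aa) for aa in AMINO_ACIDS]

def composition_features(sequence, tm_segments, cytosolic_segments):
    """Concatenate counts for the whole sequence, TM segments, and cytosolic segments.

    Segments are (start, end) pairs, 0-based and half-open (an assumed convention).
    """
    tm = "".join(sequence[start:end] for start, end in tm_segments)
    cyto = "".join(sequence[start:end] for start, end in cytosolic_segments)
    return count_vector(sequence) + count_vector(tm) + count_vector(cyto)

# Hypothetical 20-residue sequence with one TM segment and one cytosolic tail.
seq = "MKTLLVAGGAILLWWKRRDE"
features = composition_features(seq, tm_segments=[(3, 15)], cytosolic_segments=[(15, 20)])
print(len(features))
```

A matrix of such vectors, one row per protein, is the kind of input that a t-SNE implementation would embed into two dimensions before clustering; that step is omitted here because it depends on a third-party library.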
Figure 6
Density profile of the landscape of human membrane proteins. Embedded two-dimensional coordinates are generated from vectors containing counts of each of the twenty amino acids in the whole protein sequence, membrane segments, and cytosolic segments, using the t-distributed stochastic neighbor embedding (t-SNE) algorithm for 4991 estimated human membrane proteins. (A) Count per coordinate grid cell, representing the number of sequences (colorbar), for the subset of human membrane proteins with previous annotation. (B) As in (A), for uncharacterized membrane proteins.
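The count-per-grid-cell density in this figure amounts to a two-dimensional histogram over the embedded coordinates. A minimal Python sketch follows; the points and the cell size of 1.0 are hypothetical, and the grid resolution used in the paper is not stated here.

```python
import math
from collections import Counter

def grid_counts(points, cell_size=1.0):
    """Count how many embedded points fall into each square grid cell.

    Cells are indexed by the floor of each coordinate divided by cell_size,
    so negative coordinates are binned consistently.
    """
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

# Hypothetical embedded coordinates for five sequences.
points = [(0.2, 0.7), (0.9, 0.1), (1.5, 0.3), (-0.4, 2.2), (0.5, 0.5)]
density = grid_counts(points)
print(density.most_common(1))
```

Computing the grid separately for annotated and uncharacterized proteins, with the same cell size, makes the two panels directly comparable.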

