J Theor Biol. 2011 Mar 21;273(1):236-47.
doi: 10.1016/j.jtbi.2010.12.024. Epub 2010 Dec 17.

Some remarks on protein attribute prediction and pseudo amino acid composition

Kuo-Chen Chou. J Theor Biol.

Abstract

With the completion of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, the determination of their biological attributes has proceeded much more slowly. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. This imbalance, which has critically limited our ability to make timely use of newly discovered proteins for basic research and drug development, has called for computational methods or high-throughput automated tools that rapidly and reliably identify various attributes of uncharacterized proteins from their sequence information alone. During the last two decades or so, many such methods have been established in the hope of bridging this gap. In developing these methods, the following often needed to be considered: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we discuss each of the five procedures, with a special focus on pseudo amino acid composition (PseAAC), its different modes and applications, and its recent development, particularly how the general formulation of PseAAC can be used to reflect the core and essential features that are deeply hidden in complicated protein sequences.


Figures

Fig. 1
Illustration to show the four categories of protein structural class: (a) all-α, (b) all-β, (c) α/β, and (d) α+β, where the α-helix is colored in red, β-strand in yellow, and the other in green. The PDB codes used to draw the representatives of the four structural classes are 1aep, 1gbg, 1enp, and 1aak, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Illustration to show the seven categories of protein structural class: (a) all-α, (b) all-β, (c) α/β, (d) α+β, (e) μ (multi-domain), (f) σ (small protein), and (g) ρ (peptide), where the α-helix is colored in red, β-strand in yellow, and the other in green. The PDB codes used to draw the representatives of the seven structural classes are 1a6m, 1uzv, 2f62, 2bf5, 1vqq, 4hir, and 1ter, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 3
Schematic drawings to show the eight categories of membrane protein types: (1) type I transmembrane, (2) type II, (3) type III, (4) type IV, (5) multipass transmembrane, (6) lipid-chain-anchored membrane, (7) GPI-anchored membrane, and (8) peripheral membrane. As shown in the figure, types I, II, III, and IV are all single-pass transmembrane proteins; see Spiess (1995) for a detailed description of their differences.
Fig. 4
Schematic illustration to show the 22 subcellular locations of eukaryotic proteins: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracellular, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome, (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole.
Fig. 5
Illustration to show how the KNN classifier depends on the selection of parameter K in identifying the attribute category of a query protein, where the query protein P is represented by the character q with a filled circle, proteins belonging to subset S1 (category 1) are represented by the open circle with number 1, proteins of S2 by the open circle with number 2, and so forth. When K=1, the query protein is predicted to belong to category 2, as its nearest protein does; when K=3, the query protein is predicted to belong to category 3 because two of its three nearest proteins belong to that category; when K=9, the query protein is predicted to belong to category 2 again because the majority of its nine nearest proteins belong to category 2.
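
The K-dependence described in the Fig. 5 legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's implementation: the 2D coordinates, the Euclidean distance, and the names `knn_predict`, `SAMPLES`, and `QUERY` are all hypothetical and were chosen only so that the toy data reproduce the pattern in the legend (category 2 at K=1, category 3 at K=3, category 2 again at K=9). Real protein samples would be feature vectors such as PseAAC rather than points in a plane.

```python
import math
from collections import Counter


def knn_predict(query, samples, k):
    """Return the most common label among the k samples nearest to query.

    samples is a list of (point, label) pairs. Distance is Euclidean,
    which is an assumption made for this sketch.
    """
    ranked = sorted(samples, key=lambda s: math.dist(query, s[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (coordinates, category), listed by distance to QUERY.
SAMPLES = [
    ((1.0, 0.0), 2),   # distance 1.0, nearest neighbor
    ((0.0, 1.5), 3),   # distance 1.5
    ((-2.0, 0.0), 3),  # distance 2.0
    ((0.0, -2.5), 2),  # distance 2.5
    ((3.0, 0.0), 2),   # distance 3.0
    ((0.0, 3.5), 1),   # distance 3.5
    ((-4.0, 0.0), 2),  # distance 4.0
    ((0.0, -4.5), 2),  # distance 4.5
    ((5.0, 0.0), 1),   # distance 5.0
    ((0.0, 6.0), 1),   # distance 6.0
    ((-7.0, 0.0), 3),  # distance 7.0
]
QUERY = (0.0, 0.0)

for k in (1, 3, 9):
    print(f"K={k}: category {knn_predict(QUERY, SAMPLES, k)}")
# K=1: category 2
# K=3: category 3
# K=9: category 2
```

With these toy points, the nine nearest neighbors carry five votes for category 2 and two each for categories 1 and 3, so the prediction flips from 2 to 3 and back to 2 as K grows. The sketch does not define a tie-breaking rule; an odd K avoids ties only in two-category problems.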
