Review
Front Genet. 2013 Dec 16:4:286.
doi: 10.3389/fgene.2013.00286.

Small proteins: untapped area of potential biological importance

Mingming Su et al. Front Genet.

Abstract

Polypeptides containing ≤100 amino acid residues (AAs) are generally considered to be small proteins (SPs). Many studies have shown that some SPs are involved in important biological processes, including cell signaling, metabolism, and growth. SPs generally have a simple domain structure, which makes them useful as model systems for overcoming folding speed limits in protein folding simulation and for drug design. Nevertheless, SPs were once thought to be trivial molecules in biological processes compared to large proteins. Because of the constraints of experimental methods and bioinformatics analysis, many genome projects have used a length threshold of 100 amino acid residues to minimize erroneous predictions, so SPs are relatively under-represented in earlier studies. General protein discovery methods have difficulty predicting and validating SPs, and very few effective tools and algorithms have been developed specifically for SP identification. In this review, we mainly consider the diverse strategies applied to SP prediction and discuss the challenge of differentiating SP-coding genes from artifacts. We also summarize current large-scale discovery of SPs in species at the genome level. In addition, we present an overview of SPs with regard to biological significance, structural application, and evolution characterization in an effort to gain insight into the significance of SPs.

Keywords: evolution characterization; protein annotation coherence; protein identification; small ORFs; small proteins.

Figures

Figure 1
Domain number distribution of small proteins in NCBI genpept. An SP usually contains a single domain. The NCBI genpept database contains 14,324,397 proteins, including 1,796,324 (12.54%) SPs. Only 310,909 SPs (17.31% of SPs, about 2.17% of total proteins) are annotated, and most of the domain-annotated SPs (85.26%) have only one domain.
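The percentages in this caption can be checked against the stated counts. The following is a minimal sketch of that arithmetic. The variable and function names are illustrative assumptions, not taken from the paper. The 85.26% single-domain figure is not recomputed because the caption does not give its underlying count.

```python
# Counts as stated in the Figure 1 caption.
total_proteins = 14_324_397   # all proteins in NCBI genpept
small_proteins = 1_796_324    # proteins with <= 100 residues
annotated_sps = 310_909       # SPs that carry an annotation


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


sp_share = pct(small_proteins, total_proteins)          # SPs among all proteins
annotated_of_sps = pct(annotated_sps, small_proteins)   # annotated among SPs
annotated_of_all = pct(annotated_sps, total_proteins)   # annotated SPs among all proteins

print(sp_share, annotated_of_sps, annotated_of_all)
```

Under these counts the three values agree with the 12.54%, 17.31%, and 2.17% quoted in the caption.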
Figure 2
An overview of integrated strategies for small protein prediction. It is challenging to differentiate meaningful gene-coding sORFs from non-functional sORFs because the shorter the protein sequence, the higher the detection error rate. First, we suggest separating the annotation of SPs from that of other proteins. Second, it is better to combine in silico algorithms with evidence-based analysis. The two sets of results are then merged to give two sets of SPs: the strictly validated SPs are those validated by both methods, while the other validated SPs are those validated by only one of the two, either in silico algorithms or evidence-based analysis.
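The merge step described in this caption amounts to a set intersection and a symmetric difference. The sketch below illustrates that reading and is not the authors' implementation. The function name `merge_sp_predictions` and the candidate identifiers are hypothetical.

```python
from typing import Iterable, Set, Tuple


def merge_sp_predictions(
    in_silico: Iterable[str], evidence_based: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Split candidate SPs into strictly validated and other validated sets.

    strict: candidates supported by BOTH approaches (set intersection).
    other:  candidates supported by exactly ONE approach (symmetric difference).
    """
    in_silico_set = set(in_silico)
    evidence_set = set(evidence_based)
    strict = in_silico_set & evidence_set
    other = in_silico_set ^ evidence_set
    return strict, other


# Hypothetical candidate identifiers, for illustration only.
predicted = ["sORF_001", "sORF_002", "sORF_003"]
observed = ["sORF_002", "sORF_003", "sORF_004"]

strict, other = merge_sp_predictions(predicted, observed)
print(sorted(strict))  # candidates supported by both approaches
print(sorted(other))   # candidates supported by only one approach
```

Keeping the two output sets separate preserves the distinction drawn in the caption, so downstream analysis can treat doubly supported candidates with more confidence than singly supported ones.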
