Review
Curr Opin Genet Dev. 2015 Dec;35:50-6.
doi: 10.1016/j.gde.2015.08.010. Epub 2015 Nov 3.

The language of the protein universe

Andrea Scaiewicz et al. Curr Opin Genet Dev. 2015 Dec.

Abstract

Proteins, the main cellular machinery, play a major role in nearly every cellular process and have always been a central focus in biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing, universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe, allowing us to understand genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher ancient Egyptian hieroglyphics. In this review, we consider aspects of the repertoire of protein domain architectures that closely parallel human languages, and aim to provide some insights into the language of proteins.

Figures

Figure 1. Analogy between human and protein languages
In this comparison, the vocabulary of proteins (domains) is built from an alphabet of amino acids. Syntax principles enable domains to associate into multi-domain architectures, a process governed by hierarchical rules (grammar) that determine the structure and hence the biological function (semantics) of proteins. Many languages, English for example, have several classes of words (nouns, adjectives, verbs, adverbs, pronouns, conjunctions). Each class has its own task in the language: nouns name things, adjectives describe nouns, verbs express actions, and conjunctions connect words. Analogously, one can distinguish different classes of domains with different tasks (motors, binding proteins, enzymes, signaling proteins, structural proteins, targeting proteins).
Figure 2. Sequence profile databases can be sequence-based (blue circles) or structure-based (orange circles)
Sequence-based profiles are derived mainly by two methods: HMMs (hidden Markov models) or PSSMs (position-specific scoring matrices). Structure-based profiles in Gene3D and SUPERFAMILY are generated from HMMs built from actual structures taken from CATH and SCOP, respectively. Two main integrative resources, CDART and InterPro, are shown (green circles) with the databases they include.
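To make the idea of a sequence profile concrete, the following is a minimal sketch of a position-specific scoring matrix built from a toy alignment. It is not the procedure used by any of the databases named above: the alignment, the uniform background frequencies, the pseudocount, and the function names `build_pssm` and `score` are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions (illustrative only): ungapped sequences of equal length,
    a uniform background frequency, and a flat pseudocount per residue.
    """
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm


def score(pssm, segment):
    """Sum the per-position log-odds scores of a segment against the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Hypothetical four-residue alignment of a made-up domain family.
toy_alignment = ["ACDE", "ACDE", "ACEE", "GCDE"]
pssm = build_pssm(toy_alignment)
consensus_score = score(pssm, "ACDE")
unrelated_score = score(pssm, "WWWW")
```

A segment resembling the aligned family scores higher than an unrelated one, which is the basic signal a profile search relies on. Profile HMMs extend this by also modeling insertions and deletions between positions.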
Figure 3. Mechanisms of Domain Architecture Change
New domain architectures can be created by (A) insertion or loss of existing domains (duplication), (B) insertion or loss of novel domains, or (C) substitution of one domain for another (exchange), which is almost always a two-step process comprising loss/fission and fusion. Domain insertion and loss are consequences of gene fusion and fission, respectively. Domain insertions or losses can be (D) internal (between two domains, also called domain shuffling) or can occur at the N or C terminus, as in (A) and (B).
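The mechanisms in this caption can be sketched as operations on an ordered tuple of domain names. This is only an illustrative model under stated assumptions: an architecture is reduced to its N-to-C domain order, the example architecture and the helper names (`insert_domain`, `lose_domain`, `exchange_domain`, `insertion_site`) are hypothetical, and exchange is modeled as a loss followed by an insertion, mirroring the two-step description.

```python
def insert_domain(arch, domain, position):
    """Insert a domain at the given index (0 = N terminus)."""
    return arch[:position] + (domain,) + arch[position:]


def lose_domain(arch, position):
    """Remove the domain at the given index."""
    return arch[:position] + arch[position + 1:]


def exchange_domain(arch, position, new_domain):
    """Substitute one domain for another as a loss followed by an insertion."""
    return insert_domain(lose_domain(arch, position), new_domain, position)


def insertion_site(arch, position):
    """Classify where an insertion at the given index would land."""
    if position == 0:
        return "N-terminal"
    if position == len(arch):
        return "C-terminal"
    return "internal"


# Illustrative three-domain architecture, written N terminus to C terminus.
arch = ("SH3", "SH2", "Kinase")

duplicated = insert_domain(arch, "SH2", 1)   # (A) insertion of an existing domain
extended = insert_domain(arch, "PH", 3)      # (B) insertion of a novel domain
exchanged = exchange_domain(arch, 0, "PH")   # (C) exchange
shortened = lose_domain(arch, 0)             # loss at the N terminus
```

Tuples keep each architecture immutable, so every operation returns a new architecture and the original stays available for comparison.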
