Protein Sci. 2016 Dec;25(12):2164-2174.
doi: 10.1002/pro.3041. Epub 2016 Oct 25.

Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe

Marco Necci et al. Protein Sci. 2016 Dec.

Abstract

Intrinsic disorder (ID) in proteins has been extensively described over the last decade, yet a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe, based on the UniProt database and derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues; such regions are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID region, with short ID regions evenly distributed along the sequence and long ID regions overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined to the nucleoid, and in Viruses they provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and by Viruses to mediate transport and movement between and within organisms. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures.

Keywords: MobiDB; UniProt; classification; intrinsic disorder; protein sequence; protein structure.

Figures

Figure 1
Comparison between short (A, left) and long (B, right) IDRs. Each panel shows from top to bottom: Percentage (%) of disorder, length of disorder regions, and number of IDRs per protein. The % of ID is calculated for each protein as the fraction of disordered residues over the entire sequence. Numbers on the x‐axis represent the upper limit (included) of the bin, for example, 5 represents sequences with 0 to 5% of disorder. Notice that short IDRs are defined with at least five ID residues and become long IDRs above 20 residues.
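The quantities in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes a per-residue disorder mask given as a string of `D` (disordered) and `S` (structured) characters, which is a hypothetical input format, and the function names are invented for the example. The thresholds (at least five residues for a short IDR, more than 20 for a long one) and the inclusive upper bin limit follow the caption.

```python
import math
from typing import List, Tuple

SHORT_MIN = 5   # caption: short IDRs have at least five disordered residues
LONG_OVER = 20  # caption: IDRs count as long above 20 residues


def find_idrs(mask: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of 'D' runs with >= SHORT_MIN residues.

    The 'D'/'S' mask format is an assumption made for this sketch.
    """
    spans, start = [], None
    for i, state in enumerate(mask + "S"):  # sentinel closes a trailing run
        if state == "D" and start is None:
            start = i
        elif state != "D" and start is not None:
            if i - start >= SHORT_MIN:
                spans.append((start, i))
            start = None
    return spans


def idr_kind(span: Tuple[int, int]) -> str:
    """Label a span as 'long' (more than 20 residues) or 'short'."""
    return "long" if span[1] - span[0] > LONG_OVER else "short"


def percent_disorder(mask: str) -> float:
    """Disordered residues as a percentage of the entire sequence."""
    if not mask:
        raise ValueError("empty sequence")
    return 100.0 * mask.count("D") / len(mask)


def bin_upper_limit(pct: float, width: int = 5) -> int:
    """Inclusive upper limit of the histogram bin; 0 to 5% maps to 5."""
    return max(width, math.ceil(pct / width) * width)


# A 100-residue toy protein with one short and one long IDR.
mask = "D" * 6 + "S" * 10 + "D" * 25 + "S" * 59
idrs = find_idrs(mask)
kinds = [idr_kind(s) for s in idrs]
pct = percent_disorder(mask)
```

The bin width of 5 is taken from the caption's example; whether every panel uses the same width is not stated here.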
Figure 2
Disorder region position in the sequence, calculated on the number of short (A) and long (B) IDRs. A region is considered N‐ or C‐terminal if it covers the first or last residues in the sequence. All other cases are assigned to the middle. The percentage is calculated over the total number of short (or long) IDRs considering only the longest region of each protein.
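The positional rule described in this caption might look as follows in Python. This is a sketch under stated assumptions: regions are half-open `(start, end)` index pairs, a region touching both ends is reported as N-terminal, and ties for the longest region go to the first one. None of these choices is specified in the caption, and the function names are hypothetical.

```python
from typing import List, Tuple


def idr_position(start: int, end: int, seq_len: int) -> str:
    """Classify a half-open region [start, end) within a sequence of seq_len.

    Assumption: a region covering both termini is reported as N-terminal.
    """
    if start == 0:
        return "N-terminal"
    if end == seq_len:
        return "C-terminal"
    return "middle"


def longest_region(spans: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Pick the longest region of a protein (first one wins on ties)."""
    return max(spans, key=lambda s: s[1] - s[0])


spans = [(0, 6), (16, 41)]
chosen = longest_region(spans)
position = idr_position(chosen[0], chosen[1], 100)
```

Counting only `chosen` per protein mirrors the caption's statement that the percentage considers the longest region of each protein.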
Figure 3
Disorder distribution for the domains of life, normalized by their relative abundance in UniProt. The distribution is shown for all sequences (left, inner crown), short IDRs (left, outer crown), long IDRs (right, inner crown), and fully ID proteins (right, outer crown). The IDR percentage (short, long, and full) is calculated by dividing the fraction of IDRs for a given domain of life in UniProt by the total number of IDRs in UniProt. Notice the increase from bacteria toward eukaryotes for increasing disorder content as well as the high abundance of viral proteins among fully ID proteins.
Figure 4
Classification of disorder flavors based on sequence features for long IDRs. The four main classes (Globules & Tadpoles, Coils & Hairpins, Undefined, and Swollen Coils) are shown with the number and fraction of matching long IDRs below each title. Each histogram further divides the class based on six additional sequence features (highly polar, highly positive, highly negative, low complexity, proline‐rich, and glycine‐rich).
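As a rough illustration of the sequence features behind such a classification, the sketch below computes the fraction of charged residues (FCR) and the net charge per residue (NCPR), the two quantities commonly used in Das and Pappu style charge analysis, together with the proline and glycine fractions. It assumes that K and R count as positive and D and E as negative. The cut-offs that separate the four classes and the six additional features are not given in this text, so none are applied here. The function name is hypothetical.

```python
from typing import Dict


def charge_features(seq: str) -> Dict[str, float]:
    """Charge and composition fractions for one disordered region.

    Assumption: K/R are positive, D/E are negative; class thresholds
    are deliberately left out because this text does not state them.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    pos = sum(seq.count(aa) for aa in "KR")
    neg = sum(seq.count(aa) for aa in "DE")
    return {
        "FCR": (pos + neg) / n,    # fraction of charged residues
        "NCPR": (pos - neg) / n,   # net charge per residue (signed)
        "proline": seq.count("P") / n,
        "glycine": seq.count("G") / n,
    }


features = charge_features("KKDEGGPPAA")
```

A classifier would compare these values against thresholds taken from the paper's Methods section.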
Figure 5
Five most enriched GO‐terms for each class of long IDRs compared to all proteins with IDRs, for the Molecular Function (A), Cellular Component (B), and Biological Process (C) ontologies, respectively. Each bar is colored on the basis of the proportion of proteins belonging to the different domains of life. The x‐axis shows the logarithmic increase compared to the reference set (see Methods for details).
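One plausible reading of "logarithmic increase compared to the reference set" is a log ratio between the frequency of a GO term in a class and its frequency in the reference set. The sketch below assumes that reading and uses base 2. Both the formula and the base are assumptions, since the exact definition is in the paper's Methods and not in this text. The function and parameter names are hypothetical.

```python
import math


def log_enrichment(class_hits: int, class_size: int,
                   ref_hits: int, ref_size: int) -> float:
    """log2 of (term frequency in class) / (term frequency in reference).

    Assumption: base-2 fold enrichment; the source does not state the base.
    """
    if min(class_hits, class_size, ref_hits, ref_size) <= 0:
        raise ValueError("all counts must be positive")
    # Multiply before dividing to limit floating-point rounding.
    ratio = (class_hits * ref_size) / (class_size * ref_hits)
    return math.log2(ratio)


# 20 of 100 class proteins carry the term versus 50 of 1000 in the reference.
score = log_enrichment(20, 100, 50, 1000)
```

Positive values indicate terms that are more frequent in the class than in the reference set, and negative values indicate depletion.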
Figure 6
Five most enriched GO‐terms for fully disordered proteins compared to all proteins with IDRs, for the Molecular Function, Cellular Component, and Biological Process ontologies respectively. Due to the small sample size, no distinction is made between classes or domains of life. The x‐axis shows the logarithmic increase compared to the reference set (see Methods for details).
