PLoS Comput Biol. 2014 May 8;10(5):e1003607.
doi: 10.1371/journal.pcbi.1003607. eCollection 2014 May.

Synonymous constraint elements show a tendency to encode intrinsically disordered protein segments

Mauricio Macossay-Castillo et al. PLoS Comput Biol.

Abstract

Synonymous constraint elements (SCEs) are protein-coding genomic regions with very low synonymous mutation rates, and they are believed to carry additional, overlapping functions. Thousands of such potentially multi-functional elements were recently discovered by analyzing the levels and patterns of evolutionary conservation in human coding exons. These elements provide a good opportunity to improve our understanding of how the redundancy of the genetic code is exploited in the cell. Our premise is that the protein segments encoded by such elements might better comply with the increased functional demands if they are structurally less constrained (i.e. intrinsically disordered). To test this idea, we used computational tools to characterize the structural properties of the protein segments encoded by SCEs. In addition to SCEs, we examined the level of disorder, secondary structure, and sequence complexity of protein regions overlapping with experimentally validated splice regulatory sites. We show that multi-functional gene regions translate into protein segments that are significantly enriched in structural disorder and compositional bias, while they are depleted in secondary structure and domain annotations compared to reference segments of similar lengths. This tendency suggests that relaxed protein structural constraints provide an advantage when accommodating multiple overlapping functions in coding regions.
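
The comparison against "reference segments of similar lengths" can be pictured with a small length-matched sampling sketch. It assumes, following the Figure 2 caption, that each reference segment is drawn from the same SCE-containing protein and has exactly the length of its SCE counterpart; the function name, the (protein_id, start, end) tuple layout and the seed are hypothetical, and this is not the authors' actual sampling code (for instance, it does not exclude overlap with the SCE itself).

```python
import random


def sample_reference_segments(protein_lengths, sce_segments, seed=0):
    """Draw one random, length-matched reference segment per SCE-encoded segment.

    Hypothetical helper, for illustration only.
    protein_lengths: dict mapping a protein ID to its length in residues.
    sce_segments: list of (protein_id, start, end) tuples, 1-based and inclusive.
    Each reference segment comes from the same protein and has the same length
    as its SCE counterpart, so both sets share one length distribution.
    """
    rng = random.Random(seed)
    references = []
    for protein_id, start, end in sce_segments:
        seg_len = end - start + 1
        prot_len = protein_lengths[protein_id]
        if seg_len > prot_len:
            raise ValueError(f"segment longer than protein {protein_id}")
        # randint is inclusive, so the last allowed start keeps the segment
        # ending exactly at the C-terminus.
        ref_start = rng.randint(1, prot_len - seg_len + 1)
        references.append((protein_id, ref_start, ref_start + seg_len - 1))
    return references


lengths = {"P1": 300, "P2": 120}
sces = [("P1", 10, 54), ("P2", 1, 120)]
refs = sample_reference_segments(lengths, sces, seed=1)
```

Fixing the seed makes a given reference set reproducible; in practice one would presumably repeat the draw several times to check that conclusions do not hinge on a single random sample.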

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Possible overlapping functions fulfilled by synonymously constrained coding regions.
The primary function of a given coding DNA segment is indicated at the top, while the different types of possible secondary functions that could be maintained by the same DNA segment are summarized at the bottom. These functions can be grouped into two major classes, depending on whether an additional molecule is needed to carry out the extra task. These two classes are then further divided according to the type of molecule involved.
Figure 2. Comparison of SCE-encoded protein segments with reference segments from four structural aspects.
Human SCE-encoded protein segments are compared, with respect to four structural aspects, to randomly selected segments of the SCE-containing proteins that have the same length distribution. A segment is considered to have a given structural property if at least 50% of its residues are positively assigned by the corresponding prediction method. Percentage of segments assigned with A) structural disorder (IUPred), B) low sequence complexity (SEG), C) domain annotation (PfamScan) and D) secondary structure (PSIPRED) for the SCE-encoded and reference segment datasets at all three detection resolutions. The numbers of segments with each property were compared between the SCE and reference datasets using Yates' chi-square test; the corresponding p-values are indicated above the bars.
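
The two decision steps in this caption (the 50% residue rule and the Yates-corrected 2x2 test) can be sketched as follows. The helper names and the example counts are hypothetical and not taken from the paper; per-residue predictor output is assumed to be already reduced to True/False flags. The p-value uses the fact that a chi-square variable with one degree of freedom satisfies P(X > x) = erfc(sqrt(x/2)).

```python
import math


def has_property(residue_flags, threshold=0.5):
    """True if at least `threshold` of a segment's residues are flagged by a predictor."""
    return sum(residue_flags) >= threshold * len(residue_flags)


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square test (1 degree of freedom) for the 2x2 table

        [[a, b],   # SCE segments: with / without the property
         [c, d]]   # reference segments: with / without the property

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    # The continuity correction never pushes |ad - bc| below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    statistic = n * corrected ** 2 / denominator
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 30 of 100 SCE segments vs 15 of 100 reference segments disordered.
stat, p = yates_chi_square(30, 70, 15, 85)
```

The same statistic should be obtainable from `scipy.stats.chi2_contingency(table, correction=True)`; the closed form is spelled out here only to make the correction explicit.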
Figure 3. DNA-level secondary functions in coding regions: The case of the HOXA2 gene.
A) The homeobox protein Hox-A2 is represented by a light grey bar, with its sole domain (homeobox) and its Antp-type motif colored purple (residue boundaries assigned based on UniProtKB annotation) and its SCE-overlapping N-terminal region marked in dark grey. The CDS corresponding to this segment is shown above the domain map in a light blue box, with the multi-functional region (a HOX-PBX responsive element) highlighted in yellow. The corresponding peptide sequence is presented in a purple box, with the precise locations of detected SCEs, predicted disordered regions, low sequence complexity segments and secondary structure elements (H – helix, E – extended) shown as dark blue bars below the sequence. B) The enhancer-rich region corresponding to residues 261–313 of the same Hox protein is presented in the same way as in panel A.
Figure 4. The alternative translation start site within BRCA1 translates into a mostly disordered protein segment.
The CDS fragment corresponding to residues 275–310 of the canonical BRCA1 isoform is shown in a light blue box at the top, with a validated alternative translation start site (ATSS) highlighted in yellow. The domain map of the canonical isoform is shown below the CDS, with the domains coloured purple (residue boundaries assigned based on UniProtKB annotation) and the region surrounding the ATSS marked in darker grey. The protein segment in question is enlarged from the domain map, and the identified SCEs and predicted structural properties are indicated below it by dark blue bars, as explained for Figure 3.
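
Figures 3 to 5 pair residue ranges with the CDS fragments that encode them. A minimal sketch of that codon arithmetic, assuming 1-based coordinates counted from the first nucleotide of the start codon of the canonical CDS (spliced mRNA coordinates, not genomic ones); the helper names are hypothetical. Residue r is encoded by nucleotides 3r − 2 to 3r, so residues 275–310 correspond to CDS positions 823–930, i.e. 36 codons.

```python
def residues_to_cds(first_residue, last_residue):
    """Map a 1-based, inclusive residue range to 1-based, inclusive CDS positions.

    Hypothetical helper: residue r is encoded by codon r, which occupies
    nucleotides 3r - 2 .. 3r counted from the first base of the start codon.
    """
    if first_residue < 1 or last_residue < first_residue:
        raise ValueError("invalid residue range")
    return 3 * first_residue - 2, 3 * last_residue


def cds_to_residue(nucleotide):
    """Number of the residue (codon) that contains a 1-based CDS position."""
    if nucleotide < 1:
        raise ValueError("CDS positions are 1-based")
    return (nucleotide - 1) // 3 + 1


# Residues 275-310 of canonical BRCA1, as in the Figure 4 caption.
start_nt, end_nt = residues_to_cds(275, 310)  # (823, 930)
```

The inverse function is what one would use to place a nucleotide-level feature, such as an alternative start codon or a splicing factor binding site, onto the protein sequence.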
Figure 5. Validated splicing factor binding sites embedded in a dual-coding region.
The domain map of the canonical apoptosis-mediating surface antigen FAS is shown at the top, with domains marked in light purple and the single transmembrane region (TRM) marked in darker purple (residue boundaries assigned based on UniProtKB annotation). The boundaries of the region that overlaps the two splicing factor binding sites are indicated, and the CDS corresponding to this region is presented below the domain map. The two splicing factor binding sites are highlighted in yellow in the CDS, with the names of the corresponding splicing factors indicated. The overlapping residues are highlighted in the same way in the sequences of the two distinct protein isoforms. The predicted protein structural properties are indicated below, as in Figure 3.
Figure 6. The SCEs of human CBP and p300 are differently distributed along their chains and avoid structured domains.
Human p300 (A) and CBP (B) are represented by grey bars at the top of the panels, with their domains (boundaries adopted from a review [19]) coloured purple and the regions corresponding to 15-codon resolution SCEs coloured darker grey. Below the domain maps, the predicted IUPred disorder profiles are shown in dark blue; values above 0.5 are interpreted as disorder. The SCE-encoded regions are lettered from the N- towards the C-terminus in each protein and are marked on the prediction curves. Their structural properties are indicated as in Figure 3.
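
The way Figure 6 reads a disorder profile (scores above 0.5 count as disordered) and letters the SCE regions from the N- towards the C-terminus can be sketched as below. This assumes per-residue scores are already available as a plain list of floats (as produced, for example, by IUPred); parsing actual IUPred output is not shown, and the helper names and the `min_length` parameter are hypothetical additions.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based, inclusive (start, end) runs of residues scoring above threshold.

    Hypothetical helper: a score strictly greater than `threshold` counts as disorder;
    runs shorter than `min_length` residues are dropped.
    """
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = i  # a new disordered run begins
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


def letter_segments(segments):
    """Label (start, end) segments A, B, C, ... from the N- towards the C-terminus."""
    return {chr(ord("A") + k): seg for k, seg in enumerate(sorted(segments))}


profile = [0.2, 0.6, 0.7, 0.4, 0.9, 0.51, 0.5]
runs = disordered_regions(profile)  # [(2, 3), (5, 6)]
```

Note that a residue scoring exactly 0.5 is treated as ordered here, following the caption's "above 0.5" wording.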

References

    1. Ureta-Vidal A, Ettwiller L, Birney E (2003) Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat Rev Genet 4: 251–262.
    2. Sidow A (2002) Sequence first. Ask questions later. Cell 111: 13–16.
    3. Sumiyama K, Kim CB, Ruddle FH (2001) An efficient cis-element discovery method using multiple sequence comparisons based on evolutionary relationships. Genomics 71: 260–262.
    4. Berezikov E, Guryev V, van de Belt J, Wienholds E, Plasterk RH, et al. (2005) Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120: 21–24.
    5. Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP (2003) Vertebrate microRNA genes. Science 299: 1540.