PLoS One. 2023 Sep 20;18(9):e0290890.
doi: 10.1371/journal.pone.0290890. eCollection 2023.

Expanding the repertoire of human tandem repeat RNA-binding proteins

Agustín Ormazábal et al. PLoS One.

Abstract

Protein regions consisting of arrays of tandem repeats are known to bind other molecular partners, including nucleic acid molecules. Although the interactions between repeat proteins and DNA are already widely explored, studies characterising tandem repeat RNA-binding proteins are lacking. We performed a large-scale analysis of human proteins devoted to expanding the knowledge about tandem repeat proteins experimentally reported as RNA-binding molecules. This work is timely because of the release of a full set of accurate structural models for the human proteome amenable to repeat detection using structural methods. The main goal of our analysis was to build a comprehensive set of human RNA-binding proteins that contain repeats at the sequence or structure level. Our results showed that the combination of sequence and structural methods finds significantly more tandem repeat proteins than either method alone. We identified 219 tandem repeat proteins that bind RNA molecules and characterised the overlap between repeat regions and RNA-binding regions as a first step towards assessing their functional relationship. We observed differences in the characteristics of repeat regions predicted by sequence-based or structure-based methods in terms of their sequence composition, their functions and their protein domains.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The Pumilio protein (brown) forming a complex with a nanos response element RNA (red) (PDB access code: 1M8W).
Fig 2. Diagram showing the pipeline for finding RNA-binding repeat proteins and the analysis performed on them.
Fig 3. A) Venn diagram showing the different regions identified for proteins in this study. trRBDpeps are the intersection of residues between the tandem repeats and the RBDpep regions. B) Venn diagram showing the distribution of RNA-binding tandem repeat proteins identified by sequence and structural definitions of repeats.
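The definition of trRBDpeps as a residue-level intersection can be illustrated with a minimal Python sketch. The coordinates, the inclusive `(start, end)` convention and the helper name `residues` are assumptions made for illustration; they are not taken from the paper's pipeline.

```python
def residues(regions):
    """Expand inclusive (start, end) regions into a set of residue positions."""
    covered = set()
    for start, end in regions:
        covered.update(range(start, end + 1))
    return covered

# Hypothetical coordinates, chosen only to illustrate the set operation.
tandem_repeats = [(10, 40), (60, 90)]
rbdpeps = [(35, 65)]

# trRBDpeps: residues that lie in a tandem repeat AND in an RBDpep region.
tr_rbdpeps = residues(tandem_repeats) & residues(rbdpeps)

# Fraction of RBDpep residues that fall inside a tandem repeat.
overlap_fraction = len(tr_rbdpeps) / len(residues(rbdpeps))
```

Working on residue sets rather than interval arithmetic keeps the sketch short and handles overlapping or discontinuous regions without special cases.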
Fig 4. An example of an RNA-binding tandem repeat protein (UniProtKB: O00567) predicted as repetitive by both types of methods.
The figure shows the larger coverage of repeats defined by structural methods, and the small overlap with the regions predicted as repetitive by sequence-based methods. The region corresponding to an RBDpep is highlighted in red. The “Merge” row corresponds to the union of the tandem repeats predicted by different methods. In the row corresponding to the pLDDT score, the segments are coloured according to the following scheme: orange (pLDDT < 50); yellow (50 < pLDDT < 70); light-blue (70 < pLDDT < 90); and blue (pLDDT > 90).
Fig 5. Distribution of pLDDT scores across the 219 RNA-binding tandem repeat proteins, split by repeat prediction method and region (whole protein or trRBDpeps).
As in the AlphaFoldDB website, the orange bar corresponds to the number of residues in the dataset with a pLDDT score <50; the yellow bar corresponds to a pLDDT score between 50 and 70; the light-blue bar corresponds to a pLDDT score between 70 and 90; and the blue bar corresponds to values >90. A) pLDDT score profile for the whole set of RNA-binding tandem repeat proteins predicted as repeated by sequence-based methods. B) pLDDT score profile for trRBDpeps of proteins predicted as repeated by sequence-based methods. C) pLDDT score profile for the whole set of RNA-binding tandem repeat proteins predicted as repeated by both types of methods. D) pLDDT score profile for trRBDpeps of proteins predicted as repeated by both sequence and structure-based methods. E) pLDDT score profile for the whole set of RNA-binding tandem repeat proteins predicted as repeated by structure-based methods. F) pLDDT score profile for trRBDpeps of proteins predicted as repeated by structure-based methods. G) Distribution of pLDDT scores across the whole human dataset (20,264 proteins).
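The four-band pLDDT profile described in this caption can be sketched in Python as follows. The caption does not say which band the boundary values 50, 70 and 90 belong to, so the choices below (50 goes to yellow, 70 and 90 go to light-blue) are assumptions, as are the function names.

```python
from collections import Counter

BANDS = ("orange", "yellow", "light-blue", "blue")

def plddt_band(score):
    """Map a per-residue pLDDT score to a colour band.

    Boundary handling is an assumption; the caption leaves it open.
    """
    if score < 50:
        return "orange"
    if score < 70:
        return "yellow"
    if score <= 90:
        return "light-blue"
    return "blue"

def band_fractions(scores):
    """Return the fraction of residues in each band for a list of scores."""
    if not scores:
        raise ValueError("at least one pLDDT score is required")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}
```

Calling `band_fractions` once on the scores of whole proteins and once on the scores of trRBDpep residues would give the two kinds of profile compared across panels A to F.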
Fig 6. A) Amino acid composition in regions with low and high pLDDT scores among the 219 RNA-binding tandem repeat proteins. B) Amino acid composition in regions with low and high pLDDT scores among the whole human proteome dataset, as well as for the complete sequences. C) Amino acid enrichment in regions with low and high pLDDT scores among RNA-binding tandem repeat proteins with respect to the whole human dataset. All values above 1 indicate an amino acid enrichment in RNA-binding tandem repeat proteins compared to the human proteome dataset. The opposite holds true for values below 1.
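One plausible reading of the enrichment in panel C is a ratio of amino acid frequencies, subset over background, which is consistent with the statement that values above 1 indicate enrichment. The sketch below assumes that reading; the exact formula used by the authors is not given here, and the sequences are toy strings rather than real data.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequences):
    """Return the relative frequency of each standard amino acid."""
    counts = Counter()
    for seq in sequences:
        counts.update(aa for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def enrichment(subset_seqs, background_seqs):
    """Frequency ratio subset/background; > 1 means enriched in the subset."""
    subset = composition(subset_seqs)
    background = composition(background_seqs)
    # Amino acids absent from the background are skipped to avoid dividing by zero.
    return {aa: subset[aa] / background[aa]
            for aa in AMINO_ACIDS if background[aa] > 0}

# Toy sequences (hypothetical): glycine is three times as frequent in the subset.
ratios = enrichment(["GGRG"], ["GRAA", "GRAA"])
```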
Fig 7. An example of a protein (TAF15, UniProtKB: Q92804) showing how the glycine residues of the RGG motif tend to be positioned in low pLDDT regions (RGG row, highlighted in green).
Fig 8. Amino acid composition in RBDpeps and non-RBDpeps among RNA-binding tandem repeat proteins.
A) Amino acid composition among the RNA-binding tandem repeat proteins predicted as repetitive by sequence-based methods. B) Amino acid composition among the RNA-binding tandem repeat proteins predicted as repetitive by both types of methods. C) Amino acid composition among the RNA-binding tandem repeat proteins predicted as repetitive by structure-based methods.
Fig 9. Amino acid composition in tandem repeats and non-repeated regions.
A) Amino acid composition among the RNA-binding tandem repeat proteins predicted as repetitive by sequence-based methods. B) Amino acid composition among the RNA-binding tandem repeat proteins predicted as repetitive by both types of methods. C) Amino acid composition among the RNA-binding tandem repeat proteins predicted as repetitive by structure-based methods.
Fig 10. Circle representation of the GO terms identified in every subset studied in this work.
Every column corresponds to one of the GO terms present in at least 8 entries in the whole RNA-binding repeat protein dataset. Each row corresponds to one of the subsets studied in this work. The radius of a circle is proportional to the number of matches for a specific GO term in each subset. The radii are then normalised by the total number of proteins in each subset so that the subsets are comparable.
Fig 11. An example of a protein (SRSF2, UniProtKB: Q01130) previously described as RNA-binding but not yet annotated as repetitive.
A) Structural visualisation coloured according to the pLDDT score scheme: orange (pLDDT < 50); yellow (50 < pLDDT < 70); light-blue (70 < pLDDT < 90); and blue (pLDDT > 90). B) Dot plot representation of the protein sequence aligned against itself, showing that a consistent pattern of repetition overlaps with its disordered region (residues 117–191).
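A self-comparison dot plot like the one in panel B can be approximated with exact window matching, as in the Python sketch below. This is a simplified stand-in and not the tool used for the figure: the window size, the function names and the toy sequence are all assumptions. Tandem repeats show up as diagonals parallel to the main diagonal, and the offset of the strongest diagonal suggests the repeat period.

```python
from collections import Counter

def self_dot_plot(sequence, window=2):
    """Return (i, j) pairs where the windows starting at i and j are identical."""
    n = len(sequence) - window + 1
    dots = set()
    for i in range(n):
        for j in range(n):
            if sequence[i:i + window] == sequence[j:j + window]:
                dots.add((i, j))
    return dots

def diagonal_offsets(dots):
    """Count dots on each off-diagonal (j - i > 0); peaks suggest repeat periods."""
    return Counter(j - i for i, j in dots if j > i)

# Toy RS-rich string (hypothetical, not the SRSF2 sequence).
offsets = diagonal_offsets(self_dot_plot("RSRSRSRS", window=2))
```

In this toy case the most populated off-diagonal is at offset 2, matching the two-residue repeat unit. Real dot plots usually score windows with a substitution matrix and a threshold instead of requiring exact identity.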
Fig 12. An example of a protein, FAM98A (UniProtKB: Q8NCA5), previously described as RNA-binding but not yet annotated as repetitive.
The green representation highlights the regions predicted as tandem repeats by the methods mentioned above each structure. CE-Symm detects a symmetrical arrangement of alpha helices within residues 1 to 137; RepeatsDB-Lite detects, according to its own classification, a fibrous repeat helix within residues 198 to 233; TRDistiller predicts a repeat in three discontinuous disordered stretches from residues 334 to 474, close to the C-terminal region.
Fig 13. An example of a protein (GTPBP4, UniProtKB: Q9BZE4) previously described as RNA-binding but not yet annotated as repetitive.
The structural visualisation is coloured according to the pLDDT score scheme: orange (pLDDT < 50); yellow (50 < pLDDT < 70); light-blue (70 < pLDDT < 90); and blue (pLDDT > 90). Both CE-Symm and RepeatsDB-Lite detect a tandem repeat close to the N-terminal region (left side of the representation), forming a TIM-barrel motif according to the RepeatsDB classification.
Fig 14. An example of a protein (LRPPRC, UniProtKB: P42704) forming a repetitive structural pattern that is only detected by sequence-based methods.
A) Structural visualisation coloured according to the pLDDT score scheme: orange (pLDDT < 50); yellow (50 < pLDDT < 70); light-blue (70 < pLDDT < 90); and blue (pLDDT > 90). B) Van der Waals surface representation, where the negatively charged residues are highlighted in red, while those with a positive charge are highlighted in blue.
