Bioinformatics. 2016 Sep 15;32(18):2776-82.
doi: 10.1093/bioinformatics/btw319. Epub 2016 Jun 9.

Revealing aperiodic aspects of solenoid proteins from sequence information


Thomas Hrabe et al. Bioinformatics. 2016.

Abstract

Motivation: Repeat proteins, which contain multiple repeats of short sequence motifs, form a large but seldom-studied group of proteins. Methods that analyze the 3D structures of such proteins have identified many subtle effects in the length distribution of individual motifs that are important for their functions. However, a similar analysis has not yet been applied to the vast majority of repeat proteins with unknown 3D structures, mostly because the underlying motifs are extremely diverse and therefore difficult to detect.

Results: We developed FAIT, a sequence-based algorithm for the precise assignment of individual repeats in repeat proteins, and introduced a framework to classify and compare aperiodicity patterns for large protein families. FAIT extracts repeat positions by post-processing FFAS alignment matrices with image processing methods. Using proteins with Leucine Rich Repeat (LRR) domains and other solenoid-like proteins as examples, we show that automated analysis with FAIT correctly identifies the exact lengths of individual repeats based entirely on sequence information.
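
As an illustration of the final step only, the sketch below shows how repeat start positions could be read off a one-dimensional signal by peak picking, and how unit lengths then follow from consecutive starts. It is not the FAIT implementation: the helper names `find_peaks_1d` and `unit_lengths`, the `threshold` and `min_distance` parameters, and the toy signal are all assumptions made for this example.

```python
from typing import List, Sequence


def find_peaks_1d(signal: Sequence[float], threshold: float, min_distance: int) -> List[int]:
    """Return indices of local maxima >= threshold that are at least min_distance apart.

    Hypothetical helper for illustration; FAIT's own peak detection may differ.
    """
    # Candidate peaks: higher than the left neighbor and not lower than the right one.
    candidates = [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] >= threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]
    # Keep the strongest peaks first and drop any candidate too close to a kept peak.
    kept: List[int] = []
    for i in sorted(candidates, key=lambda k: signal[k], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def unit_lengths(starts: Sequence[int]) -> List[int]:
    """Length of each repeat unit, taken as the gap between consecutive start positions."""
    return [b - a for a, b in zip(starts, starts[1:])]


# Made-up toy signal, not real FFAS-derived data.
toy_signal = [0.0, 0.1, 1.0, 0.2, 0.1, 0.0, 0.3, 0.9, 0.2, 0.0, 0.1, 0.2, 0.1, 1.1, 0.3, 0.0]
starts = find_peaks_1d(toy_signal, threshold=0.5, min_distance=3)
lengths = unit_lengths(starts)
print(starts, lengths)
```

The last unit has no following start position, so its length is left undefined here; a real pipeline would need a separate rule for the end of the repeat domain.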

Availability and implementation: https://github.com/GodzikLab/FAIT

Contact: adam@godziklab.org

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Several LRR structures sorted by their respective profile area A score. Ideally periodic internalin structures (1H6T A = 0, 2OMZ A = 0) have a low A score. Ribonuclease inhibitors (1A4Y A = 0.5) show a larger degree of aperiodicity, with their λi fluctuating between 28 aa and 29 aa. Structures with higher aperiodicity such as the TLR4 (2Z6A A = 1.5) or TLR5 (3J0A A = 2.1) have larger A scores.
Fig. 2.
Processing the profile–profile scoring matrix from its original state (1) to the final signal (5) where LRR units can be detected. The respective steps (1–5) are described in more detail in the main text. (5) Peaks in the final signal are indicators for the starting positions of the LRR units in the query sequence. The highest peak in the profile identifies positions of two engineered residues in the 2OMZ sequence
Fig. 3.
Examples of aperiodicity profiles detected for solenoid structures with FAIT (blue curve) or ConSole (green curve). The X-axis is the solenoid unit index (starting at 0) and the Y-axis is the difference λui − λμ measured in amino acids (aa); solenoid units are colored as detected by FAIT. LRR box: (a) The aperiodicity profiles of the ribonuclease inhibitor (1A4Y-A) match with a profile similarity of 0.06, and both show the characteristic sawtooth-like pattern. (b) Profiles of the mouse Toll-like receptor 4 (2Z64-A) show a high aperiodicity, and both indicate aperiodicity at the same LRR units. Profile similarity equals 0.17. (c) Mouse Nod-like receptor 4 (4KXF-B) structure with its N-terminal LRR domain. Profile similarity equals 0.39. (d) Structure of a bacterial LRR protein of unknown function from a human gut symbiont (4F0D-A). Unusually for LRRs, the structure is not curved and has rather linear LRR domains with varying LRR unit lengths. The similarity between the FAIT and ConSole profiles is 0.32. Ankyrin box: (e) 4UUC-A structure with a profile similarity of 0.33. Armadillo box: (f) 3TJ3-A structure with a profile similarity of 0.54. Highlighted repeat units in the sequences of all six structures are presented in the Supplementary Material.
Fig. 4.
Comparison of structure aperiodicities detected by ConSole (Str.) and FAIT (Seq.). The distribution of λμ and A score determines the position of each subfamily cluster in the plot. The cluster centers are determined by the mean values of λμ and A score and the cluster width by corresponding standard deviations in the cluster. Structures of the bacterial internalin family and the ribonuclease inhibitor family form clusters in the lower A-score regions, with the ribonuclease inhibitor cluster overlapping the NLR subfamily cluster (not shown). TLR structures generally have a higher aperiodicity (they form clusters in higher A-score regions). LRR structures from human gut show the largest aperiodicity of all subfamilies. The positions and widths of FAIT-based clusters are highly correlated with structure-based results
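
The quantities named in the captions above can be sketched in a few lines. This is a hedged illustration, not the authors' code. The aperiodicity profile follows the caption of Fig. 3 (unit length minus the mean unit length). The summary score is an assumption: the captions do not define the A score, so the mean absolute deviation is used as a stand-in. It happens to give 0 for a perfectly periodic protein and 0.5 for lengths alternating between 28 aa and 29 aa, which matches the values quoted for Fig. 1, but that agreement does not establish the published definition. The function names, the toy inputs, and the use of the population standard deviation for cluster width are likewise assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def aperiodicity_profile(lengths: Sequence[float]) -> List[float]:
    """Difference between each unit length and the mean unit length (in aa)."""
    mu = mean(lengths)
    return [length - mu for length in lengths]


def mean_abs_deviation(lengths: Sequence[float]) -> float:
    """ASSUMED stand-in for the A score: mean absolute value of the profile."""
    profile = aperiodicity_profile(lengths)
    return sum(abs(d) for d in profile) / len(profile)


def cluster_summary(members: Sequence[Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Center = mean of (mean length, score); width = standard deviation of each.

    Population standard deviation is an assumption; the source only says
    "standard deviations".
    """
    mus = [m for m, _ in members]
    scores = [s for _, s in members]
    return {
        "center": (mean(mus), mean(scores)),
        "width": (pstdev(mus), pstdev(scores)),
    }


# Toy unit lengths (made up, chosen to mimic the patterns described in Fig. 1).
periodic = [22, 22, 22, 22, 22]
alternating = [28, 29, 28, 29]

print(aperiodicity_profile(alternating))
print(mean_abs_deviation(periodic), mean_abs_deviation(alternating))

# Each member is (mean unit length, score) for one hypothetical family member.
toy_family = [(24.0, 1.0), (26.0, 2.0)]
summary = cluster_summary(toy_family)
print(summary)
```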
