BMC Genomics. 2008 Dec 12:9:599.
doi: 10.1186/1471-2164-9-599.

LRRCE: a leucine-rich repeat cysteine capping motif unique to the chordate lineage

Hosil Park et al. BMC Genomics.

Abstract

Background: The small leucine-rich repeat proteins and proteoglycans (SLRPs) form an important family of regulatory molecules that participate in many essential functions. They typically control the correct assembly of collagen fibrils, regulate mineral deposition in bone, and modulate the activity of potent cellular growth factors through many signalling cascades. SLRPs belong to the group of extracellular leucine-rich repeat proteins that are flanked at both ends by disulphide-bonded caps that protect the hydrophobic core of the terminal repeats. A capping motif specific to SLRPs has been recently described in the crystal structures of the core proteins of decorin and biglycan. This motif, designated as LRRCE, differs in both sequence and structure from other, more widespread leucine-rich capping motifs. To investigate if the LRRCE motif is a common structural feature found in other leucine-rich repeat proteins, we have defined characteristic sequence patterns and used them in genome-wide searches.

Results: The LRRCE motif is a structural element exclusive to the main group of SLRPs. It appears to have evolved during early chordate evolution and is not found in protein sequences from non-chordate genomes. Our search has expanded the family of SLRPs to include new predicted protein sequences, mainly in fishes but with intriguing putative orthologs in mammals. The chromosomal locations of the newly predicted SLRP genes would support the large-scale genome or gene duplications that are thought to have occurred during vertebrate evolution. From this expanded list we describe a new class of SLRP sequences that could be representative of an ancestral SLRP gene.

Conclusion: Given its exclusivity, the LRRCE motif is a useful annotation tool for identifying and classifying new SLRP sequences in genome databases. The expanded list of members of the SLRP family offers interesting insights into early vertebrate evolution and suggests an early chordate evolutionary origin for the LRRCE capping motif.

Figures

Figure 1
Ribbon diagrams of different cysteine-capping motifs in LRR structures, viewed from the convex side of the LRR domains: (a) the LRRNT capping motif in the crystal structure of bovine decorin [29], PDB code 1XKU; (b) the LRRCT capping motif in the crystal structure of the Nogo receptor ectodomain [46], PDB code 1OZN; (c) the LRRCE capping motif in the crystal structure of bovine decorin [29]. The different secondary structure elements are identified as follows: green arrows, β-strands; red ribbons, α-helices; orange ribbons, 3₁₀ helices and β-turns; pink tubes, short polyproline II segments; yellow sticks, disulphide bonds. The N- and C-terminal ends in each panel are indicated. (Reproduced from [5] with permission from Birkhäuser Verlag AG)
Figure 2
Mapping of the regular expression pattern of the LRRCE motif on a skeletal representation of the LRRCE structure from bovine decorin [29]. The motif includes the laterally extended ear repeat, shown as a Cα trace in blue, the following LRR (in green Cα trace), and the final β-strand closing the domain (red Cα trace). The regular expression pattern used in this study, written in PROSITE syntax [79], was: [LIV]-X(2)-[LVIYFMA]-X-[LIFM]-X(2)-[NH]-X-[ILVF]-X(2)-[VIMFLY]-X(4)-[FIMLV]-C-X(7,20)-[LYIMV]-X(2)-[ILVTMF]-X-[LVMI]-X(2)-N-X-[IVLMAFT]-X(8,9)-[FYMPVAIS]-X-C. In PROSITE syntax each conserved position is shown either as a single amino acid (e.g. C, N) or as all possible amino acids for that position enclosed within brackets (e.g. [ILVF] indicates that the position is occupied by Ile, Leu, Val or Phe); each variable position is shown with a letter X. Numbers in parentheses indicate stretches of variable positions (e.g. X(7,20) indicates a stretch of between 7 and 20 variable amino acids). Amino acid preferences for each position are shown in two boxes in "weblogo" form [87]. The conserved sequence positions for the ear repeat on the LRRCE motif are designated as P1, P4, P6,..., P20, and those for the following LRR as Q1, Q4, Q6,..., Q23. The side chains show the amino acids occurring at these conserved positions in the structure of bovine decorin.
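The PROSITE pattern quoted in the Figure 2 caption can be translated into an ordinary regular expression and run against a protein sequence. The following is a minimal sketch of that idea, not the search procedure used by the authors. The function names `prosite_to_regex` and `find_lrrce` are hypothetical, and the converter handles only the subset of PROSITE syntax that appears in the caption (single residues, bracketed residue sets, X, and repeat counts in parentheses); exclusions in braces and the `<` / `>` anchors are not supported.

```python
import re

# Pattern copied verbatim from the Figure 2 caption (PROSITE syntax).
LRRCE_PROSITE = (
    "[LIV]-X(2)-[LVIYFMA]-X-[LIFM]-X(2)-[NH]-X-[ILVF]-X(2)-[VIMFLY]-X(4)-"
    "[FIMLV]-C-X(7,20)-[LYIMV]-X(2)-[ILVTMF]-X-[LVMI]-X(2)-N-X-[IVLMAFT]-"
    "X(8,9)-[FYMPVAIS]-X-C"
)

# One PROSITE element: a residue or bracketed set, optionally followed by
# a repeat count such as (2) or (7,20).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into Python regex syntax.

    Hypothetical helper: supports only the constructs used in the caption.
    """
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        # X means "any amino acid"; everything else is already valid regex.
        token = "." if residue == "X" else residue
        if low is not None:
            token += f"{{{low},{high}}}" if high else f"{{{low}}}"
        parts.append(token)
    return "".join(parts)


LRRCE_REGEX = re.compile(prosite_to_regex(LRRCE_PROSITE))


def find_lrrce(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of non-overlapping pattern matches.

    Offsets are 0-based and end-exclusive, as in Python slicing.
    """
    return [(m.start(), m.end()) for m in LRRCE_REGEX.finditer(sequence.upper())]
```

Calling `find_lrrce` on a one-letter-code protein sequence returns the spans that satisfy the pattern, so an empty list means no candidate LRRCE motif was found. Because `re.finditer` reports only non-overlapping matches, a genome-wide scan that needs every overlapping candidate would require a different search strategy.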
Figure 3
Multiple sequence alignment of LRRCE motifs from a selected set of SLRP sequences from the UniProt non-redundant set. Names for the sequences are those of their corresponding Swiss-Prot or TrEMBL entries. Members of the different classes are shown with their names in green (class I), blue (class II), red (class III) or black (ECM2 and similar proteins). Two sequences from early SLRPs in urochordates (Ciona intestinalis and Ciona savignyi) are also included with their names in magenta (see text). The boxes on the top indicate the two consecutive repeats LRR1 and LRR2 that contain the LRRCE motif. The ear itself is included in the first repeat. Residue conservation colour scheme: conserved cysteines in red; conserved residues in yellow; partially conserved residues in green; conserved prolines in cyan; polar residues in conserved hydrophobic sites in magenta; potential sites of N-linked glycosylation in blue.
Figure 4
Unrooted phylogenetic tree of an expanded set of LRRCE-containing sequences, including those from UniProt, NCBI and ENSEMBL databases. Sequences group themselves in four main SLRP classes, and the class II branch has been split into two subclasses IIa and IIb. See text for the abbreviations describing each SLRP type. The positions of several sequences specifically discussed in the text are indicated with bold-type numerals: 1, SLRP1 sequences from Ciona intestinalis and Ciona savignyi; 2, biglycan-like (BGL) and decorin-like (DCL) sequences from sea lamprey Petromyzon marinus; 3, keratocan-like (KERAL) sequence from lamprey; 4, epiphycan-like (EPYL) sequence from lamprey; 5, cluster of second copies of fibromodulin (FMOD2) exclusive to fish genomes; 6, cluster of second copies of lumican (LUM2) exclusive to fish genomes; 7, cluster of second copies of osteoglycin (OGN2) exclusive to fish genomes. This tree was calculated based on the sequence alignment of the LRRCE motifs. A larger version of this figure, with legible sequence names at the end of the phylogenetic tree branches, is provided as Additional File 2.
Figure 5
Two different gene assembly models for the only LRRCE-containing sequence in Ciona intestinalis. Sequences encoded by separate exons are shown in different colours (red-black-blue) for clarity. The long model assembly (left) contains 8 exons and 15 LRRs in its LRR domain. The same gene assembly model is used for the homologous protein in Ciona savignyi. The short model assembly on the right contains 7 exons and 12 LRRs in its LRR domain; one and a half exons are skipped, resulting in the removal of the underlined amino acids from the long form. Both models were generated using prediction algorithms. The short model was part of the first draft for the Ciona intestinalis genome [55] (JGI assembly version 1.0, ci148160), but was later withdrawn in JGI version 2.0 in favour of the longer model. Available EST data (see gene and transcript entries ENSCING00000012194, ENSCINT00000023142 in the ENSEMBL database) have confirmed the long assembly model with 15 LRRs.
Figure 6
Phylogenetic relationships of class I SLRPs, inferred from the multiple sequence alignment of LRR domains from a reduced set of SLRP sequences (see Methods). The tree has been rooted using the BGL lamprey sequences as outgroup. A second BGN sequence (BGN2) has been identified in the zebrafish genome but not yet in other fishes. Clade probability values higher than 60% are indicated: Bayesian estimates in bold type, neighbor-joining in italics, and maximum-likelihood in roman type. Probability values for the fine structure in each clade are not shown for clarity. The scale bar represents amino acid substitutions per site.
Figure 7
Phylogenetic relationships of class IIa SLRPs, inferred from the multiple sequence alignment of LRR domains from a reduced set of SLRP sequences (see Methods). The tree has been rooted using the midpoint method. Sequences group into two main clusters corresponding to class IIa SLRPs: fibromodulins FMOD and FMOD2, and lumicans LUM and LUM2. The second copies FMOD2 and LUM2 are only present in genomes of ray-finned fishes. Clade probability values higher than 60% are indicated as in Figure 6. The scale bar represents amino acid substitutions per site.
Figure 8
Phylogenetic relationships of class IIb SLRPs, inferred from the multiple sequence alignment of LRR domains from a reduced set of SLRP sequences (see Methods). The tree has been rooted using the lamprey sequence as outgroup. Clade probability values higher than 60% are indicated as in Figure 6. The scale bar represents amino acid substitutions per site.
Figure 9
Phylogenetic relationships of class III SLRPs, inferred from the multiple sequence alignment of LRR domains from a reduced set of SLRP sequences (see Methods). The tree has been rooted using the predicted epiphycan-like (EPYL) sequence from lamprey as outgroup. Sequences cluster into three main groups corresponding to class III SLRPs: opticin, epiphycan, and osteoglycins OGN and OGN2. The second copy OGN2 is only present in genomes of ray-finned fishes, whereas the gene for OPTC appears to have largely disappeared from fish genomes, the only known example so far being that of zebrafish. Clade probability values higher than 60% are indicated as in Figure 6. The scale bar represents amino acid substitutions per site.
Figure 10
Phylogenetic relationships of class A SLRPs (ECM2 and ECM2-like sequences), inferred from the multiple sequence alignment of LRR domains from a reduced set of SLRP sequences (see Methods). The tree has been rooted using the SLRP1 sequences from the two Ciona species as outgroup. Sequences group into three main clusters: ECM2, ECMX (ECM2-like protein from the X chromosome), and ECMZ (ECM2-like predicted protein upstream of the DCN gene in fish genomes). Clade probability values higher than 60% are indicated as in Figure 6. The scale bar represents amino acid substitutions per site.
Figure 11
Synteny of the genes from canonical SLRPs in several vertebrate genomes. Chromosomal or group location is shown when available in the ENSEMBL database, otherwise scaffold information is provided. Members from the four classes are shown in different colours: yellow (class A), green (class I), red (class II) and blue (class III). Genes shown consecutively do not have any other currently known genes in between, whereas the OPTC, FMOD2 and OGN2 genes in zebrafish and stickleback are separated from the other SLRPs by non-SLRP genes.

References

    1. Kobe B, Deisenhofer J. Proteins with leucine-rich repeats. Curr Opin Struct Biol. 1995;5:409–416. doi: 10.1016/0959-440X(95)80105-7.
    2. Kajava AV. Structural diversity of leucine-rich repeat proteins. J Mol Biol. 1998;277:519–527. doi: 10.1006/jmbi.1998.1643.
    3. Kobe B, Kajava AV. The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol. 2001;11:725–732. doi: 10.1016/S0959-440X(01)00266-4.
    4. Enkhbayar P, Kamiya M, Osaki M, Matsumoto T, Matsushima N. Structural principles of leucine-rich repeat (LRR) proteins. Proteins. 2004;54:394–403. doi: 10.1002/prot.10605.
    5. Bella J, Hindle KL, McEwan PA, Lovell SC. The leucine-rich repeat structure. Cell Mol Life Sci. 2008;65:2307–2333. doi: 10.1007/s00018-008-8019-0.
