Plant Physiol. 2006 Jul;141(3):825-39.
doi: 10.1104/pp.106.077826.

Formation of the Arabidopsis pentatricopeptide repeat family

Eric Rivals et al. Plant Physiol. 2006 Jul.

Abstract

In Arabidopsis (Arabidopsis thaliana), the 466 pentatricopeptide repeat (PPR) proteins are putative RNA-binding proteins with essential roles in organelles. Roughly half of the PPR proteins form the plant combinatorial and modular protein (PCMP) subfamily, which is land-plant specific. PCMPs exhibit a large and variable tandem repeat of a standard pattern of three PPR variant motifs. The presence or absence of three non-PPR motifs at the C terminus, following this repeat, defines four distinct classes of PCMPs. The highly structured arrangement of these motifs and the similar distribution of these arrangements in the four classes suggest precise relationships between motif organization and substrate specificity. This study is an attempt to reconstruct an evolutionary scenario of the PCMP family. We developed an innovative approach based on comparisons of the proteins at two levels: the succession of motifs along the protein and the amino acid sequence of the motifs. It enabled us to infer evolutionary relationships between proteins as well as between the inter- and intraprotein repeats. First, we observed a polarized elongation of the repeat from the C terminus toward the N-terminal region, suggesting local recombinations of motifs. Second, the most N-terminal PPR triple motif proved to evolve under different constraints than the rest of the repeat. Altogether, the evidence indicates that the PPR region and the C-terminal region of PCMPs evolved differently, which points to distinct functions for these regions. Moreover, local sequence homogeneity observed across PCMP classes may be due to interclass shuffling of motifs, or to deletions/insertions of non-PPR motifs at the C terminus.

Figures

Figure 1.
Protein motifs and motif block organization in PPRPs. The gene models and protein structures, redrawn from FLAGdb++ (http://urgv.evry.inra.fr/FLAGdb), are shown for a representative of each subfamily, a PPRP in A and a PCMP in B. The black arrows at the bottom of the two panels represent the TIGR (http://www.tigr.org/) gene model, which is associated with the PPR repeat tagged by the PFAM motif PF01535 (Bateman et al., 2004). The manual annotations of PPR CDS and the associated organization of the PPR motifs come from Small's laboratory (URGV, Evry, France). In B, the organization of the PPR motifs was obtained by new manual expert annotation in the course of this work, and the motif organization of the 198 PCMPs of Arabidopsis is given in Supplemental Table VII. Differences between annotations are underlined. Note that P motifs are more similar to the PF01535 motif than L and S motifs are. Two kinds of tandem repeats are shown: (1) the PPR motif repeat, i.e. a tandem repeat of P motifs in PPRPs, and (2) the PCMP block repeat, highlighted by black borders in B, top line. PCMP blocks are either PLS, LSP, SPL, or PL2S (=P2L2S2) blocks.
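As an illustration of the block organization described in this legend, the following minimal Python sketch groups a motif succession into blocks of three motifs and checks each block against the four block types named above. The function name `split_into_blocks`, the list-of-strings input format, and the assumption that motifs fall cleanly into consecutive triples are hypothetical simplifications for illustration; they are not taken from the article, and real proteins may contain irregular arrangements that this sketch rejects.

```python
# Block types named in the Figure 1 legend (PL2S is equivalent to P2L2S2).
KNOWN_BLOCKS = {"PLS", "LSP", "SPL", "PL2S"}


def split_into_blocks(motifs):
    """Group a motif succession into consecutive blocks of three motifs.

    `motifs` is a list of motif labels such as ["P", "L", "S", "P", "L2", "S"].
    Assumption (not from the source): the succession is a clean
    concatenation of three-motif blocks; anything else raises ValueError.
    """
    if len(motifs) % 3 != 0:
        raise ValueError("motif succession is not a whole number of blocks")
    blocks = []
    for start in range(0, len(motifs), 3):
        name = "".join(motifs[start:start + 3])
        if name not in KNOWN_BLOCKS:
            raise ValueError(f"unrecognized block: {name}")
        blocks.append(name)
    return blocks
```

For example, `split_into_blocks(["P", "L", "S", "P", "L2", "S"])` yields one PLS block followed by one PL2S block.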
Figure 2.
An evolutionary tree of the PCMP family based on the nr-PCMP set of proteins (branch scale: 20 units per cm). It is the best tree inferred from the matrix of distances between block sequences of the proteins according to the treeness criteria defined by Guénoche and Garreta (2000). The treeness criterion VAF equals 0.99 for the whole tree (see Supplemental Fig. 1 for comparison with trees obtained from different alignment parameters). The schematic representation in the lower part of the figure displays only the innermost branches that separate the PCMP classes A, E, F, and H, as well as the confidence values of those branches. Classes A and H are monophyletic (Re value of 1), while classes F and E are split into two and three subtrees, respectively. Indeed, the subtree of class H branches out between the two subtrees of class F, and the class A subtree separates two subtrees of class E (group a). The split AE|FH is also supported by a maximal confidence value (Re of 1). In the complete tree, some branches are compressed to fit on the page. The subtrees corresponding to each class are organized similarly: they distinguish the groups as defined in Table I. The AGI-ID is followed first by a capital letter indicating the PCMP class, then by a lowercase letter for the PCMP group, and then by a figure giving the number of PCMP genes coding for proteins with the same block sequence.
Figure 3.
Sequence similarity between PLS or LSP2 blocks in the sequence database containing all the PCMPs. HMMs have been built with 20 amino acid block sequences from either position-1 PLS (A), position-1 LSP (= LSP2; B), or position-2 PLS (C). The PLS block at position 1 is the first PLS block on the N-terminal side of the P2L2S2 (or PL2S) block in the proteins (Table I; Fig. 1). The PLS block at position 2 is the next one toward the N terminus of the protein, and so on for position 3 and others. Hmmsearch output, sorted by increasing E-value, has been organized in classes of 20 PCMP blocks, as shown for A in the inset at the top right corner. E-value classes, each illustrated by a bar, are ordered along the abscissa by increasing E-values. The E-value class of rank 1 (E-value class 1) contains the 20 PCMP blocks showing the highest similarity with the HMM, and the similarity decreases with increasing E-value class rank. For the other sequence comparisons, in B and C, only the highest and lowest E-values are given. Different patterns in bars indicate the numbers of PCMP blocks that are located at different protein positions: black for blocks at position 1, white for blocks at position 2, dark gray for blocks at position 3, and light gray for those at other positions. The number at the bottom of each bar (or E-value class) is always the number of blocks belonging to the same category as the 20 blocks used to build the HMM. A regression line has been calculated for the number of these blocks. The line has been forced to horizontal when the slope was not significant. The steeper the slope, the higher the similarity of blocks belonging to the same category as the 20 blocks used for building the HMM (A) or the distance from blocks of other categories (B).
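The binning described in this legend can be sketched in a few lines of Python: hits are sorted by increasing E-value, cut into classes of 20, the blocks of the query category are counted in each class, and an ordinary least-squares slope is fitted to those counts against class rank. The function names, the `(evalue, category)` tuple format, and the use of plain least squares are assumptions made for illustration; the article does not specify its implementation, and the significance test used to force non-significant slopes to horizontal is not reproduced here.

```python
def evalue_class_counts(hits, category, class_size=20):
    """Count same-category blocks per E-value class.

    `hits` is an iterable of (evalue, category_label) pairs (hypothetical
    format). Hits are ranked by increasing E-value and cut into
    consecutive classes of `class_size` blocks; the last class may be
    shorter if the number of hits is not a multiple of `class_size`.
    """
    ranked = sorted(hits, key=lambda hit: hit[0])
    counts = []
    for start in range(0, len(ranked), class_size):
        chunk = ranked[start:start + class_size]
        counts.append(sum(1 for _, label in chunk if label == category))
    return counts


def regression_slope(counts):
    """Least-squares slope of counts against class rank (1, 2, 3, ...)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two E-value classes to fit a slope")
    ranks = range(1, n + 1)
    mean_x = sum(ranks) / n
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, counts))
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    return sxy / sxx
```

A strongly negative slope means that blocks of the query category are concentrated in the lowest-rank (most similar) E-value classes.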
Figure 4.
Sequence similarity between position-1 PLS blocks from the three different classes of PCMP: H, F, and E. HMMs have been built up with amino acid sequences of 20 blocks from position 1 in one of the three different classes of PCMPs and used to search for similarity in the sequence database containing all the PCMPs. The protocol is as in Figure 3 but the HMMs were built up with sequences of PCMP blocks either from PCMP class H (A), class F (B), or class E (C). For more details on the representation of the results, see the legend of Figure 3.
Figure 5.
Sequence similarity between P2L2S2 blocks of PCMPs. HMMs have been built with 20 sequences of P2L2S2 blocks from either PCMP class H (A), class F (B), or class E (C) and used to search for similarity in the sequence database containing only the P2L2S2 blocks from all the PCMPs. For more details on the representation of the results, see the legend of Figure 3.
Figure 6.
A scenario for the advent of the PCMP family. An ancestral protein containing both a PLS and a P2L2S2 block (eventually belonging to class A) is fused with another protein containing the EE+Dyw block of non-PPR motifs. Then, the other classes each appear by subsequent losses of a C-terminal motif. In class H (ending with a Dyw motif), F (ending with an E+ motif), and E (ending with an E motif), the number of proteins increases by gene duplication, and the tandem array of PLS blocks in each protein varies in length through events of tandem amplification or contraction.
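The class definitions implied by this scenario can be written as a small lookup, shown here as a hedged Python sketch. It assumes that class H carries the full E, E+, Dyw succession, that classes F and E arise by losing Dyw and then E+, and that class A carries none of the three non-PPR motifs; the last point is an interpretation of the legend rather than an explicit statement in it. The function name `pcmp_class` and the tuple input format are hypothetical.

```python
# Assumed mapping from the C-terminal non-PPR motif succession to PCMP class.
# Class A (no non-PPR motif) is an interpretation, not stated outright.
CLASS_BY_C_TERMINUS = {
    (): "A",
    ("E",): "E",
    ("E", "E+"): "F",
    ("E", "E+", "Dyw"): "H",
}


def pcmp_class(c_terminal_motifs):
    """Return the PCMP class for a succession of C-terminal non-PPR motifs.

    Raises ValueError for any succession outside the four assumed patterns.
    """
    key = tuple(c_terminal_motifs)
    if key not in CLASS_BY_C_TERMINUS:
        raise ValueError(f"unexpected C-terminal motifs: {key}")
    return CLASS_BY_C_TERMINUS[key]
```

Under these assumptions, each class other than A differs from the next by the loss of exactly one C-terminal motif, which mirrors the successive losses described in the legend.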

References

    Akagi H, Nakamura A, Yokozeki-Misono Y, Inagaki A, Takahashi H, Mori K, Fujimura T (2004) Positional cloning of the rice Rf-1 gene, a restorer of BT-type cytoplasmic male sterility that encodes a mitochondria-targeting PPR protein. Theor Appl Genet 108: 1449–1457
    Aubourg S, Boudet N, Kreis M, Lecharny A (2000) In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants. Plant Mol Biol 42: 603–613
    Aubourg S, Brunaud V, Bruyere C, Cock M, Cooke R, Cottet A, Couloux A, Dehais P, Deleage G, Duclert A, et al (2005) GeneFarm, structural and functional annotation of Arabidopsis gene and protein families by a network of experts. Nucleic Acids Res 33: D641–D646
    Bahr A, Thompson JD, Thierry JC, Poch O (2001) BAliBASE (benchmark alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Res 29: 323–326
    Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, et al (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141