Plant Physiol. 2006 Jul;141(3):825-39.

doi: 10.1104/pp.106.077826.

Formation of the Arabidopsis pentatricopeptide repeat family

Eric Rivals¹, Clémence Bruyère, Claire Toffano-Nioche, Alain Lecharny

Affiliation

¹ Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5506, Université de Montpellier II, 34392 Montpellier cedex 5, France.

PMID: 16825340
PMCID: PMC1489915
DOI: 10.1104/pp.106.077826

Eric Rivals et al. Plant Physiol. 2006 Jul.

Abstract

In Arabidopsis (Arabidopsis thaliana), the 466 pentatricopeptide repeat (PPR) proteins are putative RNA-binding proteins with essential roles in organelles. Roughly half of the PPR proteins form the plant combinatorial and modular protein (PCMP) subfamily, which is specific to land plants. PCMPs exhibit a large and variable tandem repeat of a standard pattern of three PPR variant motifs. The presence or absence of three non-PPR motifs at the C terminus, following this repeat, defines four distinct classes of PCMPs. The highly structured arrangement of these motifs and the similar distribution of these arrangements across the four classes suggest precise relationships between motif organization and substrate specificity. This study attempts to reconstruct an evolutionary scenario for the PCMP family. We developed an innovative approach based on comparing the proteins at two levels: the succession of motifs along the protein and the amino acid sequence of the motifs. This approach enabled us to infer evolutionary relationships between proteins as well as between inter- and intraprotein repeats. First, we observed a polarized elongation of the repeat from the C terminus toward the N-terminal region, suggesting local recombinations of motifs. Second, the most N-terminal PPR triple motif proved to evolve under constraints different from those acting on the rest of the repeat. Altogether, the evidence indicates that the PPR region and the C-terminal region of PCMPs evolved differently, which points to distinct functions for these regions. Moreover, the local sequence homogeneity observed across PCMP classes may be due to interclass shuffling of motifs, or to deletions/insertions of non-PPR motifs at the C terminus.

Figures

**Figure 1.**
Protein motifs and motif block organization in PPRPs. The gene models and protein structures, redrawn from FLAGdb++ (http://urgv.evry.inra.fr/FLAGdb), are shown for a representative of each subfamily: a PPRP in A and a PCMP in B. The black arrows at the bottom of each panel represent the TIGR (http://www.tigr.org/) gene model, which is associated with the PPR repeat tagged by the PFAM motif PF01535 (Bateman et al., 2004). The manual annotations of the PPR CDS and the associated organization of the PPR motifs come from Small's laboratory (URGV, Evry, France). In B, the organization of the PPR motifs was obtained by new manual annotation in the course of this work; the motif organization of the 198 PCMPs of Arabidopsis is given in Supplemental Table VII. Differences between annotations are underlined. Note that P motifs are more similar to the PF01535 motif than L and S motifs are. Two kinds of tandem repeats are shown: (1) the PPR motif repeat, i.e. a tandem repeat of P motifs in PPRPs, and (2) the PCMP block repeat, highlighted by black borders in B, top line. PCMP blocks are either PLS, LSP, SPL, or PL²S (=P²L²S²) blocks.

**Figure 2.**
An evolutionary tree of the PCMP family based on the nr-PCMP set of proteins (branch scale: 20 units per cm). It is the best tree inferred from the matrix of distances between block sequences of the proteins, according to the treeness criteria defined by Guénoche and Garreta (2000). The treeness criterion VAF equals 0.99 for the whole tree (see Supplemental Fig. 1 for comparison with trees obtained from different alignment parameters). The schematic representation in the lower part of the figure displays only the innermost branches that separate the PCMP classes A, E, F, and H, together with the confidence values of those branches. Classes A and H are monophyletic (Re value of 1), while classes F and E are split into two and three subtrees, respectively. The subtree of class H branches out between the two subtrees of class F, and the class A subtree separates two subtrees of class E (group a). The split AE|FH is also supported by a maximal confidence value (Re of 1). In the complete tree, some branches are compressed to fit on the page. The subtrees corresponding to each class are organized similarly: they distinguish the groups as defined in Table I. The AGI-ID is followed first by a capital letter indicating the PCMP class, then by a lowercase letter for the PCMP group, and then by a number giving the count of PCMP genes coding for proteins with the same block sequence.

**Figure 3.**
Sequence similarity between PLS or LSP² blocks in the sequence database containing all the PCMPs. HMMs were built with 20 amino acid block sequences from either position-1 PLS (A), position-1 LSP (= LSP²; B), or position-2 PLS (C). The PLS block at position 1 is the first PLS block on the N-terminal side of the P²L²S² (or PL²S) block in the proteins (Table I; Fig. 1). The PLS block at position 2 is the next one toward the N terminus of the protein, and so on for position 3 and beyond. The hmmsearch output, sorted by increasing E-value, was organized in classes of 20 PCMP blocks, as shown for A in the inset at the top right corner. E-value classes, each illustrated by a bar, are ordered along the abscissa by increasing E-value. The E-value class of rank 1 (E-value class 1) contains the 20 PCMP blocks showing the highest similarity with the HMM, and similarity decreases as the E-value class rank increases. For the other sequence comparisons, in B and C, only the highest and lowest E-values are given. Different patterns in the bars indicate the numbers of PCMP blocks located at different protein positions: black for blocks at position 1, white for blocks at position 2, dark gray for blocks at position 3, and light gray for those at other positions. The number of blocks at the bottom of each bar (i.e. each E-value class) always refers to the blocks belonging to the same category as the 20 blocks used to build the HMM. A regression line was calculated for the number of these blocks. The line was forced to horizontal when the slope was not significant. The steeper the slope, the higher the similarity among blocks of the same category as the 20 blocks used to build the HMM (A), or the greater the distance from blocks of other categories (B).
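The binning described in this legend can be illustrated with a minimal Python sketch. It assumes the hmmsearch hits have already been parsed into `(evalue, position)` pairs; the function names and the input representation are hypothetical, not taken from the paper. The slope is an ordinary least-squares slope of the per-class counts against class rank; the significance test that the legend uses to force a line to horizontal is not reproduced here.

```python
from typing import List, Sequence, Tuple

# A hit is (E-value, block position in the protein); this layout is an
# assumption made for illustration, not the authors' data format.
Hit = Tuple[float, int]

def evalue_classes(hits: Sequence[Hit], class_size: int = 20) -> List[List[Hit]]:
    """Sort hits by increasing E-value and cut them into consecutive classes."""
    ranked = sorted(hits, key=lambda hit: hit[0])
    return [ranked[i:i + class_size] for i in range(0, len(ranked), class_size)]

def count_position(classes: Sequence[Sequence[Hit]], position: int) -> List[int]:
    """Count, in each E-value class, the blocks located at the given position."""
    return [sum(1 for _, pos in cls if pos == position) for cls in classes]

def slope(counts: Sequence[float]) -> float:
    """Least-squares slope of counts against class rank 1..n (0.0 if n < 2)."""
    n = len(counts)
    if n < 2:
        return 0.0
    ranks = range(1, n + 1)
    mean_x = sum(ranks) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, counts))
    return sxy / sxx

# Synthetic example: 60 hits, the 30 best of which are position-1 blocks.
demo_hits = [(10.0 ** -(50 - i), 1 if i < 30 else 2) for i in range(60)]
demo_counts = count_position(evalue_classes(demo_hits), position=1)
```

On this synthetic input, position-1 blocks concentrate in the low-rank E-value classes, so the fitted slope is negative, which is the pattern the legend reads as high within-category similarity.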

**Figure 4.**
Sequence similarity between position-1 PLS blocks from the three different classes of PCMP: H, F, and E. HMMs have been built up with amino acid sequences of 20 blocks from position 1 in one of the three different classes of PCMPs and used to search for similarity in the sequence database containing all the PCMPs. The protocol is as in Figure 3 but the HMMs were built up with sequences of PCMP blocks either from PCMP class H (A), class F (B), or class E (C). For more details on the representation of the results, see the legend of Figure 3.

**Figure 5.**
Sequence similarity between P²L²S² blocks of PCMPs. HMMs were built with 20 sequences of P²L²S² blocks from either PCMP class H (A), class F (B), or class E (C) and used to search for similarity in the sequence database containing only the P²L²S² blocks from all the PCMPs. For more details on the representation of the results, see the legend of Figure 3.

**Figure 6.**
A scenario for the advent of the PCMP family. An ancestral protein containing both a PLS and a P²L²S² block (possibly belonging to class A) is fused with another protein containing the EE⁺Dyw block of non-PPR motifs. The other classes then each appear through the subsequent loss of a C-terminal motif. In class H (ending with a Dyw motif), class F (ending with an E⁺ motif), and class E (ending with an E motif), the number of proteins increases by gene duplication, and the tandem array of PLS blocks in each protein varies in length through events of tandem amplification or contraction.
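The class assignment implied by this scenario can be sketched as a lookup on the last motif of a protein. The snippet below is a minimal Python illustration, assuming each protein is given as a list of motif names ordered from N to C terminus; the token spellings (`"E"`, `"E+"`, `"Dyw"`) and the function name are hypothetical. Treating a protein with no C-terminal non-PPR motif as class A is an inference from the legend and abstract, not a rule stated explicitly in them.

```python
from typing import Sequence

# Class letter keyed by the C-terminal non-PPR motif, following the legend:
# H ends with Dyw, F ends with E+, E ends with E.
CLASS_BY_LAST_MOTIF = {"Dyw": "H", "E+": "F", "E": "E"}

def pcmp_class(motifs: Sequence[str]) -> str:
    """Return the PCMP class letter for a motif succession (N to C terminus).

    Assumption: a protein whose last motif is a PPR motif, i.e. one that
    carries no C-terminal non-PPR motif, is assigned to class A.
    """
    if not motifs:
        raise ValueError("empty motif succession")
    return CLASS_BY_LAST_MOTIF.get(motifs[-1], "A")

# Illustrative motif successions (made up for the example).
core = ["P", "L", "S", "P2", "L2", "S2"]
examples = {
    "A": core,
    "E": core + ["E"],
    "F": core + ["E", "E+"],
    "H": core + ["E", "E+", "Dyw"],
}
```

Reading the dictionary from `"H"` down to `"A"` mirrors the successive losses of a C-terminal motif described in the scenario.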

Cited by

GhYGL1d, a pentatricopeptide repeat protein, is required for chloroplast development in cotton.
He P, Wu S, Jiang Y, Zhang L, Tang M, Xiao G, Yu J. BMC Plant Biol. 2019 Aug 13;19(1):350. doi: 10.1186/s12870-019-1945-1. PMID: 31409298

Seedling Lethal1, a pentatricopeptide repeat protein lacking an E/E+ or DYW domain in Arabidopsis, is involved in plastid gene expression and early chloroplast development.
Pyo YJ, Kwon KC, Kim A, Cho MH. Plant Physiol. 2013 Dec;163(4):1844-58. doi: 10.1104/pp.113.227199. Epub 2013 Oct 21. PMID: 24144791

Chloroplastic pentatricopeptide repeat proteins (PPR) in albino plantlets of Agave angustifolia Haw. reveal unexpected behavior.
Andrade-Marcial M, Pacheco-Arjona R, Góngora-Castillo E, De-la-Peña C. BMC Plant Biol. 2022 Jul 19;22(1):352. doi: 10.1186/s12870-022-03742-2. PMID: 35850575

Sequence-specific binding of a chloroplast pentatricopeptide repeat protein to its native group II intron ligand.
Williams-Carrier R, Kroeger T, Barkan A. RNA. 2008 Sep;14(9):1930-41. doi: 10.1261/rna.1077708. Epub 2008 Jul 30. PMID: 18669444

LPA66 is required for editing psbF chloroplast transcripts in Arabidopsis.
Cai W, Ji D, Peng L, Guo J, Ma J, Zou M, Lu C, Zhang L. Plant Physiol. 2009 Jul;150(3):1260-71. doi: 10.1104/pp.109.136812. Epub 2009 May 15. PMID: 19448041

References

1. Akagi H, Nakamura A, Yokozeki-Misono Y, Inagaki A, Takahashi H, Mori K, Fujimura T (2004) Positional cloning of the rice Rf-1 gene, a restorer of BT-type cytoplasmic male sterility that encodes a mitochondria-targeting PPR protein. Theor Appl Genet 108: 1449–1457
2. Aubourg S, Boudet N, Kreis M, Lecharny A (2000) In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants. Plant Mol Biol 42: 603–613
3. Aubourg S, Brunaud V, Bruyere C, Cock M, Cooke R, Cottet A, Couloux A, Dehais P, Deleage G, Duclert A, et al (2005) GeneFarm, structural and functional annotation of Arabidopsis gene and protein families by a network of experts. Nucleic Acids Res 33: D641–D646
4. Bahr A, Thompson JD, Thierry JC, Poch O (2001) BAliBASE (benchmark alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Res 29: 323–326
5. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, et al (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141
