Plant Cell. 2004 Aug;16(8):2089-103.
doi: 10.1105/tpc.104.022236. Epub 2004 Jul 21.

Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis

Claire Lurin et al. Plant Cell. 2004 Aug.

Abstract

The complete sequence of the Arabidopsis thaliana genome revealed thousands of previously unsuspected genes, many of which cannot be ascribed even putative functions. One of the largest and most enigmatic gene families discovered in this way is characterized by tandem arrays of pentatricopeptide repeats (PPRs). We describe a detailed bioinformatic analysis of 441 members of the Arabidopsis PPR family plus genomic and genetic data on the expression (microarray data), localization (green fluorescent protein and red fluorescent protein fusions), and general function (insertion mutants and RNA binding assays) of many family members. The basic picture that arises from these studies is that PPR proteins play constitutive, often essential roles in mitochondria and chloroplasts, probably via binding to organellar transcripts. These results confirm, but massively extend, the very sparse observations previously obtained from detailed characterization of individual mutants in other organisms.

Figures

Figure 1.
Alignment of the Four Most Abundant PPR-Related Motifs in Arabidopsis in Comparison with the PFAM PF01535 PPR Motif. Consensus sequences were obtained using the HMMER package based on alignments of thousands of Arabidopsis motifs. The PPR consensus used by PFAM (and obtained using motifs from a variety of organisms) is almost identical to our consensus except shifted by two amino acids, such that it overlaps the first helix of the following motif. Residues in capital letters are more highly conserved within each motif. Residues in bold are conserved between PPR-related motifs. The underlined sequences indicate the correspondences to the motifs A (underlined three times), B (underlined once), and C (underlined twice) described by Aubourg et al. (2000). Motif C overlaps adjacent S and P motifs. The shaded boxes indicate the maximum extent of the predicted α-helical regions.
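The capital/lower-case convention in these consensus sequences can be illustrated with a simple column-wise majority-rule consensus. This is a minimal Python sketch, not HMMER: HMMER builds profile hidden Markov models, whereas this code only counts residues per alignment column. The function names `consensus` and `invariant_columns`, the thresholds `upper_threshold` and `min_freq`, and the toy sequences are hypothetical choices made for illustration.

```python
from collections import Counter


def consensus(alignment, upper_threshold=0.5, min_freq=0.2):
    """Majority-rule consensus of equal-length aligned sequences (hypothetical sketch).

    The most frequent residue in each column is written in capitals if it occurs
    in at least `upper_threshold` of the sequences, in lower case if it reaches
    `min_freq`, and as 'x' otherwise. Gaps ('-') are not counted as residues, but
    frequencies are taken over all sequences, so gappy columns score low.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for col in range(length):
        residues = [seq[col].upper() for seq in alignment if seq[col] != "-"]
        if not residues:
            out.append("-")  # column is all gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        freq = count / len(alignment)
        if freq >= upper_threshold:
            out.append(residue)          # highly conserved: capital letter
        elif freq >= min_freq:
            out.append(residue.lower())  # weakly conserved: lower case
        else:
            out.append("x")              # no clear consensus
    return "".join(out)


def invariant_columns(alignment):
    """Indices of columns where every sequence carries the same residue (no gaps)."""
    return [
        col for col in range(len(alignment[0]))
        if alignment[0][col] != "-"
        and all(seq[col].upper() == alignment[0][col].upper() for seq in alignment)
    ]


# Made-up toy alignment, not real PPR motifs.
toy = ["VTYNT", "VTYNS", "ITFNA", "VSYNT", "VTYNG"]
print(consensus(toy))          # VTYNt
print(invariant_columns(toy))  # [3]
```

A completely invariant column, like the bold, underlined residues described for Figure 2, is simply one where every sequence agrees; a real profile HMM would additionally weight sequences and model insertions and deletions.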
Figure 2.
HMMER-Derived Consensus Sequences of C-Terminal Motifs Present in PPR Proteins. The alignments employed for the E, E+, and DYW motifs contained 184, 148, and 85 sequences, respectively. The best conserved residues are in capital letters; bold, underlined amino acids are completely invariant. For the DYW motif, the DYW triplet (or a closely related sequence) forms the C terminus of the protein.
Figure 3.
Motif Structure of Arabidopsis PPR Proteins. Typical structures of proteins from each of the principal subfamilies and subgroups are shown. The structures are purely indicative, and the number and even order of repeats can vary in individual proteins. The number of proteins falling into each subgroup is shown.
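The subfamily and subgroup labels used throughout these figures (P, P-L-S, E, E+, DYW) can be expressed as a small rule-based classifier over a protein's motif labels. The Python sketch below assumes that motifs have already been called, that the P subfamily contains only P motifs, and that P-L-S proteins are sub-grouped by their most extended C-terminal motif. These rules, the function names `classify_ppr` and `tally`, and the gene IDs are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter

# Hypothetical rule set; real motif order and number vary between proteins.
C_TERMINAL_SUBGROUPS = ("DYW", "E+", "E")  # checked from most to least extended


def classify_ppr(motifs):
    """Assign a protein to a PPR subfamily/subgroup from its list of motif labels."""
    labels = set(motifs)
    if not labels & {"P", "L", "S"}:
        raise ValueError("no PPR-related motifs found")
    if labels <= {"P"}:
        return "P"  # only canonical PPR (P) motifs
    for subgroup in C_TERMINAL_SUBGROUPS:
        if subgroup in labels:
            return subgroup
    return "P-L-S"  # L/S motifs present but no C-terminal extension


def tally(proteins):
    """Count proteins per subgroup; `proteins` maps a gene ID to its motif list."""
    return Counter(classify_ppr(m) for m in proteins.values())


# Hypothetical gene IDs and motif lists.
example = {
    "gene_a": ["P"] * 8,
    "gene_b": ["P", "L", "S"] * 3 + ["E"],
    "gene_c": ["P", "L", "S"] * 4 + ["E", "E+", "DYW"],
}
print(tally(example))
```

Checking the C-terminal motifs from most to least extended means a protein ending in E, E+, and DYW is counted once, in the DYW subgroup.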
Figure 4.
Predicted Subcellular Localization of PPR Proteins. The proportions of each PPR subclass predicted by Predotar to be targeted to mitochondria (black segments) or plastids (gray segments) or to lack targeting signals (white segments) are indicated. PPR and E proteins are mostly predicted to be mitochondrial; E+ and DYW proteins are predicted to be more evenly distributed between the two organelles.
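The proportions in this figure are per-subclass tallies of predicted compartments. A minimal Python sketch follows, assuming each protein has one mitochondrial and one plastid targeting score between 0 and 1. The cutoff of 0.5, the tie rule, the function names, and the scores are hypothetical, and the code does not reproduce Predotar itself.

```python
from collections import Counter, defaultdict


def assign_location(mito_score, plastid_score, cutoff=0.5):
    """Call a single compartment from two targeting scores (hypothetical rule).

    Proteins whose best score is below `cutoff` are treated as lacking a
    targeting signal; ties go to the mitochondrion.
    """
    if max(mito_score, plastid_score) < cutoff:
        return "none"
    return "mitochondrion" if mito_score >= plastid_score else "plastid"


def location_proportions(predictions, cutoff=0.5):
    """Fraction of each subclass assigned to each compartment.

    `predictions` is an iterable of (subclass, mito_score, plastid_score) tuples.
    """
    counts = defaultdict(Counter)
    for subclass, mito, plastid in predictions:
        counts[subclass][assign_location(mito, plastid, cutoff)] += 1
    return {
        subclass: {loc: n / sum(c.values()) for loc, n in c.items()}
        for subclass, c in counts.items()
    }


# Made-up scores for four hypothetical E-subgroup proteins.
scores = [("E", 0.9, 0.1), ("E", 0.2, 0.7), ("E", 0.1, 0.1), ("E", 0.8, 0.3)]
print(location_proportions(scores))
```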
Figure 5.
Expression of Sets of Genes as Measured by Microarray Hybridization Data. RNA was extracted from Arabidopsis rosette leaves or flowers and hybridized to CATMA arrays containing 24,576 gene-specific probes. Data from four independent two-color hybridizations (comprising two dye swaps) were corrected and averaged as described in Methods. The scales are logarithmic (log2), plotting the mean signal ratio (leaves/flowers) against the maximum mean signal intensity (leaves or flowers). The dotted lines mark the ratio thresholds (0.46) above or below which the statistical analysis indicates that genes are differentially expressed in this set of experiments. (A) Expression of PPR genes. Data points corresponding to the PPR and P-L-S subfamilies are depicted in dark gray and light gray, respectively. In general, the PPR subfamily is more highly expressed (Wilcoxon rank sum test, P < 10⁻¹⁵). (B) Expression of genes predicted to encode plastid or mitochondrial proteins. Data points corresponding to genes encoding Predotar-predicted plastid or mitochondrial proteins (cutoff 0.5) are depicted in dark gray and light gray, respectively. The predicted plastid set shows a strong bias toward higher expression in leaves (Wilcoxon rank sum test, P < 10⁻¹⁵). cp, chloroplast; mt, mitochondria. (C) Expression of genes predicted to encode plastid or mitochondrial PPR proteins. Data points corresponding to genes encoding Predotar-predicted mitochondrial or plastid PPR proteins (cutoff 0.25) are depicted in dark gray and light gray, respectively. The two sets do not show significantly different distributions of leaf/flower expression ratios (Wilcoxon rank sum test, P > 0.75). Both are much less biased toward expression in leaves than the complete predicted plastid set shown in (B) (Wilcoxon rank sum test, P < 10⁻¹⁵) and slightly less biased toward expression in flowers than the complete mitochondrial set (Wilcoxon rank sum test, P < 0.04).
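The plotted quantities and the Wilcoxon rank sum test used to compare gene sets can be sketched in a few lines of Python. The sketch assumes corrected, averaged intensities per gene and applies the 0.46 threshold to the absolute log2 leaf/flower ratio; the caption does not state explicitly whether the threshold refers to the log2 ratio, so treat that as an assumption. The test uses the large-sample normal approximation with a tie correction rather than exact p-values, and all function names are hypothetical.

```python
import math
from collections import Counter


def plot_coordinates(leaf, flower):
    """(log2 max mean intensity, log2 leaf/flower ratio) for one gene."""
    return math.log2(max(leaf, flower)), math.log2(leaf / flower)


def is_differential(leaf, flower, threshold=0.46):
    """Assumes the threshold applies to |log2(leaf/flower)|."""
    return abs(math.log2(leaf / flower)) > threshold


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (W, z, p), where W is the rank sum of sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = list(x) + list(y)
    w = sum(average_ranks(values)[:n1])
    mean_w = n1 * (n + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(values).values())
    var_w = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


print(plot_coordinates(8.0, 2.0))  # (3.0, 2.0)
print(rank_sum_test([1, 2, 3], [4, 5, 6]))
```

For gene sets of the size compared here (hundreds of genes), the normal approximation is standard; with very small samples an exact test would be preferable.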
Figure 6.
RNA Binding Assay for Four Representative Arabidopsis PPR Proteins. Two of these proteins (At1g79540 and At5g12100) are P subfamily members; At3g25970 is an E+ protein, and At5g13270 is a DYW protein. (A) Radioactively labeled protein retained on Sepharose columns carrying various polyribonucleotides, single-stranded DNA (ssDNA), or double-stranded DNA (dsDNA). The left lane was loaded with one-tenth of the protein loaded on the columns. The far right lane shows binding to a Sepharose column lacking added nucleotides. mRBP2b, a previously characterized RNA binding protein of the RNA recognition motif family, is used here as a positive control. The bottom panel shows binding of β-glucuronidase (GUS) as a negative control. (B) Competition assays for At5g12100. Labeled proteins were preincubated with competitor RNA or heparin before being loaded onto a poly(G) Sepharose column; binding was quantified with a phosphorimager and expressed relative to binding in the absence of competitor (100%).
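The normalization in panel (B) is a ratio to the no-competitor condition. A minimal Python sketch, assuming one quantified signal per condition and no background subtraction (the caption mentions none for this panel); the function name, condition labels, and numbers are hypothetical.

```python
def percent_of_control(signals, no_competitor_signal):
    """Express binding signals as a percentage of binding without competitor.

    `signals` maps a competitor condition to its quantified signal; values are
    arbitrary phosphorimager units.
    """
    if no_competitor_signal <= 0:
        raise ValueError("the no-competitor signal must be positive")
    return {cond: 100.0 * s / no_competitor_signal for cond, s in signals.items()}


# Made-up signals for two hypothetical competitor concentrations.
print(percent_of_control({"competitor_low": 750.0, "competitor_high": 250.0}, 1000.0))
# {'competitor_low': 75.0, 'competitor_high': 25.0}
```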
Figure 7.
A Model for PPR Protein Action. We assume that the putative superhelix formed by tandemly repeated PPR motifs provides a sequence-specific RNA binding surface, either alone (A) or in the presence of an additional factor (B). The resulting protein-RNA complex recruits one or more other trans-acting factors (in this case an endonuclease) to a specific site on the RNA target. We assume that in most cases the catalytic site is in the partner protein; for the DYW class of PPR proteins, it may lie in the C-terminal domain itself.
Figure 8.
Order of Appearance and Likely Evolutionary Relationships between PPR Families Based on Phylogenetic Distribution. TPR proteins are ubiquitous, whereas PPR proteins are only found in eukaryotes and the P-L-S subfamily only in land plants.

References

1. Akashi, K., Grandjean, O., and Small, I. (1998). Potential dual targeting of an Arabidopsis archaebacterial-like histidyl-tRNA synthetase to mitochondria and chloroplasts. FEBS Lett. 431, 39–44.
2. Alonso, J.M., et al. (2003). Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301, 653–657.
3. Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815.
4. Aubourg, S., Boudet, N., Kreis, M., and Lecharny, A. (2000). In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants. Plant Mol. Biol. 42, 603–613.
5. Barkan, A., and Goldschmidt-Clermont, M. (2000). Participation of nuclear genes in chloroplast gene expression. Biochimie 82, 559–572.