Plant Cell. 2002 Jan;14(1):211-36.
doi: 10.1105/tpc.010304.

Central functions of the lumenal and peripheral thylakoid proteome of Arabidopsis determined by experimentation and genome-wide prediction

Jean-Benoît Peltier et al. Plant Cell. 2002 Jan.

Abstract

Experimental proteome analysis was combined with a genome-wide prediction screen to characterize the protein content of the thylakoid lumen of Arabidopsis chloroplasts. Soluble thylakoid proteins were separated by two-dimensional electrophoresis and identified by mass spectrometry. The identities of 81 proteins were established, and N termini were sequenced to validate localization prediction. Gene annotation of the identified proteins was corrected by experimental data, and an interesting case of alternative splicing was discovered. Expression of a surprising number of paralogs was detected. Expression of five isomerases of different classes suggests strong (un)folding activity in the thylakoid lumen. These isomerases possibly are connected to a network of peripheral and lumenal proteins involved in antioxidative response, including peroxiredoxins, m-type thioredoxins, and a lumenal ascorbate peroxidase. Characteristics of the experimentally identified lumenal proteins and their orthologs were used for a genome-wide prediction of the lumenal proteome. Lumenal proteins with a typical twin-arginine translocation motif were predicted with good accuracy and sensitivity and included additional isomerases and proteases. Thus, prime functions of the lumenal proteome include assistance in the folding and proteolysis of thylakoid proteins as well as protection against oxidative stress. Many of the predicted lumenal proteins must be present at concentrations at least 10,000-fold lower than proteins of the photosynthetic apparatus.

Figures

Figure 1.
Silver-Stained 2-D Electrophoresis Maps of Lumenal Proteins of Arabidopsis (ecotype Columbia). Fractions strongly enriched for thylakoid lumenal proteins isolated from Arabidopsis were separated by 2-D electrophoresis with denaturing isoelectric focusing (IEF) on immobilized pH gradients in the first dimension between pH 4 and 7 (A) or between pH 7 and 11 (B) and Tricine-PAGE in the second dimension ([A] and [B]). Gels were calibrated for molecular mass (in kD) and pI (in pH units) by internal (pH and mass) and external (mass) standards. The protein spot numbers refer to the spot numbers listed in Tables 1 to 4. For a selected number of spots, the identity (in addition to the number) also has been listed on the 2-D maps. The map in (B) also includes the internal standards for the pI, as indicated by the arrow. To more easily compare the Arabidopsis maps with those constructed for pea (as described by Peltier et al., 2000), spot numbers are identical for orthologous pairs. ACP, acyl carrier protein; FNR, ferredoxin-NADP reductase; GST, glutathione S-transferase; PSII, photosystem II; SOD, superoxide dismutase.
Figure 2.
Relative Expression Levels of Thylakoid Proteins in Pea and Arabidopsis. Expression levels of lumenal and other thylakoid proteins from Arabidopsis (closed symbols) and pea (open symbols) were calculated on a molar basis and normalized to the expression level of Hcf136. Duplicate Coomassie blue–stained and duplicate silver-stained 2-D gels from two independent experiments with a pH range of 4 to 7 were analyzed. Standard errors (n = 4) are indicated, and the x axis is in log scale.
Figure 3.
Sequence Analysis of Pairs of Isoforms/Paralogs Present in the Thylakoid Lumen or Associated with the Thylakoid Membrane of Arabidopsis Chloroplasts. Alignments of two pairs of paralogs of thylakoid proteins identified on the 2-D gels shown in Figure 1. The cleavage sites of the lTP are indicated by the arrows. (A) OEC16 isoforms in spots 203 and 208. (B) Plastocyanin (PC) isoforms in spots 116 and 117.
Figure 4.
The Phylogenetic Relationship between Members of the Lumenal Isomerase Family. Rooted tree of the lumenal isomerases in spots 79, 80, 95, 110, 210, and 211, four predicted isomerases, and two isomerases found in Swiss-Prot (marked with *). The tree was built in Phylip using both parsimony and distance methods on protein sequences and reflects consensus trees with branches supported by the highest possible bootstrap values. The tree was rooted with an outgroup sequence from the moss Physcomitrella that is an ortholog of spot 210. The location of spot 79 is ambiguous (#).
Figure 5.
Correction of Gene Annotation and Other Sequence Analysis of Arabidopsis Proteins Identified on the 2-D Gels in Figure 1. (A) Spot 71, identified as a 17.2-kD protein at pI 6.0 on the 2-D gel, was not annotated in the Arabidopsis genome, and no overlapping EST could be found. However, a homolog could be reconstructed in soybean using three overlapping ESTs. Using the reconstructed cDNA from soybean, the corresponding gene could be identified in the Arabidopsis genome on chromosome II. Three internal sequences determined by nano-ESI/MS/MS matched sequences in the gene and are indicated by boxes. Analysis with the functional domain predictor Pfam indicated that the protein in spot 71 belongs to the OEC23 family. The predicted lTP or cTP cleavage sites are indicated with arrows. Amino acid residues conserved among all three sequences are shown in red, and those conserved between two sequences are shown in blue. RR and KR motifs in the lTP are boxed. (B) Spot 83 is a fibrillin (on chromosome II) and is an example of a serious misassignment in intron/exon boundaries. The misassignments were corrected by matching of the genomic sequence with a homologous fibrillin (verified entirely by overlapping ESTs) on chromosome III in Arabidopsis (T10K17.220) and verified by matching protein sequences obtained by nano-ESI/MS/MS; these sequence tags are boxed. To arrive at the correct amino acid sequence, amino acids shown in red need to be removed from the annotated sequence, and amino acids shown in blue need to be included. Fifty percent of the protein sequence was changed compared with the predicted sequence. (C) The annotated genome sequence of a FKBP (accession number 22989010) identified in spot 110 has an incorrectly predicted N terminus (MLLVL…). Orthologs in tomato (1) (AW041520), barley (2) (11193249), alfalfa (3) (11902372), and C. reinhardtii (4) (AV624465) all show typical bipartite presequences with typical lumenal TAT motifs. 
The N terminus of the protein in spot 110 could be extended with one overlapping EST (T76027); however, no EST was found for the very N-terminal end. Amino acid residues conserved among all four sequences are shown in red, and those conserved between fewer than four sequences are shown in blue. The cleavage site for the lTP is indicated with an arrow. (D) Alignment of the Arabidopsis protein sequence of spot 104 with its homologs in seven other plant species. Spot 104 has a typical TAT motif, and in vitro analysis has shown that this protein translocated via the TAT pathway (Mant et al., 1999). Interestingly, the first arginine residue of the twin arginines is replaced by a lysine residue in four of the orthologs. Orthologs are from potato (1) (10447846), tomato (2) (5900060), soybean (3) (7795408), alfalfa (4) (11900582), Mesembryanthemum crystallinum (5) (8330419), barley (6) (11198348), and Physcomitrella patens (7) (6102372). Amino acid residues conserved among all seven sequences are shown in red, and those conserved between fewer than seven sequences are shown in blue. The cleavage site for the lTP is indicated with an arrow.
Figure 6.
Cross-Correlation between Experimental and Theoretical Molecular Masses and pI Values of Precursors and Mature Proteins in Arabidopsis. Cross-correlation of predicted and experimental molecular mass ([A] and [C]) and pI values ([B] and [D]) of the proteins from Arabidopsis identified in Figure 1 before ([A] and [B]) and after ([C] and [D]) removal of the cTP and lTP. Dotted lines indicate perfect correlations. Protein spot numbers are indicated for strongly deviating points. The circled data points correspond to spot 103 (see Figure 7). Open symbols represent values based on incorrect annotation (e.g., 103A, 106A, and 108A) ([C] and [D]). Three (most likely) monodimers (spots 116, 117, and 204) are indicated in (C).
Figure 7.
Alternative Splicing of Lumenal Protein Spot 103. The gene annotation for protein spot 103 is incorrect in the database. The top of the figure shows a scheme of the genomic sequence and the positions of the exons. Based on overlapping ESTs, it is clear that the gene annotation in the database (sequence A) is incorrect. Exon IV was not recognized as an exon, and exon V is too short and was frameshifted (indicated in black). One full-length cDNA and corresponding proteins (sequence B) can be reconstructed from the overlapping ESTs. Sequence B is constructed from six exons (I, II, III, IV, V, V′, and VI) and encodes a protein with pI value of 9.35 for the precursor and 9.14 for the mature protein. Sequence C is constructed from exons I to V′ plus exons VI, VII, and VIII and encodes a protein with a pI value of 7.55 for the precursor and 5.30 for the mature protein. The alternative splice site (in sequence C) occurs in the middle of exon VI. Two ESTs were used to reconstruct the protein in sequence C. The pI value and molecular mass of the processed protein match exactly the experimental coordinates on the 2-D gel (Figure 1A). Ten peptide masses determined by MALDI-TOF MS match this protein (mass accuracy within 50 ppm). Red indicates conservation, and blue or green indicates no conservation of the sequence between the different annotations A, B, and C. Matching Arabidopsis ESTs are indicated. The lumenal processing site is indicated by the arrows.
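The 50 ppm mass accuracy quoted for the MALDI-TOF peptide matches can be illustrated with a minimal sketch. The function names and the example masses are hypothetical and are not taken from the paper; the sketch only shows how a parts-per-million error is commonly computed and compared against a tolerance.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million (ppm)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tol_ppm=50.0):
    """True if the absolute ppm error does not exceed tol_ppm.

    The default of 50 ppm mirrors the accuracy stated in the caption;
    the masses passed in below are made-up illustration values.
    """
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


# Hypothetical peptide masses (Da), for illustration only.
print(within_tolerance(1000.04, 1000.0))  # error of about 40 ppm
print(within_tolerance(1000.10, 1000.0))  # error of about 100 ppm
```

At a peptide mass near 1000 Da, a 50 ppm window corresponds to roughly 0.05 Da, which is why the tolerance is expressed relative to mass rather than as a fixed offset.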
Figure 8.
Cross-Correlation of the Experimental Molecular Mass (A) and pI Values (B) of the Lumenal Proteins from Pea and Arabidopsis Identified on the 2-D Maps. The broken lines indicate a deviation of 10 kD or 0.75 pI units. Protein spot numbers are indicated for strongly deviating points. The circled data points correspond to spot 103 (see Figure 7).
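The deviation bands in this comparison (10 kD for mass, 0.75 units for pI) can be sketched as a simple filter over paired measurements. The function name, the data layout, and the example values are assumptions made for illustration; they are not the authors' procedure or data.

```python
def flag_deviating_spots(pairs, mass_tol_kd=10.0, pi_tol=0.75):
    """Return spot labels whose pea/Arabidopsis coordinates differ by more
    than the given mass (kD) or pI tolerance.

    pairs maps a spot label to ((mass_pea, pi_pea), (mass_ara, pi_ara)).
    This layout is a hypothetical convenience, not a format from the paper.
    """
    flagged = []
    for spot, ((mass_pea, pi_pea), (mass_ara, pi_ara)) in pairs.items():
        mass_off = abs(mass_pea - mass_ara) > mass_tol_kd
        pi_off = abs(pi_pea - pi_ara) > pi_tol
        if mass_off or pi_off:
            flagged.append(spot)
    return flagged


# Made-up coordinates, for illustration only.
example = {
    "spot_x": ((20.0, 5.0), (21.0, 5.2)),   # inside both bands
    "spot_y": ((20.0, 5.0), (35.0, 5.1)),   # mass differs by 15 kD
    "spot_z": ((20.0, 5.0), (20.5, 6.0)),   # pI differs by 1.0 unit
}
print(flag_deviating_spots(example))
```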
Figure 9.
Sequence Analysis of cTP and lTP of the Experimentally Identified Proteins in Arabidopsis and Their Homologs/Orthologs in Other Plant Species. (A) and (B) Logoplot of 109 lumenal proteins with a typical TAT motif identified experimentally in Arabidopsis and their homologs in other plant species aligned according to the experimentally identified N terminus (A) and RR motif (B). (C) Logoplot of 92 lumenal proteins without a typical TAT motif identified experimentally in Arabidopsis and their orthologs in other plant species aligned according to the experimentally identified N terminus. (D) Length distribution of cTP for the experimental training set of TargetP (141 proteins) (closed bars) and the length distribution of cTP + lTP (open bars) for the 174 experimentally identified lumenal proteins. Proteins are divided into classes of 10 amino acid residues.
Figure 10.
Genome-Wide Prediction of the Lumenal Proteins with a Typical TAT Motif in the Genome of Arabidopsis. (A) Summarizing scheme of the genome-wide prediction of lumenal proteins with a TAT signal. Step I. 25,460 Arabidopsis open reading frames (see Methods) were processed through TargetP, resulting in 3646 protein sequences predicted to have a cTP (Result 1). Step II. For each of the 3646 proteins, 20, 25, … 80 residues were removed from the N terminus to mimic the cleavage of the cTP. (Thus, each protein was present in 13 differently truncated versions.) Step III. The truncated proteins were processed through SignalP (both Gram-negative and Gram-positive versions) to predict the potential presence of a lTP. If any of the 13 truncated versions of a protein were predicted to contain a lTP by at least one of the two SignalP versions, the protein was kept in Result 3 (1224 proteins). Step IV. The proteins predicted to have a lTP were checked for the presence of the four versions of the −3,−1 motif (p, s, a, and r; see Results) at the lTP cleavage site. Step V. Length restrictions were imposed. Proteins that did not meet the length criteria were excluded (Result 5; 596 proteins using the p motif). Step VI. The remaining proteins were processed through the TM region predictor TMHMM. All proteins that contained one or more TM regions in the predicted mature part (i.e., C terminal of the predicted lTP cleavage site) were removed. Step VII. Proteins with a TAT pathway motif (twin arginine; RR) in region −32 to −18 relative to the predicted lTP cleavage site were sorted to the TAT lumenal protein set (93 proteins using lTP cleavage site motif p). The rest of the proteins were kept in the “other lumenal proteins” set (380 proteins using the p motif). (B) Functional catalog of the 71 predicted lumenal proteins with a TAT motif based on known function or functional domain prediction.
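Steps II and VII of the scheme can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the TargetP, SignalP, and TMHMM steps are not reproduced, the cleavage site is assumed to be supplied as the index of the first mature residue, and both arginines are assumed to fall inside the −32 to −18 window.

```python
def truncated_versions(sequence, start=20, stop=80, step=5):
    """Step II: remove 20, 25, ..., 80 N-terminal residues to mimic cTP cleavage.

    Returns a dict mapping the number of residues removed to the truncated
    sequence. Cuts that would consume the whole sequence are skipped
    (an assumption; the caption does not say how short proteins were handled).
    """
    return {n: sequence[n:] for n in range(start, stop + 1, step) if n < len(sequence)}


def has_twin_arginine(sequence, cleavage_index, window=(-32, -18)):
    """Step VII: look for 'RR' in positions -32 to -18 relative to the lTP cleavage site.

    cleavage_index is the 0-based index of the first mature residue, so
    position -1 is sequence[cleavage_index - 1].
    """
    lo = max(cleavage_index + window[0], 0)
    hi = max(cleavage_index + window[1] + 1, 0)  # slice end is exclusive
    return "RR" in sequence[lo:hi]


# Synthetic sequence, for illustration only.
toy = "M" * 10 + "RR" + "A" * 20 + "X" * 68
print(len(truncated_versions(toy)))               # number of truncated versions
print(has_twin_arginine(toy, cleavage_index=32))  # RR sits at positions -22 and -21
```

For a protein longer than 80 residues the first function yields the 13 truncated versions mentioned in Step II.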
