J Biol Chem. 2011 Jun 17;286(24):21427-39.
doi: 10.1074/jbc.M111.233734. Epub 2011 Apr 22.

The GreenCut2 resource, a phylogenomically derived inventory of proteins specific to the plant lineage

Steven J Karpowicz et al. J Biol Chem. 2011.

Abstract

The plastid is a defining structure of photosynthetic eukaryotes and houses many plant-specific processes, including the light reactions, carbon fixation, pigment synthesis, and other primary metabolic processes. Identifying proteins associated with catalytic, structural, and regulatory functions that are unique to plastid-containing organisms is necessary to fully define the scope of plant biochemistry. Here, we performed phylogenomics on 20 genomes to compile a new inventory of 597 nucleus-encoded proteins conserved in plants and green algae but not in non-photosynthetic organisms. Of these, 286 are of known function, whereas 311 are uncharacterized. This inventory was validated as applicable and relevant to diverse photosynthetic eukaryotes using an additional eight genomes from distantly related plants (including Micromonas, Selaginella, and soybean). Manual curation of the known proteins in the inventory established its importance to plastid biochemistry. To predict functions for the 52% of proteins of unknown function, we used sequence motifs, subcellular localization, co-expression analysis, and RNA abundance data. We demonstrate that 18% of the proteins in the inventory have functions outside the plastid and/or beyond green tissues. Although 32% of proteins in the inventory have homologs in all cyanobacteria, unexpectedly, 30% are eukaryote-specific. Finally, 8% of the proteins of unknown function share no similarity to any characterized protein and are plant lineage-specific. We present this annotated inventory of 597 proteins as a resource for functional analyses of plant-specific biochemistry.
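As a quick arithmetic check of the counts in the abstract, the known and uncharacterized proteins sum to the full inventory, and the uncharacterized share matches the quoted 52%:

```latex
286 + 311 = 597, \qquad \frac{311}{597} \approx 0.521 \approx 52\%
```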


Figures

FIGURE 1.
Taxonomic tree of organisms used to build and test GreenCut2. Eight photosynthetic organisms (green) were used to construct the GreenCut2. A protein is included in the GreenCut2 only if every one of these organisms encodes an ortholog of it, except that for the three Ostreococcus species an ortholog is required in only one. Conserved proteins encoded by any of the nine non-photosynthetic organisms (red) were excluded from the GreenCut2. A subset of GreenCut2 orthologs was identified in the three non-green, photosynthetic eukaryotes (purple). The genomes of other eukaryotes (black) were searched for orthologs of GreenCut2 proteins to validate the inventory. Cyanobacteria (blue) were searched for homologs of GreenCut2 proteins. Organisms that contribute proteins to a subset of the GreenCut2 are bounded by a gray line to the right of the tree. The general taxonomic group to which each organism belongs is shown on the tree. The unrooted tree represents evolutionary relationships between organisms but not evolutionary distance. Asterisks indicate the organisms whose genomes were used to determine GreenCut version 1.
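The inclusion and exclusion rule in this legend can be expressed as a set test on the organisms that encode an ortholog of a protein. This is a minimal sketch, not the authors' pipeline: the organism names are hypothetical placeholders (only the counts of five non-Ostreococcus green organisms, three Ostreococcus species, and nine non-photosynthetic organisms follow the legend), and ortholog detection is assumed to have been done already.

```python
from typing import Iterable

# Hypothetical placeholder names; only the group sizes follow the Figure 1 legend.
OSTREOCOCCUS = {"ostreococcus_1", "ostreococcus_2", "ostreococcus_3"}
OTHER_GREEN = {"green_1", "green_2", "green_3", "green_4", "green_5"}
NON_PHOTOSYNTHETIC = {f"nonphoto_{i}" for i in range(1, 10)}


def in_greencut(orthologs_in: Iterable[str]) -> bool:
    """Return True if a protein passes the sketched GreenCut2 inclusion rule.

    `orthologs_in` lists the organisms that encode an ortholog of the protein.
    """
    present = set(orthologs_in)
    # Every non-Ostreococcus green organism must encode an ortholog.
    all_green = OTHER_GREEN <= present
    # At least one of the three Ostreococcus species must encode an ortholog.
    any_ostreococcus = bool(OSTREOCOCCUS & present)
    # An ortholog in any non-photosynthetic organism excludes the protein.
    no_nonphoto = not (NON_PHOTOSYNTHETIC & present)
    return all_green and any_ostreococcus and no_nonphoto


# Hypothetical protein families and the organisms encoding an ortholog of each.
families = {
    "family_A": OTHER_GREEN | {"ostreococcus_2"},
    "family_B": OTHER_GREEN | OSTREOCOCCUS | {"nonphoto_4"},
    "family_C": (OTHER_GREEN - {"green_3"}) | OSTREOCOCCUS,
}
selected = sorted(name for name, orgs in families.items() if in_greencut(orgs))
print(selected)  # ['family_A']
```

Family B is rejected because a non-photosynthetic organism encodes an ortholog, and family C because one required green organism lacks one.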
FIGURE 2.
Functional distribution of GreenCut2 proteins. A stacked bar chart shows the numbers of proteins of known (filled gray) and unknown (unfilled) function assigned to each functional category for all GreenCut2 proteins (A), only the PlastidCut2 proteins (B), and only the ViridiCut2 proteins (C). Assignments to a functional category were made using the Arabidopsis MapMan ontology of known proteins or Pfam domain predictions for unknown proteins. The number of proteins in a category is shown in each bar. The x axes have been set so that the length of bars may be compared between panels. Protein Metabolism, protein maturation and degradation; Nucleic Acid, nucleic acid binding, modification, and transcription factors; Other, domain or motif to suggest a general function but not a specific functional category; Photosynthesis, photosynthetic apparatus and carbon fixation; Transport, protein and small molecule trafficking and transport; Redox, electron carriers and reduction/oxidation enzymes; Pigment, chlorophyll and carotenoid metabolism; Signaling, signal transduction; Lipid, lipid metabolism; Carbohydrate, starch and sugar metabolism; Cell Cycle, cell cycle and division; Co-Factor, cofactor metabolism; No Prediction, no informative motif or domain; Uninformative, domain of unknown function or structural motif that does not suggest a function.
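The category rule in this legend (MapMan categories for known proteins, Pfam-derived categories for unknown proteins, and "No Prediction" when no informative motif or domain is found) can be sketched as follows, together with the tally behind each stacked bar. The records are hypothetical, and the sketch assumes each protein's Pfam domains have already been mapped to a single category label (possibly "Uninformative"); it does not perform that mapping.

```python
from collections import Counter
from typing import Optional


def assign_category(known: bool, mapman: Optional[str], pfam: Optional[str]) -> str:
    """Choose a functional category following the Figure 2 legend (sketch)."""
    if known:
        # Known proteins take their category from the Arabidopsis MapMan ontology.
        if mapman is None:
            raise ValueError("known proteins are expected to carry a MapMan category")
        return mapman
    # Unknown proteins use a Pfam-derived category, or "No Prediction" if none exists.
    return pfam if pfam is not None else "No Prediction"


# Hypothetical records: (protein_id, known, mapman_category, pfam_derived_category)
records = [
    ("p1", True, "Photosynthesis", None),
    ("p2", False, None, "Redox"),
    ("p3", False, None, None),
    ("p4", True, "Photosynthesis", None),
]

# One count per (category, known/unknown) pair: the two segments of each stacked bar.
counts = Counter(
    (assign_category(known, mapman, pfam), "known" if known else "unknown")
    for _, known, mapman, pfam in records
)
print(counts[("Photosynthesis", "known")])  # 2
```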
FIGURE 3.
Expression pattern of GreenCut2 genes in Arabidopsis organs. Signal intensities from AtGenExpress developmental microarrays (46) were used to cluster Arabidopsis genes encoding GreenCut2 orthologs and co-orthologs into tissue expression categories based on high transcript abundance in one organ relative to the other organs. The values do not add up to 100% because 50 of the 710 transcripts (7%) encoding the Arabidopsis GreenCut2 orthologs and co-orthologs do not have associated probes on the Affymetrix ATH1 microarray.
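The legend states that genes were grouped by high transcript abundance in one organ relative to the others, but not the exact criterion. The sketch below assumes a simple fold-change rule (top organ at least 2-fold above the next highest); both the threshold and the "broad" label are hypothetical. It also shows why percentages computed over all 710 transcripts fall short of 100% when some transcripts have no probe.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical cutoff; the legend does not give the exact clustering criterion.
FOLD_THRESHOLD = 2.0


def organ_category(signals: Optional[Dict[str, float]]) -> Optional[str]:
    """Assign a gene to the organ where its signal dominates, or 'broad' otherwise.

    Returns None for genes without a probe on the array (signals is None).
    Assumes at least two organs are measured.
    """
    if signals is None:
        return None
    ranked = sorted(signals.values(), reverse=True)
    top_organ = max(signals, key=signals.get)
    if ranked[0] >= FOLD_THRESHOLD * ranked[1]:
        return top_organ
    return "broad"


# Hypothetical signal intensities for three genes; g3 has no probe on the array.
genes = {
    "g1": {"leaf": 900.0, "root": 120.0, "flower": 200.0},
    "g2": {"leaf": 300.0, "root": 280.0, "flower": 310.0},
    "g3": None,
}

counts = Counter(c for c in map(organ_category, genes.values()) if c is not None)
shares = {organ: 100 * n / len(genes) for organ, n in counts.items()}
print(sum(shares.values()) < 100)  # True: the unprobed gene is only in the denominator

# The same effect in the legend: 50 of 710 transcripts lack ATH1 probes.
print(round(100 * 50 / 710, 1))  # 7.0
```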
FIGURE 4.
GreenCut2 transcript abundance distribution in Chlamydomonas cells and Arabidopsis organs. A and B, distribution of mRNA abundances from Chlamydomonas strain CC-1021 grown in Tris phosphate medium with CO2 as a carbon source (A) or Tris acetate phosphate medium with acetate as a carbon source (B) (47). Transcripts from 597 genes encoding GreenCut2 proteins were binned by abundance, expressed in RPKM (reads per kilobase of transcript per million mapped reads). Closed red circles represent encoded proteins of known function. Open black circles represent encoded proteins of unknown function. The medians of the known (solid vertical red line) and unknown (dashed vertical black line) transcripts are displayed with the corresponding median value. A polynomial best-fit line to the distribution of transcript abundances is presented for known transcripts (solid red) and unknown transcripts (dashed black). C and D, distribution of mRNA abundances from Arabidopsis shoots (C) or roots (D) (48). Transcripts from 710 genes encoding GreenCut2 orthologs and co-orthologs were grouped into bins based on abundance.
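RPKM normalizes read counts by transcript length and sequencing depth: RPKM = reads × 10^9 / (transcript length in bp × total mapped reads). The sketch below computes RPKM for hypothetical known and unknown genes, reports the two medians drawn as vertical lines in panels A and B, and bins values on a log scale. The read counts, library size, and bin width are all assumptions; the legend does not give the bin edges.

```python
import math
from collections import Counter
from statistics import median


def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def log_bin(value: float, bins_per_decade: int = 4) -> int:
    """Index of a log10-spaced abundance bin (hypothetical bin width)."""
    return math.floor(math.log10(value) * bins_per_decade)


TOTAL = 10_000_000  # hypothetical number of mapped reads in the library

# Hypothetical (read count, transcript length in bp) pairs.
known = [rpkm(r, length, TOTAL) for r, length in [(500, 2000), (1200, 1500), (90, 3000)]]
unknown = [rpkm(r, length, TOTAL) for r, length in [(40, 1000), (300, 2500)]]

print(median(known), median(unknown))  # 25.0 8.0
print(Counter(log_bin(v) for v in known))  # maps bin index to number of known genes
```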
FIGURE 5.
Conservation of GreenCut2 proteins in cyanobacteria. The amino acid sequences of the Arabidopsis GreenCut2 orthologs were used as queries in BLASTP searches against 37 cyanobacterial genomes. Best hits with E-values < 1e−4 were considered homologs. Proteins of known function are shown as gray columns, whereas proteins of unknown function are shown as stacked white columns. The number of proteins in each bin is shown.
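Given best-hit E-values from the BLASTP searches, the homolog counts behind this figure reduce to thresholding and tallying. This sketch assumes the best hits have already been parsed into a per-protein mapping from genome to E-value (None for no hit); the genome and protein names are hypothetical, and keying the histogram on the exact number of genomes is an assumption, since the legend does not define the bins.

```python
from collections import Counter
from typing import Dict, Optional

E_VALUE_CUTOFF = 1e-4  # threshold stated in the Figure 5 legend
N_GENOMES = 37         # number of cyanobacterial genomes searched


def genomes_with_homolog(best_hits: Dict[str, Optional[float]]) -> int:
    """Count genomes whose best BLASTP hit passes the E-value cutoff (None = no hit)."""
    return sum(1 for e in best_hits.values() if e is not None and e < E_VALUE_CUTOFF)


# Hypothetical best-hit E-values per genome for three query proteins.
genomes = [f"cyano_{i}" for i in range(N_GENOMES)]
proteins = {
    "known_1": (True, {g: 1e-30 for g in genomes}),
    "unknown_1": (False, {g: (1e-10 if i < 5 else None) for i, g in enumerate(genomes)}),
    "unknown_2": (False, {g: 0.01 for g in genomes}),
}

# Histogram keyed by (genomes with a homolog, known/unknown): one stacked column per bin.
histogram = Counter(
    (genomes_with_homolog(hits), "known" if known else "unknown")
    for known, hits in proteins.values()
)
in_all = [p for p, (_, hits) in proteins.items() if genomes_with_homolog(hits) == N_GENOMES]
print(in_all)  # ['known_1']
```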

References

1. Knoll A. H. (1992) Science 256, 622–627
2. Yoon H. S., Hackett J. D., Ciniglia C., Pinto G., Bhattacharya D. (2004) Mol. Biol. Evol. 21, 809–818
3. Gross J., Bhattacharya D. (2009) Nat. Rev. Genet. 10, 495–505
4. Jarvis P. (2008) New Phytol. 179, 257–285
5. Li H. M., Chiu C. C. (2010) Annu. Rev. Plant Biol. 61, 157–180
