Proc Natl Acad Sci U S A. 2002 Sep 17;99(19):12246-51.
doi: 10.1073/pnas.182432999. Epub 2002 Sep 6.

Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus


William Martin et al. Proc Natl Acad Sci U S A.

Abstract

Chloroplasts were once free-living cyanobacteria that became endosymbionts, but the genomes of contemporary plastids encode only approximately 5-10% as many genes as those of their free-living cousins, indicating that many genes were either lost from plastids or transferred to the nucleus during the course of plant evolution. Previous estimates have suggested that between 800 and perhaps as many as 2,000 genes in the Arabidopsis genome might come from cyanobacteria, but genome-wide phylogenetic surveys that could provide direct estimates of this number are lacking. We compared 24,990 proteins encoded in the Arabidopsis genome to the proteins from three cyanobacterial genomes, 16 other prokaryotic reference genomes, and yeast. Of 9,368 Arabidopsis proteins sufficiently conserved for primary sequence comparison, 866 detected homologues only among cyanobacteria and 834 others branched with cyanobacterial homologues in phylogenetic trees. Extrapolating from these conserved proteins to the whole genome, the data suggest that approximately 4,500 of the Arabidopsis protein-coding genes (approximately 18% of the total) were acquired from the cyanobacterial ancestor of plastids. These proteins encompass all functional classes, and the majority of them are targeted to cell compartments other than the chloroplast. Analysis of 15 sequenced chloroplast genomes revealed 117 nuclear-encoded proteins that are also still present in at least one chloroplast genome. A phylogeny of chloroplast genomes inferred from 41 proteins and 8,303 amino acid sites indicates that at least two independent secondary endosymbiotic events involving red algae have occurred and that amino acid composition bias in chloroplast proteins strongly affects plastid genome phylogeny.
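The genome-wide figure above follows from extrapolating the cyanobacterial fraction among conserved proteins to the whole proteome. A minimal sketch of that arithmetic, assuming a simple proportional extrapolation in which the fraction observed among the 9,368 conserved proteins applies uniformly to all 24,990 proteins (the paper's exact procedure may differ):

```python
# Counts taken directly from the abstract.
conserved = 9_368        # Arabidopsis proteins conserved enough for comparison
cyano_only = 866         # homologues detected only among cyanobacteria
cyano_branch = 834       # branched with cyanobacterial homologues in trees
genome_total = 24_990    # all Arabidopsis proteins compared

# Assumption: the cyanobacterial fraction among conserved proteins holds genome-wide.
fraction = (cyano_only + cyano_branch) / conserved
estimate = fraction * genome_total

print(f"fraction = {fraction:.1%}")                     # fraction = 18.1%
print(f"estimate = {round(estimate, -2):.0f} genes")    # estimate = 4500 genes
```

The unrounded estimate is about 4,535 genes, consistent with the "approximately 4,500" and "approximately 18%" stated in the abstract.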


Figures

Fig 1.
Similarity of 24,990 Arabidopsis proteins to 51,361 proteins from 20 reference genomes (the two Mycoplasma genome sequences are treated as one species). Gray columns: number of times that the genome gave the best match against Arabidopsis when BLAST was used (23) at four E value thresholds, from left to right 10^-40, 10^-20, 10^-10, and 10^-4. The number of times that a homologue from the genome occurred in any tree is indicated (top number above columns). Black columns indicate the number of times that proteins from the genome indicated gave a common branch with the Arabidopsis homologue in protml (25) analyses using the JTT-F matrix (middle number); white columns therein indicate the number of those trees in which the branch was supported at BP ≥ 0.95 (bottom number).
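The gray columns count, for each genome and each E value cutoff, how often that genome supplies the single best BLAST hit for an Arabidopsis query. A minimal sketch of that tally, assuming BLAST output has already been parsed into per-query lists of (genome, E value) pairs; the hit table, query names, and the helper `best_match_counts` are hypothetical illustrations, not data or code from the paper:

```python
from collections import Counter

# Hypothetical toy data: Arabidopsis query -> list of (reference genome, E value).
hits = {
    "query_1": [("Synechocystis", 1e-50), ("E. coli", 1e-30)],
    "query_2": [("Yeast", 1e-15), ("Synechocystis", 1e-12)],
    "query_3": [("E. coli", 1e-6)],
}

# The four cutoffs named in the caption, from most to least stringent.
THRESHOLDS = [1e-40, 1e-20, 1e-10, 1e-4]


def best_match_counts(hits, threshold):
    """Count how often each genome gives the best (lowest E value) hit at or below threshold."""
    counts = Counter()
    for query_hits in hits.values():
        passing = [(evalue, genome) for genome, evalue in query_hits if evalue <= threshold]
        if passing:  # queries with no hit under the cutoff contribute nothing
            _, best_genome = min(passing)
            counts[best_genome] += 1
    return counts


for t in THRESHOLDS:
    print(f"E <= {t:g}: {dict(best_match_counts(hits, t))}")
```

Relaxing the cutoff lets weaker hits (such as query_3 here) enter the tally, which is why column heights can change across the four thresholds.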
Fig 2.
Frequency distribution of protml results vs. protein variability, expressed as protml tree length in substitutions per site per taxon (dt·OTU^-1; abscissa). Highly conserved proteins are at the left, highly variable proteins at the right. Bin intervals of 0.1 were used except for the last interval, which contains all trees with dt·OTU^-1 > 0.8 (plotted at the abscissa mean). Squares indicate the number of trees per interval (left ordinate). Circles indicate the proportion of trees per interval (right ordinate) that yield an (Arabi,cyano) branch. Triangles indicate the proportion of trees per interval that do not. Equivocal trees were excluded.
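The binning described in the Fig 2 caption can be sketched as follows, assuming each tree has already been reduced to its length per OTU and classified as yielding an (Arabi,cyano) branch, some other branch, or an equivocal result. The input format, the outcome labels "cyano", "other", and "equivocal", and the helper names are hypothetical:

```python
from collections import defaultdict

N_REGULAR_BINS = 8  # [0, 0.1), [0.1, 0.2), ..., [0.7, 0.8]; index 8 holds dt/OTU > 0.8


def bin_index(length_per_otu):
    """Map a tree length (substitutions per site per OTU) to a bin index."""
    if length_per_otu > 0.8:
        return N_REGULAR_BINS  # overflow bin for all trees longer than 0.8
    return min(int(length_per_otu * 10), N_REGULAR_BINS - 1)


def summarize(trees):
    """Return {bin: (n_trees, fraction_cyano, fraction_other)}, skipping equivocal trees."""
    counts = defaultdict(lambda: {"cyano": 0, "other": 0})
    for length, outcome in trees:
        if outcome == "equivocal":
            continue  # equivocal trees are excluded, as in the caption
        counts[bin_index(length)][outcome] += 1
    summary = {}
    for idx in sorted(counts):
        n = counts[idx]["cyano"] + counts[idx]["other"]
        summary[idx] = (n, counts[idx]["cyano"] / n, counts[idx]["other"] / n)
    return summary


# Hypothetical toy trees: (dt per OTU, outcome).
trees = [(0.05, "cyano"), (0.07, "other"), (0.25, "cyano"),
         (0.95, "other"), (1.4, "other"), (0.3, "equivocal")]
print(summarize(trees))
```

Per bin, the first tuple element corresponds to the squares (tree counts) and the two fractions to the circles and triangles.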
Fig 3.
Targeting predictions for the 3,628 Arabidopsis proteins examined. Columns indicate the number of genes predicted to be targeted to the compartment shown at five significance thresholds (27). Dark bars (left) indicate the highest significance threshold; light bars (right) indicate the lowest.
Fig 4.
Phylogeny of chloroplast genomes, gene loss, and gene transfer. (A) Topology preferred by NJ and LD for chloroplast genomes. Branch lengths were estimated with ML using the JTT-F matrix. 1° and 2° endosymbiotic events are indicated. Gene losses inferred at branches are indicated with arrows designating numbered blocks of genes, which are expanded as gene lists at left and bottom. Genes for which a transferred nuclear homologue was found are underlined. The gene presence matrix and accession numbers are given in Table 5. Numbers of parallel losses are color-coded. Support for branches (lowercase letters) is given in Table 2. (B) Alternative topologies T2–T8 detected in various subsets of the data and with various methods. Dotted lines indicate that the topology is otherwise identical to T1.


References

    1. Goksøyr J. (1967) Nature (London) 214, 1161.
    2. Douglas S. E. (1998) Curr. Opin. Gen. Dev. 8, 655-661.
    3. Delwiche C. W. (1999) Am. Nat. 154, S164-S177.
    4. Tomitani A., Okada, K., Miyashita, H., Matthijs, H. C. P., Ohno, T. & Tanaka, A. (1999) Nature (London) 400, 159-162.
    5. Herrmann R. G. (1997) in Eukaryotism and Symbiosis, eds. Schenk, H. E. A., Herrmann, R. G., Jeon, K. W. & Schwemmler, W. (Springer, Heidelberg), pp. 73–118.