Mol Biol Evol. 2014 Jul;31(7):1697-709.
doi: 10.1093/molbev/msu105. Epub 2014 May 2.

Compositional biases among synonymous substitutions cause conflict between gene and protein trees for plastid origins



Blaise Li et al. Mol Biol Evol. 2014 Jul.

Abstract

Archaeplastida (=Kingdom Plantae) are primary plastid-bearing organisms that evolved via the endosymbiotic association of a heterotrophic eukaryote host cell and a cyanobacterial endosymbiont approximately 1,400 Ma. Here, we present analyses of cyanobacterial and plastid genomes that show strongly conflicting phylogenies based on 75 plastid (or nuclear plastid-targeted) protein-coding genes and their direct translations to proteins. The conflict between genes and proteins is largely robust to the use of sophisticated data- and tree-heterogeneous composition models. However, by using nucleotide ambiguity codes to eliminate synonymous substitutions due to codon degeneracy, we identify a composition bias, and a dependent codon-usage bias, resulting from synonymous substitutions at all third codon positions and at the first codon positions of leucine and arginine, as the main cause of the conflicting phylogenetic signals. We argue that the analyses of the protein-coding gene data are likely misleading because of artifacts induced by convergent composition biases at the first codon positions of leucine and arginine and at all third codon positions. Our analyses corroborate previous studies based on gene sequence analysis suggesting that Cyanobacteria evolved by the early paraphyletic splitting of Gloeobacter and a specific Synechococcus strain (JA33Ab), with all remaining cyanobacterial groups, including both unicellular and filamentous species, forming the sister group to the Archaeplastida lineage. In addition, our analyses using better-fitting models suggest (but without strong statistical support) an early divergence of Glaucophyta within Archaeplastida, with Rhodophyta (red algae) and Viridiplantae (green algae and land plants) forming a separate lineage.
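The idea of using nucleotide ambiguity codes to erase synonymous differences can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a simple hypothetical rule: every codon is replaced by the position-wise IUPAC union of all codons for the same amino acid, so all six Leu codons become YTN and all six Arg codons become MGN, making synonymous changes at Leu/Arg first positions and at third positions invisible. This union rule also merges serine's TCN and AGY families into WSN; it is not presented as the exact recoding behind the paper's “cg75_degen12S” data set, which may treat serine and other cases differently. The names `recode`, `degenerate_codon`, and `DEGEN` are invented for this illustration.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_codon(aa: str) -> str:
    """Hypothetical rule: position-wise union of all codons encoding `aa`."""
    synonyms = [codon for codon, a in CODE.items() if a == aa]
    return "".join(IUPAC[frozenset(codon[i] for codon in synonyms)] for i in range(3))


# Every synonymous codon maps to the same degenerate codon.
DEGEN = {codon: degenerate_codon(aa) for codon, aa in CODE.items()}


def recode(seq: str) -> str:
    """Replace each unambiguous codon by its degenerate form; keep gaps etc. as-is."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return "".join(DEGEN.get(codon, codon) for codon in codons)


print(recode("CTGAGATTA"))  # Leu, Arg, Leu -> YTNMGNYTN
```

Because synonymous codons collapse to a single string, two aligned sequences that differ only by synonymous substitutions become identical after recoding, while Met (ATG) and Trp (TGG), which have no synonyms, are left unchanged.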

Keywords: Archaeplastida; Cyanobacteria; origin of plastids; phylogeny.


Figures

Fig. 1.
ML bootstrap analysis of the protein-coding gene data set “cg75” and 50% majority-rule consensus tree of 200 ML (formula image) bootstrap trees. Values above branches are bootstrap percentages (BPs). Colors indicate taxonomic groups (supplementary table S1, Supplementary Material online): Bacteria (purple), Cyanobacteria (blue), Glaucophyta (orange), Rhodophyta (red), and Viridiplantae (green). Note that Prochlorococcus is attracted to the Archaeplastida clade, causing lower support values between its two points of attachment.
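How a 50% majority-rule consensus and its BPs follow from a set of bootstrap trees can be sketched in a few lines of Python. This is a generic illustration, not the software used in the paper: each unrooted tree is assumed to be given as a list of splits (the taxa on one side of each internal branch), and each split is stored canonically as the side that lacks a fixed reference taxon. Splits present in more than half of the trees are pairwise compatible, so together they define the consensus tree. The function names `canonical` and `majority_rule` are hypothetical.

```python
from collections import Counter


def canonical(split, taxa):
    """Represent an unrooted bipartition by the side lacking a fixed reference taxon."""
    ref = min(taxa)
    side = frozenset(split)
    return side if ref not in side else frozenset(taxa) - side


def majority_rule(trees, taxa, threshold=0.5):
    """Return {split: bootstrap percentage} for splits in more than `threshold` of trees."""
    counts = Counter()
    for tree in trees:
        # Count each distinct split at most once per bootstrap tree.
        counts.update({canonical(s, taxa) for s in tree})
    n = len(trees)
    return {split: 100.0 * c / n for split, c in counts.items() if c / n > threshold}


taxa = set("ABCDE")
trees = [
    [{"A", "B"}, {"D", "E"}],  # ((A,B),C,(D,E))
    [{"A", "B"}, {"C", "D"}],  # ((A,B),E,(C,D))
    [{"A", "C"}, {"D", "E"}],  # ((A,C),B,(D,E))
]
for split, bp in majority_rule(trees, taxa).items():
    print(sorted(split), f"{bp:.0f}")
```

In this toy example the splits AB|CDE and DE|ABC each occur in two of the three trees and receive a BP of about 67, while CD and AC appear once and are dropped from the consensus.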
Fig. 2.
ML bootstrap analysis of the protein data set “cp75” and 50% majority-rule consensus tree of 200 ML (formula image) bootstrap trees. Values above branches are BPs. Colors indicate taxonomic groups (see legend of fig. 1).
Fig. 3.
Simplified ML bootstrap tree for the recoded protein-coding gene data set “cg75_degen12S” and 50% majority-rule consensus tree of 200 ML (formula image) bootstrap trees. Clades are labeled by their group label where possible. The codon usage biases and G + C proportions at the three codon positions of the original “cg75” data set (i.e., without recoding) are shown to the right of the taxa (average values are given for summarized groups). This tree was chosen to display codon usage biases and G + C proportions because it seems to exemplify reconstruction errors induced by compositional effects: its topology partly correlates with composition and codon usage biases. For each of Leu, Ser, and Arg, codon usage bias is measured as the formula image of the unbiased ratio between the usages of the amino acid's two codon families, where the number of occurrences of codons of each family is divided by the number of possible codons in that family (2 or 4). Codon family labels: formula image, formula image; formula image, formula image; formula image, formula image; formula image, formula image; formula image, formula image; and formula image, formula image. The codon bias representation is inspired by figure 1 of Inagaki and Roger (2006). Values above branches are BPs. Colors indicate taxonomic groups (see legend of fig. 1). *Prochlorococcus is an abbreviation of Prochlorococcus marinus (SO-6).
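The two per-taxon statistics described in this legend can be sketched in Python under stated assumptions. The G + C proportion is computed separately for codon positions 1, 2, and 3. For codon usage bias, the transform applied to the ratio is lost from this copy of the legend, so the sketch assumes a base-2 logarithm; it also assumes the two families of each amino acid are the standard ones of the genetic code (Leu TTR vs CTN, Ser AGY vs TCN, Arg AGR vs CGN) and arbitrarily puts the two-codon family in the numerator. The names `gc_by_position`, `family_bias`, and `FAMILIES` are hypothetical.

```python
import math
from collections import Counter

# Assumed split of each six-codon amino acid into a 2-codon and a 4-codon family.
FAMILIES = {
    "Leu": (["TTA", "TTG"], ["CTT", "CTC", "CTA", "CTG"]),
    "Ser": (["AGT", "AGC"], ["TCT", "TCC", "TCA", "TCG"]),
    "Arg": (["AGA", "AGG"], ["CGT", "CGC", "CGA", "CGG"]),
}


def gc_by_position(seq):
    """G + C proportion at codon positions 1, 2 and 3, ignoring non-ACGT characters."""
    seq = seq.upper()
    result = []
    for pos in range(3):
        bases = [b for b in seq[pos::3] if b in "ACGT"]
        gc = sum(b in "GC" for b in bases)
        result.append(gc / len(bases) if bases else float("nan"))
    return result


def family_bias(seq, fam_a, fam_b):
    """Assumed log2 of the size-normalized usage ratio of codon family A to family B.

    Each family's codon count is divided by its number of codons (2 or 4).
    Returns None when either family is unused, since the log is then undefined.
    """
    seq = seq.upper()
    codons = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    a = sum(codons[c] for c in fam_a) / len(fam_a)
    b = sum(codons[c] for c in fam_b) / len(fam_b)
    if a == 0 or b == 0:
        return None
    return math.log2(a / b)


seq = "TTACTGCTTCTC"  # four Leu codons: one TTR, three CTN
print(gc_by_position(seq))
print(family_bias(seq, *FAMILIES["Leu"]))
```

Normalizing by family size means that uniform usage of all six Leu codons gives a ratio of 1 (a log of 0), so a nonzero value reflects genuine preference for one family rather than the unequal number of codons in each.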

References

    1. Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005;21(9):2104–2105.
    2. Abascal F, Zardoya R, Telford MJ. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 2010;38(2 Suppl):W7–W13.
    3. Aitken A, Stanier RY. Characterization of peptidoglycan from the cyanelles of Cyanophora paradoxa. J Gen Microbiol. 1979;112(2):219–223.
    4. Akashi H, Kliman RM, Eyre-Walker A. Mutation pressure, natural selection, and the evolution of base composition in Drosophila. Genetica. 1998;102/103:49–60.
    5. Averof M, Rokas A, Wolfe KH, Sharp PM. Evidence for a high frequency of simultaneous double-nucleotide substitutions. Science. 2000;287(5456):1283–1286.
