PLoS One. 2012;7(11):e47450.
doi: 10.1371/journal.pone.0047450. Epub 2012 Nov 20.

Resolving discrepancy between nucleotides and amino acids in deep-level arthropod phylogenomics: differentiating serine codons in 21-amino-acid models

Andreas Zwick et al. PLoS One. 2012.

Abstract

Background: In a previous study of higher-level arthropod phylogeny, analyses of nucleotide sequences from 62 protein-coding nuclear genes for 80 panarthropod species yielded significantly higher bootstrap support for selected nodes than did amino acids. This study investigates the cause of that discrepancy.

Methodology/principal findings: We test the hypothesis that this discrepancy arises because amino acid analyses fail to distinguish the serine residues encoded by two disjunct clusters of codons (TCN, AGY). In one test, the two clusters of serine codons (Ser1, Ser2) are conceptually translated as separate amino acids. Analysis of the resulting 21-amino-acid data matrix shows striking increases in bootstrap support, in some cases matching that in nucleotide analyses. In a second approach, nucleotide and 20-amino-acid data sets are artificially altered through targeted deletions, modifications, and replacements, revealing the pivotal contributions of distinct Ser1 and Ser2 codons. By introducing two new degeneracy coding methods, we confirm that previous methods of coding nonsynonymous nucleotide change are robust and computationally efficient. We demonstrate for degeneracy coding that neither compositional heterogeneity at the level of nucleotides nor codon usage bias between Ser1 and Ser2 clusters of codons (or their separately coded amino acids) is a major source of non-phylogenetic signal.

Conclusions: The incongruity in support between amino-acid and nucleotide analyses of the aforementioned arthropod data set is resolved by showing that "standard" 20-amino-acid analyses yield lower node support specifically when serine provides crucial signal. Separate coding of Ser1 and Ser2 residues yields support commensurate with that found by degenerated nucleotides, without introducing phylogenetic artifacts. While exclusion of all serine data leads to reduced support for serine-sensitive nodes, these nodes are still recovered in the ML topology, indicating that the enhanced signal from Ser1 and Ser2 is not qualitatively different from that of the other amino acids.
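
The separate coding of Ser1 (TCN) and Ser2 (AGY) described in the abstract can be illustrated with a short sketch. This is not the authors' pipeline; it is a minimal illustration that assumes the standard genetic code, in-frame sequences, and a hypothetical one-letter label "Z" for Ser2 (the S/Z notation appears in the Figure 2 caption, but which cluster receives which letter is an assumption here). Codons that cannot be translated are written as "X".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_20(seq):
    """Conventional translation: both serine codon clusters become 'S'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def translate_21(seq, ser2_label="Z"):
    """21-state translation: TCN stays 'S' (Ser1), AGY becomes ser2_label (Ser2).

    The label 'Z' is an assumption made for this illustration.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    residues = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon, "X")
        if aa == "S" and codon.startswith("AG"):
            aa = ser2_label
        residues.append(aa)
    return "".join(residues)


example = "TCAAGTATGAGC"  # Ser1, Ser2, Met, Ser2
print(translate_20(example))
print(translate_21(example))
```

Under the 20-state translation the three serine codons in the example are indistinguishable, whereas the 21-state translation keeps the TCN and AGY clusters apart. A change between the two clusters requires more than one nucleotide substitution, which is the signal the 21-amino-acid model is meant to retain.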

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Deep-level arthropod relationships based on six analytical approaches.
Aligned sequences from 75 arthropods and five outgroup species for 62 nuclear protein-coding genes were analyzed under the likelihood criterion using six strategies: 20AA-JTT, a 20-amino-acid JTT model; 21AA-JTT, a 21-amino-acid JTT model; codon, a codon model; degen1; degen8; and noLRall1nt2. These strategies are described in the Data Set Encoding section of Materials and Methods. Numbers of species representing terminal taxa are in parentheses. Bootstrap percentages (BP) are on internal branches (20AA, 21AA, codon, degen1, degen8, and noLRall1nt2; see figure key for order). Six nodes with a major increase in their bootstrap support from 20AA-JTT to 21AA-JTT are identified with filled circles. A more complete listing of results can be found in Table S1.
Figure 2. Plot of average ECM codon substitution rates for synonymous, intra-serine (S/Z), and the most frequent nonsynonymous substitutions.
Individual codon rates are summarized through averaging for each respective amino acid (synonymous) or change between amino acids (synonymous SER (S/Z), nonsynonymous).
Figure 3. Compositional distance trees (Euclidean distances) for six data sets – nucleotide composition for nt123 data set, degenerated nucleotide composition for degen1 data sets with and without serine, codon composition for codon data set, and amino acid composition for 20AA and 21AA data sets.
Bootstrap percentages >50% are displayed and indicate the strength of the compositional signal at particular nodes. The sum of all branch lengths reflects the total amount of compositional heterogeneity in the data set.
Figure 4. Compositional distance tree (Euclidean distances) based on the codon composition of a data set that is restricted to co-Ser codons.
Bootstrap percentages >50% are displayed and indicate the strength of the compositional signal at particular nodes. The sum of all branch lengths reflects the total amount of compositional heterogeneity in the data set.
Figure 5. Summary of the six key nodes that are recovered in all maximum likelihood topologies from degen1 analyses of five nucleotide data sets with and without modifications (including deletions) of serine codons, along with their bootstrap values.
The complete topologies are condensed to illustrate that all six higher-level nodes under investigation are recovered by each of five data sets: 1. Ser1→Ser2 data set, in which Ser1 codons (TCN) in the degen1 data set are artificially changed to Ser2 (AGY); 2. Ser2→Ser1 data set, in which all Ser2 codons in the degen1 data set are artificially changed to Ser1; 3. noSer1noSer2, in which all Ser1 and Ser2 codons in the degen1 data set are artificially changed to NNN; 4. no change to Ser, in which the degen1 data set is analyzed as is; 5. degenFS2, in which all Phe (TTY) and Ser2 (AGY) codons in the degen1 data set are artificially changed to NNN.
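
The five data-set modifications listed in the Figure 5 caption can be sketched as simple codon-level rewrites. This is an illustrative sketch only, not the authors' scripts. It assumes in-frame, ungapped input in which serine and phenylalanine codons may appear either as plain codons or in degenerated form (TCN, AGY, TTY), and it assumes that replacement codons are written in that degenerated form. The function names and scheme keys are hypothetical.

```python
IUPAC = set("ACGTRYSWKMBDHVN")  # nucleotide codes accepted at a third position


def codon_class(codon):
    """Return 'Ser1', 'Ser2', 'Phe', or None for a three-letter codon."""
    c = codon.upper()
    if len(c) != 3:
        return None
    if c[:2] == "TC" and c[2] in IUPAC:
        return "Ser1"  # TCN
    if c[:2] == "AG" and c[2] in "TCY":
        return "Ser2"  # AGY
    if c[:2] == "TT" and c[2] in "TCY":
        return "Phe"  # TTY
    return None


# Replacement rules per data set, following the Figure 5 caption.
SCHEMES = {
    "Ser1->Ser2": {"Ser1": "AGY"},
    "Ser2->Ser1": {"Ser2": "TCN"},
    "noSer1noSer2": {"Ser1": "NNN", "Ser2": "NNN"},
    "no change": {},
    "degenFS2": {"Phe": "NNN", "Ser2": "NNN"},
}


def recode(seq, scheme):
    """Apply one modification scheme codon by codon (trailing partial codon dropped)."""
    rules = SCHEMES[scheme]
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    out = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        out.append(rules.get(codon_class(codon), codon))
    return "".join(out)


example = "TCNAGYTTYATG"  # Ser1, Ser2, Phe, Met
for name in SCHEMES:
    print(name, recode(example, name))
```

Keeping the rules in a single table makes it easy to check that each scheme touches only the codon classes named in the caption and leaves every other codon unchanged.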
