Genome Biol Evol. 2013;5(11):2082-92.
doi: 10.1093/gbe/evt157.

Phylotranscriptomics: saturated third codon positions radically influence the estimation of trees based on next-gen data

Jesse W Breinholt et al. Genome Biol Evol. 2013.

Abstract

Recent advancements in molecular sequencing techniques have led to a surge in the number of phylogenetic studies that incorporate large amounts of genetic data. We test the assumption that analyzing a large number of genes will lead to improvements in tree resolution and branch support using moths in the superfamily Bombycoidea, a group with some interfamilial relationships that have been difficult to resolve. Specifically, we use a next-gen data set that included 19 taxa and 938 genes (∼1.2M bp) to examine how codon position and saturation might influence resolution and node support among three key families. Maximum likelihood, parsimony, and species tree analysis using gene tree parsimony, on different nucleotide and amino acid data sets, resulted in largely congruent topologies with high bootstrap support compared with prior studies that included fewer loci. However, for a few shallow nodes, nucleotide and amino acid data provided high support for conflicting relationships. The third codon position was saturated and phylogenetic analysis of this position alone supported a completely different, potentially misleading sister group relationship. We used the program RADICAL to assess the number of genes needed to fix some of these difficult nodes. One such node originally needed a total of 850 genes but only required 250 when synonymous signal was removed. Our study shows that, in order to effectively use next-gen data to correctly resolve difficult phylogenetic relationships, it is necessary to assess the effects of synonymous substitutions and third codon positions.

Keywords: Bombycoidea; Lepidoptera; phylogeny; saturation; synonymous substitutions; transcriptome.

Figures

Fig. 1.
Published phylogenetic trees showing relationships among the families Bombycidae, Saturniidae, and Sphingidae. A dash (-) indicates branch support <60% bootstrap or <0.6 posterior probability. (A) Regier et al. (2008a): ML analysis of 5 protein-coding nuclear genes; (B) Zwick et al. (2011): ML analysis of 25 protein-coding nuclear genes; (C) Kim et al. (2011): Bayesian analysis of mitochondrial genome data, nt123 + rRNA (above) and nt12 + rRNA (below); (D) Huan-Na et al. (2012): 13 protein coding mitochondrial genes; Bayesian posterior probability before the slash and ML bootstraps after the slash; (E) Meusemann et al. (2010): 129 genes, ML tree (above) and Bayesian tree (below); (F) Simon et al. (2012): ML analysis of 335 genes (above) and 102 genes (below); (G) Regier et al. (2013): ML analysis of 19 protein-coding nuclear genes.
Fig. 2.
Diagram showing general workflow from data collection to analysis.
Fig. 3.
ML tree estimated from 938 genes in RAxML with bootstrap values placed on each branch. AA, amino acid; iGTP, gene tree parsimony; ML, maximum likelihood; MP, parsimony. Codon positions: first = nt1, second = nt2, third = nt3.
Fig. 4.
The effect of nucleotide position and synonymous signal on phylogeny. (A) ML tree with RADICAL results from six matrices (nt123, nt12, degen1, nt3, degen1-nt3, and AA). Values are shown on each branch. Fixation/degradation points are shown to the left of the central bar, and the AUC score is shown to the right of the bar. Degradation is indicated with asterisks. (B–E) RADICAL curves for Nodes 1 through 4, with the average CFI of the 10 concatenation paths on the y axis and the number of concatenated genes on the x axis. (B) Node 1, Bombycidae + Saturniidae + Sphingidae. (C) Node 2, Saturniidae + Sphingidae. (D) Node 3, Darapsa myron + Hemaris diffinis. (E) Node 4, Actias luna + Antheraea assamensis.
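The figure captions refer to matrices built from particular codon positions (nt123, nt12, nt3). As a minimal illustration of what those partitions contain, the Python sketch below splits an in-frame coding sequence by codon position and compares uncorrected pairwise distances between partitions. The toy sequences, taxon labels, and the function names `codon_partitions` and `p_distance` are hypothetical and are not taken from the study. The sketch does not reproduce the authors' pipeline, the degen1 coding, or RADICAL, and a high third-position distance is only a crude hint, not a formal saturation test.

```python
from itertools import combinations


def codon_partitions(seq):
    """Split an in-frame coding sequence into codon-position partitions."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    nt1, nt2, nt3 = seq[0::3], seq[1::3], seq[2::3]
    # nt12 keeps first and second positions in their original order
    nt12 = "".join(a + b for a, b in zip(nt1, nt2))
    return {"nt123": seq, "nt1": nt1, "nt2": nt2, "nt3": nt3, "nt12": nt12}


def p_distance(a, b):
    """Proportion of differing sites, skipping gaps and ambiguity codes."""
    valid = set("ACGT")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in valid and y in valid]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical toy alignment (illustration only, not data from the paper)
alignment = {
    "taxonA": "ATGGCTAAAGGT",
    "taxonB": "ATGGCCAAGGGA",
    "taxonC": "ATGGCAAAAGGC",
}

parts = {name: codon_partitions(seq) for name, seq in alignment.items()}

# Compare divergence at first+second positions with third positions
for key in ("nt12", "nt3"):
    for s, t in combinations(alignment, 2):
        d = p_distance(parts[s][key], parts[t][key])
        print(key, s, t, round(d, 3))
```

In this toy alignment all differences fall on third positions, so the nt12 distances are zero while the nt3 distances are large. Real data would need a proper substitution model and a dedicated saturation test before drawing conclusions.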

References

    1. Abascal F, Zardoya R, Telford MJ. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 2010;38(Suppl 2):W7–W13.
    1. Benson DA, et al. GenBank. Nucleic Acids Res. 2013;41(1):36–42.
    1. Betancur-R R, Li C, Munroe TA, Ballesteros JA, Ortí G. Addressing gene tree discordance and non-stationarity to resolve a multi-locus phylogeny of the flatfishes (Teleostei: Pleuronectiformes). Syst Biol. 2013;62(5):763–785.
    1. Biomatters. Geneious v5.5.8. 2013 [cited 2013 Nov 8]. Available from: http://www.geneious.com.
    1. Buckley TR, Simon C, Chambers GK. Exploring among-site rate variation models in a maximum likelihood framework using empirical data: the effects of model assumptions on estimates of topology, branch lengths, and bootstrap support. Syst Biol. 2001;50:67–86.