PLoS Genet. 2006 Oct 27;2(10):e173.
doi: 10.1371/journal.pgen.0020173. Epub 2006 Aug 28.

Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting

Daniel A Pollard et al. PLoS Genet. 2006.

Abstract

The phylogenetic relationship of the now fully sequenced species Drosophila erecta and D. yakuba with respect to the D. melanogaster species complex has been a subject of controversy. All three possible groupings of the species have been reported in the past, though recent multi-gene studies suggest that D. erecta and D. yakuba are sister species. Using the whole genomes of each of these species as well as the four other fully sequenced species in the subgenus Sophophora, we set out to investigate the placement of D. erecta and D. yakuba in the D. melanogaster species group and to understand the cause of the past incongruence. Though we find that the phylogeny grouping D. erecta and D. yakuba together is the best supported, we also find widespread incongruence in nucleotide and amino acid substitutions, insertions and deletions, and gene trees. The time inferred to span the two key speciation events is short enough that under the coalescent model, the incongruence could be the result of incomplete lineage sorting. Consistent with the lineage-sorting hypothesis, substitutions supporting the same tree were spatially clustered. Support for the different trees was found to be linked to recombination such that adjacent genes support the same tree most often in regions of low recombination and substitutions supporting the same tree are most enriched roughly on the same scale as linkage disequilibrium, also consistent with lineage sorting. The incongruence was found to be statistically significant and robust to model and species choice. No systematic biases were found. We conclude that phylogenetic incongruence in the D. melanogaster species complex is the result, at least in part, of incomplete lineage sorting. Incomplete lineage sorting will likely cause phylogenetic incongruence in many comparative genomics datasets. 
Methods to infer the correct species tree, the history of every base in the genome, and comparative methods that control for and/or utilize this information will be valuable advancements for the field of comparative genomics.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Phylogenies
The three possible phylogenies for Dmel, Dere, and Dyak, with Dana as an outgroup.
Figure 2. Widespread Incongruence of Substitutions, Indels, and Gene Trees
(A) The proportion of informative nucleotide substitutions in 9,405 genes supporting each of the three trees. Tree 1 (red) is supported by 170,002 (44.7%) nucleotide changes; tree 2 (green), 112,278 (29.5%) nucleotide changes; and tree 3 (purple), 98,117 (25.8%) nucleotide changes. (B) The proportion of informative amino acid substitutions in 9,405 genes supporting each of the three trees. Tree 1 (red) is supported by 28,628 (49.3%) amino acid changes; tree 2 (green), 15,182 (26.2%) amino acid changes; and tree 3 (purple), 14,203 (24.5%) amino acid changes. (C) The proportion of informative insertions or deletions (indels) in 9,405 genes supporting each of the three trees. Indels were filtered, requiring five flanking amino acids of perfect identity and no repetitive sequence. Tree 1 (red) is supported by 2 deletions and 6 insertions (66.7%); tree 2 (green), 1 deletion and 1 insertion (16.7%); and tree 3 (purple), 2 insertions (16.7%). Similar proportions but much larger counts are found when the indels are not filtered. (D) The proportion of 9,315 genes with ML support for each of the three trees. Tree 1 (red) has ML support for 5,381 (57.8%); tree 2 (green), 2,188 (23.5%); and tree 3 (purple), 1,746 (18.7%).
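The percentages in this caption follow directly from the reported counts. The short sketch below recomputes them; the function name and the dictionary keys are illustrative choices, not part of the original analysis.

```python
def support_percentages(counts):
    """Convert per-tree support counts into percentages rounded to one decimal.

    `counts` maps a tree label to the number of informative characters
    (or genes) supporting that tree. Labels here are hypothetical.
    """
    total = sum(counts.values())
    return {tree: round(100.0 * n / total, 1) for tree, n in counts.items()}


# Counts taken from the caption: panel (A), nucleotide substitutions.
nucleotide = {"tree1": 170002, "tree2": 112278, "tree3": 98117}
# Counts taken from the caption: panel (D), genes with ML support.
gene_trees = {"tree1": 5381, "tree2": 2188, "tree3": 1746}

print(support_percentages(nucleotide))
print(support_percentages(gene_trees))
```

Running the same function on the amino acid and indel counts reproduces the remaining percentages in the caption.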
Figure 3. Incomplete Lineage Sorting
The history of a gene (colored lines) is drawn in the context of a species tree (gray bars). New lineages arising from new polymorphisms in the gene are drawn in different colors. In this case, the two alleles in the population prior to the split of Dmel are maintained through to the split of Dere and Dyak, leading to incomplete lineage sorting and an incongruent genealogy (tree 2). The greater the diversity in the ancestral population and the shorter the time between speciation events, the more likely nonspecies genealogies are.
Figure 4. Median Synonymous Trees
Median synonymous branch length trees derived from the genes supporting each of the three trees are drawn to the same scale. The branch spanning the two speciation events is quite short for all trees.
Figure 5. Coalescence Probabilities for Each Tree
Using the formula p(congruence) = 1 − (2/3)exp(−t), where t = generations / (2Ne) and Ne is the effective population size, the probability of the species tree (black) and the probability of one of the two alternate trees (gray) were plotted as a function of t.
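As a rough illustration of the formula in this caption, the sketch below evaluates both curves for a few values of t. It assumes the standard three-taxon result that the two alternate trees are equally likely, so each has probability (1/3)exp(−t); the function name and the example numbers of generations and Ne are hypothetical.

```python
import math


def tree_probabilities(t):
    """Return (p_species_tree, p_each_alternate_tree) for an internal branch
    of length t, measured in units of 2Ne generations.

    Assumes the two alternate trees are equally likely, so the three
    probabilities sum to 1.
    """
    p_alternate = math.exp(-t) / 3.0
    p_species = 1.0 - 2.0 * p_alternate
    return p_species, p_alternate


def coalescent_units(generations, ne):
    """Convert a number of generations into coalescent units t = generations / (2Ne)."""
    return generations / (2.0 * ne)


for t in (0.0, 0.5, 1.0, 2.0):
    p_species, p_alternate = tree_probabilities(t)
    print(f"t={t:.1f}  species tree={p_species:.3f}  each alternate={p_alternate:.3f}")

# Hypothetical example values: 1,000,000 generations with Ne = 1,000,000.
print(tree_probabilities(coalescent_units(1_000_000, 1_000_000)))
```

At t = 0 all three trees are equally likely (1/3 each), and the species tree approaches probability 1 as t grows, which is why a short internal branch makes incongruent gene trees common.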
Figure 6. Clustering of Informative Sites
The enrichment of informative nucleotide (A) and amino acid (B) substitutions near other substitutions that support the same phylogeny was found for all three trees and is on a scale roughly similar to estimates of linkage disequilibrium. At each informative site in the genome, the counts of informative sites supporting each of the three trees in 1-kb windows extending 30 kb up- and downstream were measured. For each type of informative site, the enrichment of the same type of informative site in each 1-kb window was calculated using the observed counts and the expected number of sites based on their genome-wide frequency. Enrichment is log10(observed / expected).
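The enrichment statistic in this caption can be sketched as follows. This is a minimal illustration, assuming the expected count in a window is the window's total number of informative sites multiplied by the genome-wide fraction of sites supporting the tree in question; the caption does not spell out that detail, and the function and parameter names are hypothetical.

```python
import math


def window_enrichment(observed_same, window_total, genome_fraction):
    """Return log10(observed / expected) for one 1-kb window.

    observed_same   : informative sites in the window supporting the same tree
    window_total    : all informative sites in the window (any tree)
    genome_fraction : genome-wide fraction of informative sites supporting that tree

    The expected count is assumed to be window_total * genome_fraction.
    Returns None when the ratio is undefined or its logarithm is not finite.
    """
    expected = window_total * genome_fraction
    if observed_same == 0 or expected == 0:
        return None
    return math.log10(observed_same / expected)


# Hypothetical window: 100 informative sites, 60 of them supporting tree 1,
# whose genome-wide share of nucleotide substitutions is 44.7%.
print(window_enrichment(60, 100, 0.447))
```

Positive values indicate more same-tree sites than expected from genome-wide frequencies, zero indicates no enrichment, and negative values indicate depletion.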
Figure 7. Significance of Incongruence
An excess of incongruence above what is expected by chance was observed for the set of all genes (A) as well as the set of genes that consistently supported the same tree across models and species combinations (B). Genes were binned by bootstrap value, and the proportions of genes supporting tree 1 (red line), tree 2 (green line), and tree 3 (purple line) were plotted. The expected congruence based on the bootstrap value in each bin (black solid line) and the 95% confidence interval based on a χ² distribution (black dashed line) demonstrate the excess incongruence.
Figure 8. Sequence and Evolutionary Gene Properties
Sequence and evolutionary properties of the genes are unable to explain the incongruence. Distributions are calculated using results from the original ML analysis using the F3×4 model and the Dmel, Dere, Dyak, and Dana species combination. The distributions of informative synonymous divergence (ISD) in genes supporting each tree reveal a bias toward lower values for the incongruent genes (A). Nearly all genes with little or no informative synonymous divergence, however, are classified as inconsistent (B). Therefore, consistent genes have very similar distributions of ISD across trees (C). TSD is distributed similarly across trees, suggesting homoplasy due to increased mutation rates is not causing the incongruence (D). Gene length is slightly higher in tree 1 genes but overall is very similar across trees (E). Third codon position GC content is slightly biased toward lower values for Dmel and Dana and higher values for Dere and Dyak, creating a conservative bias for the incongruence (F).
