BMC Bioinformatics. 2005 Mar 11;6:53.
doi: 10.1186/1471-2105-6-53.

Evolutionary sequence analysis of complete eukaryote genomes

Jaime E Blair et al. BMC Bioinformatics.

Abstract

Background: Gene duplication and gene loss during the evolution of eukaryotes have hindered attempts to estimate phylogenies and divergence times of species. Although current methods that identify clusters of orthologous genes in complete genomes have helped to investigate gene function and gene content, they have not been optimized for evolutionary sequence analyses requiring strict orthology and complete gene matrices. Here we adopt a relatively simple and fast genome comparison approach designed to assemble orthologs for evolutionary analysis. Our approach identifies single-copy genes representing only species divergences (panorthologs) in order to minimize potential errors caused by gene duplication. We apply this approach to complete sets of proteins from published eukaryote genomes specifically for phylogeny and time estimation.

Results: Despite the conservative criterion used, 753 panorthologs (proteins) were identified for evolutionary analysis with four genomes, resulting in a single alignment of 287,000 amino acids. With this data set, we estimate that the divergence between deuterostomes and arthropods took place in the Precambrian, approximately 400 million years before the first appearance of animals in the fossil record. Additional analyses were performed with seven, 12, and 15 eukaryote genomes resulting in similar divergence time estimates and phylogenies.
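A "single alignment" built from many panorthologs is usually assembled by concatenating per-gene alignments end to end. The abstract does not say how this was done, so the following is only a minimal sketch under that assumption; the function name and the dictionary-based input format are hypothetical.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments into one alignment per taxon.

    gene_alignments: list of {taxon: aligned_sequence} dicts (hypothetical format).
    taxa: the taxon names that every gene alignment must contain.
    """
    parts = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(gene_alignments):
        # A complete gene matrix requires every taxon in every gene.
        if set(alignment) != set(taxa):
            raise ValueError(f"gene {index} does not cover exactly the given taxa")
        # All rows of one alignment must have the same number of columns.
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"gene {index} has rows of unequal length")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

For example, two toy genes of 4 and 3 aligned columns would give 7 columns per taxon; the total length is simply the sum of the per-gene alignment lengths.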

Conclusion: Our results with available eukaryote genomes agree with previous results using conventional methods of sequence data assembly from genomes. They show that large sequence data sets can be generated relatively quickly and efficiently for evolutionary analyses of complete genomes.

Figures

Figure 1
Flowchart of multigenome intersection approach (MIA). 1) Complete genomes are reciprocally compared against themselves and all other genomes with BLAST. 2) Pairwise ortholog clusters are identified using similarity scores and imported into a local database. 3) The intersection between genomes is determined by iteratively comparing sequence identification tags and retaining those clusters showing panorthology. 4) Additional genomes are added and checked as in the previous step. 5) Sequence data files are generated for evolutionary analysis.
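Steps 2 to 4 of the flowchart can be illustrated with a small sketch. It is not the authors' implementation: it assumes the pairwise ortholog clusters are already available as lists of ID pairs (for instance from reciprocal BLAST hits), and the function names and data layout are hypothetical.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep only pairs in which each ID occurs exactly once (single-copy)."""
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return {a: b for a, b in pairs if left[a] == 1 and right[b] == 1}


def intersect_genomes(genomes, pairwise):
    """Iteratively intersect pairwise ortholog sets across genomes.

    genomes: ordered list of at least two genome names; the first is the reference.
    pairwise: {(g1, g2): [(id_in_g1, id_in_g2), ...]} for every g1 listed
              before g2 in `genomes` (hypothetical input format).
    Returns a list of {genome: sequence_id} clusters found in all genomes.
    """
    maps = {key: one_to_one(pairs) for key, pairs in pairwise.items()}
    ref = genomes[0]
    clusters = [{ref: seq_id} for seq_id in maps[(ref, genomes[1])]]
    for i, new in enumerate(genomes[1:], start=1):
        kept = []
        for cluster in clusters:
            candidate = maps[(ref, new)].get(cluster[ref])
            if candidate is None:
                continue  # no single-copy ortholog in the added genome
            # The added sequence must pair with every member already retained.
            if all(maps[(old, new)].get(cluster[old]) == candidate
                   for old in genomes[:i]):
                kept.append({**cluster, new: candidate})
        clusters = kept
    return clusters
```

Each added genome can only shrink the set of retained clusters, which matches the drop in panortholog counts reported in the figure captions as more genomes are intersected (753 with four genomes, 10 with fifteen).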
Figure 2
Neighbor-joining tree of nine metazoan genomes, 285 panorthologs (97,581 amino acid positions, alpha = 1.28). All nodes are supported significantly (>95%) in bootstrap analyses of neighbor-joining and maximum likelihood. The arrow indicates an alternative root [6, 18].
Figure 3
Neighbor-joining trees of complete eukaryotic genome sequence analyses. (A) The intersection of fifteen eukaryotic genomes, 10 panorthologs (5094 amino acid positions, alpha = 1.01). (B) The intersection of genomes from twelve multicellular eukaryotes, 63 panorthologs (23,571 amino acid positions, alpha = 1.15). All nodes are supported significantly (>95%) in bootstrap analyses of neighbor-joining and maximum likelihood, with the exception of node indicated by an asterisk (94% with maximum likelihood) in (A).
Figure 4
Neighbor-joining trees of genomes used to address deuterostome-arthropod divergence time. (A) The intersection of seven eukaryotic genomes, 380 panorthologs (132,190 amino acid positions, alpha = 1.38). (B) The intersection of four eukaryotic genomes, 753 panorthologs (287,000 amino acid positions, alpha = 1.46). All nodes are supported significantly (>95%) in bootstrap analyses of neighbor-joining and maximum likelihood.

References

    1. Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evolutionary Biology. 2001;1:8. doi: 10.1186/1471-2148-1-8.
    2. Raymond J, Zhaxybayeva O, Gogarten JP, Gerdes SY, Blankenship RE. Whole-genome analysis of photosynthetic prokaryotes. Science. 2002;298:1616–1620. doi: 10.1126/science.1075558.
    3. Brochier C, Forterre P, Gribaldo S. Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox. Genome Biology. 2004;5:R17. doi: 10.1186/gb-2004-5-3-r17.
    4. Korbel JO, Snel B, Huynen MA, Bork P. SHOT: a web server for the construction of genome phylogenies. Trends in Genetics. 2002;18:158–162. doi: 10.1016/S0168-9525(01)02597-5.
    5. House CH, Runnegar B, Fitz-Gibbon ST. Geobiological analysis using whole genome-based tree building applied to the Bacteria, Archaea, and Eukarya. Geobiology. 2003;1:15–26. doi: 10.1046/j.1472-4669.2003.00004.x.
