Co-phylog: an assembly-free phylogenomic approach for closely related organisms

Huiguang Yi¹, Li Jin

Affiliation

¹ State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai 200433, China. yhg926@gmail.com

PMID: 23335788
PMCID: PMC3627563
DOI: 10.1093/nar/gkt003

Nucleic Acids Res. 2013 Apr;41(7):e75. Epub 2013 Jan 18.

Abstract

With the advent of high-throughput sequencing technologies, the rapid generation and accumulation of large amounts of sequencing data create a pressing demand for efficient algorithms for constructing whole-genome phylogenies. Existing phylogenomic methods all use assembled sequences, which are often unavailable owing to the difficulty of assembling short reads; this obstructs phylogenetic investigations of species without a reference genome. In this report, we present co-phylog, an assembly-free phylogenomic approach that creates a 'micro-alignment' at each 'object' in the sequence using the 'context' of the object, calculates pairwise distances, and then reconstructs the phylogenetic tree from those distances. We explored the usage of the parameters and the optimal working range of co-phylog, and assessed co-phylog using simulated next-generation sequencing (NGS) data and real NGS raw data. We also compared the co-phylog method with traditional alignment-based and alignment-free methods and illustrated its advantages and limitations. In conclusion, we demonstrated that co-phylog is an efficient algorithm that delivers high-resolution and accurate phylogenies from whole-genome unassembled sequencing data, especially for closely related organisms, thereby significantly alleviating the computational burden in the genomic era.

PubMed Disclaimer

Figures

**Figure 1.**
Overview of the algorithm. (a) Some examples of structure S. (b) The k-tuple sets *H_k,G1* and *H_k,G2* generated from genome G₁ and genome G₂, respectively, given a structure S = C_2,2O₁. (c) *C-gram–O-gram* pairs generated from the corresponding k-tuple sets. (d) Context–object pairs generated from the corresponding *C-gram–O-gram* pairs. (e) Shared contexts and their corresponding objects in G₁ and G₂. (f) Computation of the context–object distance between G₁ and G₂.
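
The steps in panels (b) to (f) can be illustrated with a minimal Python sketch. It rests on several assumptions that the caption does not state: a structure C_a,bO_c is read as `left` = a context bases, `obj` = c object bases, and `right` = b context bases; a context seen with more than one object inside a single genome is discarded; and the distance is taken as the fraction of shared contexts whose objects differ. Reverse-complement strands are ignored for brevity. The function names `context_object_pairs` and `co_distance` are hypothetical and are not taken from the co-phylog software.

```python
from typing import Dict, Optional


def context_object_pairs(genome: str, left: int, right: int, obj: int = 1) -> Dict[str, str]:
    """Map each context (left flank + right flank) to its object (middle bases).

    Assumption: a context that appears with two different objects in the
    same genome is ambiguous and is dropped.
    """
    k = left + obj + right
    pairs: Dict[str, Optional[str]] = {}
    for i in range(len(genome) - k + 1):
        ktuple = genome[i:i + k]
        context = ktuple[:left] + ktuple[left + obj:]
        middle = ktuple[left:left + obj]
        if context not in pairs:
            pairs[context] = middle
        elif pairs[context] != middle:
            pairs[context] = None  # mark the context as ambiguous
    return {c: o for c, o in pairs.items() if o is not None}


def co_distance(g1: str, g2: str, left: int, right: int, obj: int = 1) -> float:
    """Fraction of shared contexts whose objects differ (assumed definition)."""
    p1 = context_object_pairs(g1, left, right, obj)
    p2 = context_object_pairs(g2, left, right, obj)
    shared = p1.keys() & p2.keys()
    if not shared:
        raise ValueError("the two sequences share no contexts")
    differing = sum(1 for c in shared if p1[c] != p2[c])
    return differing / len(shared)


if __name__ == "__main__":
    # Two toy sequences that differ by one substitution, with S = C_1,1O_1.
    print(co_distance("ACGTTGCA", "ACGATGCA", left=1, right=1))
```

With real genomes the flanks would be much longer (the figures use structures such as C_9,9O₁), which keeps most contexts unique within a genome.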

**Figure 2.**
Comparison of the alignment-based tree and the *co-phylog* trees constructed with different structures on the 13 *Brucella* genomes. All the trees share the same list of organisms. The *Ochrobactrum anthropi* genome is used as the outgroup taxon.

**Figure 3.**
(a) The benchmark tree constructed from a multiple-genome alignment and the trees constructed by the three methods, *co-phylog* (S = C_9,9O₁), *CVtree* and *Kr*, on the 26 *Escherichia*/*Shigella* genomes. The number near each node is the bootstrap value (see Doc. S1 for details). (b) The symmetric differences between the benchmark tree and the trees constructed by the three methods, *co-phylog*, *CVtree* and *Kr*. (c) Correlation analyses between the p-distance and each of the three distances: co-distance, *CVtree*-distance and *Kr*-distance. These four types of distances are generated from pairwise comparisons of the 26 *Escherichia coli*/*Shigella* genomes, using multiple-genome alignment, *co-phylog*, *CVtree* and *Kr*, respectively.
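
The symmetric difference in panel (b) counts the groupings that appear in one tree but not in the other. The sketch below is a simplified rooted variant, under the assumption that trees are given as nested tuples of taxon names and that each internal node contributes the set of leaves beneath it; the caption does not say how the authors computed the value, and tree-comparison tools usually work on unrooted bipartitions instead. The names `clades` and `symmetric_difference` and the taxa A to D are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        under = frozenset().union(*(leaves(child) for child in node))
        found.add(under)
        return under

    leaves(tree)
    return found


def symmetric_difference(t1: Tree, t2: Tree) -> int:
    """Number of clades present in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


if __name__ == "__main__":
    benchmark = (("A", "B"), ("C", "D"))
    candidate = ((("A", "B"), "C"), "D")
    print(symmetric_difference(benchmark, candidate))
```

A value of 0 means the two trees contain exactly the same clades; larger values indicate more topological disagreement.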

**Figure 4.**
Comparison between the 16S rDNA tree and the *co-phylog* tree constructed on the 63 *Enterobacteriaceae* genomes. The number near each node is the bootstrap value (see Supplementary Data for details).

**Figure 5.**
Changes in the co-distance and in the log of the number of common contexts computed between two genomes evolved *in silico* with gradually increasing evolutionary divergence (substitutions per codon), using two structures, S = C_9,9O₁ and C_12,12O₁.

**Figure 6.**
Comparison between the *co-phylog* tree constructed using the assembled genomes of 29 *E. coli* organisms and the *co-phylog* tree constructed using their corresponding NGS raw data. The *Escherichia fergusonii* genome is used as the outgroup taxon.
