Reconstructing ancestral genomic sequences by co-evolution: formal definitions, computational issues, and biological examples
- PMID: 20874411
- DOI: 10.1089/cmb.2010.0112
Reconstructing ancestral genomic sequences by co-evolution: formal definitions, computational issues, and biological examples
Abstract
The inference of ancestral genomes is a fundamental problem in molecular evolution. Due to the statistical nature of this problem, the most likely or the most parsimonious ancestral genomes usually include considerable error rates. In general, these errors cannot be abolished by utilizing more exhaustive computational approaches, by using longer genomic sequences, or by analyzing more taxa. In recent studies, we showed that co-evolution is an important force that can be used for significantly improving the inference of ancestral genome content. In this work we formally define a computational problem for the inference of ancestral genome content by co-evolution. We show that this problem is NP-hard and hard to approximate and present both a Fixed Parameter Tractable (FPT) algorithm, and heuristic approximation algorithms for solving it. The running time of these algorithms on simulated inputs with hundreds of protein families and hundreds of co-evolutionary relations was fast (up to four minutes) and it achieved an approximation ratio of <1.3. We use our approach to study the ancestral genome content of the Fungi. To this end, we implement our approach on a dataset of 33, 931 protein families and 20, 317 co-evolutionary relations. Our algorithm added and removed hundreds of proteins from the ancestral genomes inferred by maximum likelihood (ML) or maximum parsimony (MP) while slightly affecting the likelihood/parsimony score of the results. A biological analysis revealed various pieces of evidence that support the biological plausibility of the new solutions. In addition, we showed that our approach reconstructs missing values at the leaves of the Fungi evolutionary tree better than ML or MP.
Similar articles
-
Reconstruction of ancestral genomic sequences using likelihood.J Comput Biol. 2007 Mar;14(2):216-37. doi: 10.1089/cmb.2006.0101. J Comput Biol. 2007. PMID: 17456016
-
Ancestral sequence alignment under optimal conditions.BMC Bioinformatics. 2005 Nov 17;6:273. doi: 10.1186/1471-2105-6-273. BMC Bioinformatics. 2005. PMID: 16293191 Free PMC article.
-
Approximate maximum parsimony and ancestral maximum likelihood.IEEE/ACM Trans Comput Biol Bioinform. 2010 Jan-Mar;7(1):183-7. doi: 10.1109/TCBB.2008.13. IEEE/ACM Trans Comput Biol Bioinform. 2010. PMID: 20150680
-
Ancestral animal genomes reconstruction.Curr Opin Immunol. 2007 Oct;19(5):542-6. doi: 10.1016/j.coi.2007.06.009. Epub 2007 Aug 15. Curr Opin Immunol. 2007. PMID: 17702562 Review.
-
Evolution at the nucleotide level: the problem of multiple whole-genome alignment.Hum Mol Genet. 2006 Apr 15;15 Spec No 1:R51-6. doi: 10.1093/hmg/ddl056. Hum Mol Genet. 2006. PMID: 16651369 Review.
Cited by
-
Efficient algorithms for reconstructing gene content by co-evolution.BMC Bioinformatics. 2011 Oct 5;12 Suppl 9(Suppl 9):S12. doi: 10.1186/1471-2105-12-S9-S12. BMC Bioinformatics. 2011. PMID: 22151715 Free PMC article.
Publication types
MeSH terms
LinkOut - more resources
Full Text Sources
Miscellaneous