A model of the statistical power of comparative genome sequence analysis
- PMID: 15660152
- PMCID: PMC539325
- DOI: 10.1371/journal.pbio.0030010
Abstract
Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of "phylogenetic shadowing" methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features.
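The two rules of thumb above can be illustrated with a small proportionality sketch. This is not the paper's model: it only encodes the stated inverse scalings, starting from a hypothetical reference point (`n_ref` genomes sufficing for a feature of length `feature_len_ref` at distance `dist_ref`). The function name, parameters, and example numbers are assumptions made for illustration, and the distance scaling is only meant to apply at short evolutionary distances.

```python
def scaled_genome_count(n_ref, feature_len_ref, feature_len_new,
                        dist_ref=1.0, dist_new=1.0):
    """Rescale a reference genome count using the abstract's rules of thumb.

    Assumption: at constant statistical stringency, the number of genomes
    needed is proportional to 1 / (feature length) and, at short
    evolutionary distances only, to 1 / (distance). The reference point
    itself is hypothetical and must be supplied by the caller.
    """
    values = (n_ref, feature_len_ref, feature_len_new, dist_ref, dist_new)
    if any(v <= 0 for v in values):
        raise ValueError("all inputs must be positive")
    # Inverse scaling with feature size, then inverse scaling with distance.
    return n_ref * (feature_len_ref / feature_len_new) * (dist_ref / dist_new)


if __name__ == "__main__":
    # Hypothetical reference: 4 genomes suffice for a 50-nt feature.
    # Halving the feature size doubles the requirement.
    print(scaled_genome_count(4, 50, 25))
    # Halving the evolutionary distance (short-distance regime) also doubles it.
    print(scaled_genome_count(4, 50, 50, dist_ref=0.1, dist_new=0.05))
```

Under these assumptions, detecting features half as long, or using genomes half as distant, each roughly doubles the number of comparative genomes needed, which is the intuition the abstract applies to phylogenetic shadowing and to high-resolution detection of small features.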