Bioinformatics. 2014 Aug 15;30(16):2255-62.

doi: 10.1093/bioinformatics/btu180. Epub 2014 Apr 21.

Multiscale DNA partitioning: statistical evidence for segments

Andreas Futschik¹, Thomas Hotz¹, Axel Munk², Hannes Sieling¹

Affiliations

¹ Department of Applied Statistics, JK University Linz, A-4040 Linz, Austria, Institute of Mathematics, Technische Universität Ilmenau, D-98693 Ilmenau, Germany, Institute for Mathematical Stochastics and Felix Bernstein Institute for Mathematical Statistics in Biosciences, Georgia Augusta University of Goettingen and Max Planck Institute for Biophysical Chemistry, D-37077 Goettingen, Germany.
² Department of Applied Statistics, JK University Linz, A-4040 Linz, Austria, Institute of Mathematics, Technische Universität Ilmenau, D-98693 Ilmenau, Germany, Institute for Mathematical Stochastics and Felix Bernstein Institute for Mathematical Statistics in Biosciences, Georgia Augusta University of Goettingen and Max Planck Institute for Biophysical Chemistry, D-37077 Goettingen, Germany.

PMID: 24753487
DOI: 10.1093/bioinformatics/btu180

Andreas Futschik et al. Bioinformatics. 2014.

Abstract

Motivation: DNA segmentation, i.e. the partitioning of DNA into compositionally homogeneous segments, is a basic task in bioinformatics. Different algorithms have been proposed for various partitioning criteria such as guanine/cytosine (GC) content, local ancestry in population genetics or copy number variation. A critical component of any such method is the choice of an appropriate number of segments. Some methods use model selection criteria and provide no suitable error control. Other methods, which are based on simulating a statistic under a null model, provide suitable error control only if the correct null model is chosen.

Results: Here, we focus on partitioning with respect to GC content and propose a new approach that provides statistical error control: as in statistical hypothesis testing, it guarantees with a user-specified probability 1 − α that the number of identified segments does not exceed the number of actually present segments. The method is based on a statistical multiscale criterion, making it a segmentation method that searches for segments of any length (on all scales) simultaneously. It is also accurate in localizing segments: under benchmark scenarios, our approach leads to a segmentation that is more accurate than the approaches discussed in the comparative review of Elhaik et al. In our real data examples, we find segments that often correspond well to features taken from standard University of California at Santa Cruz (UCSC) genome annotation tracks.

Availability and implementation: Our method is implemented in function smuceR of the R-package stepR available at http://www.stochastik.math.uni-goettingen.de/smuce.
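
The multiscale idea in the Results paragraph can be illustrated with a small sketch. This is not the smuceR implementation; it is a simplified illustration under several assumptions that are not stated in the abstract: the sequence is coded as 0/1 GC indicators, each segment is fitted by its own GC fraction, the local test is a Bernoulli likelihood ratio, and every interval is penalized by a scale-dependent term of the form sqrt(2 log(e n / m)). The function names (gc_indicator, bernoulli_lr, multiscale_stat) and the toy sequence are hypothetical. A candidate segmentation would be accepted only if the resulting statistic stays below a threshold chosen for the desired error level; that calibration step is not shown here.

```python
import math


def gc_indicator(seq):
    """Map a DNA string to 0/1 values: 1 for G or C, 0 otherwise."""
    return [1 if base.upper() in ("G", "C") else 0 for base in seq]


def bernoulli_lr(k, m, p):
    """Log-likelihood ratio of the local GC fraction k/m against the value p."""
    ll = 0.0
    if k > 0:
        ll += k * math.log((k / m) / p)
    if k < m:
        ll += (m - k) * math.log((1 - k / m) / (1 - p))
    return max(ll, 0.0)  # guard against tiny negative rounding errors


def multiscale_stat(y, breaks):
    """Largest penalized local statistic over all intervals inside each segment.

    y      : list of 0/1 GC indicators
    breaks : sorted start positions of the 2nd, 3rd, ... segment (may be empty)
    Assumption: each segment is fitted by its own mean GC fraction.
    """
    n = len(y)
    prefix = [0]
    for v in y:
        prefix.append(prefix[-1] + v)  # prefix sums give interval counts in O(1)
    bounds = [0] + list(breaks) + [n]
    worst = -math.inf
    for a, b in zip(bounds, bounds[1:]):
        p = (prefix[b] - prefix[a]) / (b - a)  # fitted GC fraction of the segment
        for i in range(a, b):
            for j in range(i + 1, b + 1):
                m = j - i                      # interval length (the "scale")
                k = prefix[j] - prefix[i]      # number of G/C in the interval
                penalty = math.sqrt(2 * (1 + math.log(n / m)))
                local = math.sqrt(2 * bernoulli_lr(k, m, p)) - penalty
                worst = max(worst, local)
    return worst


# Toy sequence (hypothetical): 25% GC for 200 bases, then 75% GC for 200 bases.
y = gc_indicator("GATA" * 50 + "GCGA" * 50)
stat_one = multiscale_stat(y, [])      # a single segment ignores the change
stat_two = multiscale_stat(y, [200])   # two segments, split at the change
print(stat_one, stat_two)
```

In this toy example the one-segment fit yields a much larger statistic than the two-segment fit, because the interval covering the first half deviates strongly from the overall GC fraction. The double loop over all intervals is quadratic in the segment length, so the sketch is suitable only for short sequences.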

