BMC Bioinformatics. 2019 Jan 3;20(1):1.
doi: 10.1186/s12859-018-2565-8.

DBS: a fast and informative segmentation algorithm for DNA copy number analysis

Jun Ruan et al. BMC Bioinformatics.

Abstract

Background: Genome-wide DNA copy number changes are hallmark events in the initiation and progression of cancers. Quantitative analysis of somatic copy number alterations (CNAs) has broad applications in cancer research. With the increasing capacity of high-throughput sequencing technologies, fast and efficient segmentation algorithms are required to characterize high-density CNA data.

Results: A fast and informative segmentation algorithm, DBS (Deviation Binary Segmentation), is developed and discussed. The DBS method is based on the least absolute error principle and is inspired by the circular binary segmentation procedure. DBS uses point-by-point model calculation to ensure the accuracy of segmentation and combines a binary search algorithm with heuristics derived from the Central Limit Theorem. The DBS algorithm is efficient, with a computational complexity of O(n log n), and is faster than its predecessors. Moreover, DBS measures the change-point amplitude of the mean values of the two adjacent segments at a breakpoint, and the degree of significance of this amplitude is determined by the weighted average deviation at breakpoints. Accordingly, using the constructed binary tree of significance degrees, DBS indicates whether a segmentation result is over- or under-segmented.

Conclusion: DBS is implemented in ToolSeg, a platform-independent, open-source Java application that includes a graphical user interface, simulation data generation, and various segmentation methods written in native Java.
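
The recursive splitting idea behind binary segmentation can be illustrated with a minimal sketch. This is not the DBS algorithm or the ToolSeg code: it is a generic mean-shift binary segmentation that omits the least-absolute-error model, the binary search, and the Central Limit Theorem heuristics described in the abstract. The function name `segment` and the parameters `threshold` and `min_len` are assumptions made for illustration only. Each candidate split is scored by the absolute difference of the two segment means, scaled by the usual two-sample size factor, and a split is kept only when the mean difference exceeds the threshold.

```python
from itertools import accumulate
from math import sqrt


def segment(values, threshold, min_len=2):
    """Return sorted breakpoint indices found by recursive binary splitting.

    Illustrative only: `threshold` is a hypothetical cutoff on the absolute
    difference between the means of the two sides of a candidate breakpoint.
    """
    prefix = [0.0] + list(accumulate(values))  # prefix[i] == sum(values[:i])

    def mean(lo, hi):
        return (prefix[hi] - prefix[lo]) / (hi - lo)

    breakpoints = []

    def split(lo, hi):
        best_k, best_stat = None, 0.0
        # Scan every admissible split point of values[lo:hi].
        for k in range(lo + min_len, hi - min_len + 1):
            n_left, n_right = k - lo, hi - k
            diff = abs(mean(lo, k) - mean(k, hi))
            stat = diff * sqrt(n_left * n_right / (n_left + n_right))
            if stat > best_stat:
                best_k, best_stat = k, stat
        if best_k is None:
            return  # segment too short, or perfectly flat
        if abs(mean(lo, best_k) - mean(best_k, hi)) <= threshold:
            return  # amplitude too small to count as a breakpoint
        breakpoints.append(best_k)
        split(lo, best_k)   # recurse into the left sub-segment
        split(best_k, hi)   # recurse into the right sub-segment

    split(0, len(values))
    return sorted(breakpoints)


# Synthetic, noise-free signal with one gained region (made-up data).
signal = [0.0] * 20 + [1.0] * 15 + [0.0] * 25
print(segment(signal, threshold=0.3))  # prints [20, 35]
```

Each call scans its segment once using prefix sums, so the cost per recursion level is linear in the sequence length. The total cost therefore depends on how balanced the splits are, and this sketch makes no claim to match the complexity reported for DBS.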

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Segmentation process and binary tree of p in DBS. (a) An assumed segmentation process with two breakpoints. Row [0] is the initial sequence to be split. Row [1] shows the first breakpoint, found at locus b1, and Row [2] is analogous. (b) The corresponding binary tree of p generated by (a). The identifier of every node (Node ID) is also its Segment ID.
Fig. 2
Segmentation process with simulation data in DBS. (a) The segmentation process over multiple successive splits. Notably, DBS uses a recursive algorithm: after Nodes 1, 3, 4, 5, and 7 were found one by one, Node 11 and the other nodes in the right part were discovered. The red lines over the gray data points are the segmentation curves; they are the results of segmentation and indicate the range and average of each sub-segment. (b) The corresponding binary tree of p generated by panel (a). The red dotted line marks the position of the estimated standard deviation σ̂, and the red solid line marks the position of the threshold σ̂ for the degree of significance p of breakpoints.
Fig. 3
Segmentation process with an actual data sample in DBS (using half copy numbers). (a) The segmentation process in the binary tree of p. (b) The copy number of an actual sample, with the position and p of the 12 true breakpoints, which correspond to the yellow nodes in panel (a). In (b), the observed copy number signals are the ratios of the measured intensities of the matched tumor and normal samples.
Fig. 4
ROC curves of five segmentation methods. The curves show sensitivity and specificity over a sequence of thresholds, calculated by comparing aberration calls with the classifications made in an MLPA analysis on the test dataset. (a) and (b) show that classification accuracy changes little over a wide range of λ and γ; γ is 0.02 in (a), and λ is 0.02 in (b). (c) The effect of different combinations of window sizes. Curve W1 uses window sizes generated by an arithmetic progression with a common difference of 1. Curves W2, W4 and W8 use window sizes from geometric sequences with common ratios of 2, 4 and 8, respectively; λ and γ are set to the default value (0.02). (d) Calls based on the segmentations found by DNAcopy v1.52.0 (CBS), copynumber v1.18 (PCF), the method in BACOM, and DBS with raw data.
Fig. 5
Time complexity of the four algorithms. The solid lines are conventional linear regression fits to the data points of the same color. The x-axis shows the logarithm of the length of the test samples (sequences), and the y-axis shows the logarithm of the computation time.
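
The log-log regression used in Fig. 5 can be reproduced in a few lines. The sketch below fits a least-squares line to log(time) against log(length); the slope estimates the empirical growth exponent. The function name `loglog_slope` and the timing values are hypothetical, generated from formulas rather than measured, and are not the paper's data.

```python
from math import log


def loglog_slope(lengths, times):
    """Fit log(time) = slope * log(length) + intercept by least squares."""
    xs = [log(n) for n in lengths]
    ys = [log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic timings (assumed, not measured): one quadratic, one n*log(n).
lengths = [1_000, 10_000, 100_000, 1_000_000]
quadratic = [2e-6 * n ** 2 for n in lengths]
linearithmic = [1e-7 * n * log(n) for n in lengths]

print(loglog_slope(lengths, quadratic)[0])     # close to 2
print(loglog_slope(lengths, linearithmic)[0])  # slightly above 1
```

On such a plot an O(n log n) method appears as a line with slope just above 1, because the logarithmic factor grows slowly over the tested range, while a quadratic method has slope 2.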
