2018 Sep 7:16:335-341.
doi: 10.1016/j.csbj.2018.09.001. eCollection 2018.

SEG - A Software Program for Finding Somatic Copy Number Alterations in Whole Genome Sequencing Data of Cancer

Mucheng Zhang et al. Comput Struct Biotechnol J.

Abstract

As next-generation sequencing technology advances and its cost decreases, whole genome sequencing (WGS) has become the preferred platform for identifying somatic copy number alteration (CNA) events in cancer genomes. To decipher these massive sequencing data more effectively, we developed a software program named SEG, short for "segment". SEG uses mapped read or fragment density for CNA discovery. To reduce CNA artifacts arising from sequencing and mapping biases, SEG first normalizes the data by taking the log2-ratio of each tumor density against its matching normal density. SEG then uses dynamic programming to find change-points in the contiguous log2-ratio data series along a chromosome, dividing the chromosome into segments. Finally, SEG identifies the segments that carry CNAs. Our analyses with both simulated and real sequencing data indicate that SEG finds more small CNAs than other published software tools.
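
The normalization and segmentation steps summarized above can be illustrated with a minimal Python sketch. This is not SEG's C implementation: the function names `log2_ratios` and `segment`, the pseudocount, the scaling of each sample by its own mean, and the per-segment `penalty` are all assumptions made for illustration. The segmentation shown is a generic penalized dynamic program that minimizes the sum of squared errors (SSE) around each segment mean, which may differ from the procedure SEG actually uses.

```python
import math

def log2_ratios(tumor, normal, pseudo=1.0):
    """Per-window log2(tumor/normal); each sample is first scaled by its own mean.
    The pseudocount (an assumption) avoids division by zero in empty windows."""
    t = [x + pseudo for x in tumor]
    n = [x + pseudo for x in normal]
    t_mean = sum(t) / len(t)
    n_mean = sum(n) / len(n)
    return [math.log2((a / t_mean) / (b / n_mean)) for a, b in zip(t, n)]

def segment(values, penalty):
    """Return change-point indices minimizing total SSE plus `penalty` per segment."""
    n = len(values)
    # Prefix sums give the SSE of any window in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):
        # SSE of values[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n   # best[j]: minimal cost of values[:j]
    prev = [0] * (n + 1)            # prev[j]: start index of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Walk back through the stored segment starts to recover the change-points.
    cps = []
    j = n
    while j > 0:
        j = prev[j]
        if j > 0:
            cps.append(j)
    return cps[::-1]
```

Under these assumptions, a series of ten windows at log2-ratio 0 followed by ten windows at 1 yields a single change-point at index 10 when `penalty` is 0.5. The double loop is quadratic in the number of windows, so this form suits short illustrative series rather than whole chromosomes.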

Keywords: Cancer; SEG; Somatic Copy Number Alteration; Whole Genome Sequencing.

Figures

Fig. 1
The algorithm of SEG. SEG will: 1) normalize the data and exclude the log2-ratio outliers (smooth data); 2) identify change-points; and 3) find CNAs (label segments). For change-point detection, SEG first relies on the user's input to assign initial change-points, and then loops through the SSE (sum of squared errors) to remove insignificant change-points using dynamic programming (see text). The program is implemented in C and can be downloaded from GitHub at https://github.com/ZhaoS-Lab/SEG.
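
The idea of starting from initial change-points and removing insignificant ones by their effect on the SSE can be sketched as follows. This is a simple greedy stand-in written in Python, not the dynamic-programming routine in SEG's C source; the names `sse` and `prune_change_points` and the `min_gain` threshold are hypothetical, and the initial change-points are assumed to lie strictly between 0 and the length of the series.

```python
def sse(values, start, end):
    """Sum of squared errors of values[start:end] around its mean."""
    seg = values[start:end]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)

def prune_change_points(values, initial_cps, min_gain):
    """Greedily drop the change-point whose removal raises the SSE least,
    stopping once every remaining removal would raise it by min_gain or more."""
    cps = sorted(set(initial_cps))
    while cps:
        bounds = [0] + cps + [len(values)]
        best_k, best_inc = None, None
        for k in range(1, len(bounds) - 1):
            left, mid, right = bounds[k - 1], bounds[k], bounds[k + 1]
            # Increase in SSE if the two segments around `mid` are merged.
            inc = (sse(values, left, right)
                   - sse(values, left, mid)
                   - sse(values, mid, right))
            if best_inc is None or inc < best_inc:
                best_k, best_inc = k, inc
        if best_inc >= min_gain:
            break  # every remaining change-point is considered significant
        cps.pop(best_k - 1)
    return cps
```

For example, with ten values of 0 followed by ten values of 1, initial change-points at 5, 10 and 15, and `min_gain` set to 0.5, the sketch keeps only the change-point at 10, because removing 5 or 15 leaves the SSE unchanged while removing 10 raises it by 5.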
Fig. 2
CNAs identified by SEG, BICseq, FREEC and CBS in 10 simulated samples of chromosome 22. A. Amplifications and deletions in the ground truth and those identified by SEG or the other software tools, drawn as described [18], for one simulated sample. B. Heatmaps showing the overall sensitivity and specificity of CNA detection by SEG or the other software tools in each of the 10 simulated samples. C. Heatmaps showing the overall sensitivity of CNA detection by SEG or the other software tools, grouped by CNA size. D. Heatmaps showing the overall sensitivity of CNA detection by SEG or the other software tools for each category indicated.
Fig. 3
Data normalization in the three canine mammary cancer genomes. A. The distribution of the average mapped fragment density, di, per 100 bp tiling window in the tumor and normal genomes of the cancer cases with the IDs indicated. B. The distribution of the density normalized against its genome-wide average by (equation). C. The distribution of the final normalized density of the tumor against the matching normal data by (equation).
Fig. 4
Large CNAs identified by SEG with WGS (A) and aCGH (B). Each line represents a dog chromosome, with its chromosome number indicated on the left. Red (amplifications) and blue (deletions) vertical lines shown above the chromosomes are drawn as previously described [4]. Only CNAs of >8.5 kb are plotted, as 8.5 kb is the minimal size of the CNAs found by aCGH.

References

    1. Stephens P.J., McBride D.J., Lin M.L. Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature. 2009;462(7276):1005–1010.
    2. Tang J., Le S., Sun L. Copy number abnormalities in sporadic canine colorectal cancers. Genome Res. 2010;20(3):341–350.
    3. Tang J., Li Y., Lyon K. Cancer driver-passenger distinction via sporadic human and dog cancer comparison: a proof-of-principle study with colorectal cancer. Oncogene. 2014;33(7):814–822.
    4. Liu D., Xiong H., Ellis A.E. Molecular homology and difference between spontaneous canine mammary cancer and human breast cancer. Cancer Res. 2014;74(18):5045–5056.
    5. Zack T.I., Schumacher S.E., Carter S.L. Pan-cancer patterns of somatic copy number alteration. Nat Genet. 2013;45(10):1134–1140.