. 2019 Dec 17;20(Suppl 20):641.

doi: 10.1186/s12859-019-3208-4.

Recovering rearranged cancer chromosomes from karyotype graphs

Sergey Aganezov¹, Ilya Zban², Vitaly Aksenov^{2

3}, Nikita Alexeev², Michael C Schatz^{4

5}

Affiliations

¹ Department of Computer Science, Johns Hopkins University, 3400 N. Charles st., Baltimore, 21210, MD, USA. aganezov@cs.jhu.edu.
² Computer Technologies Laboratory "Computer technology", ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg, 197101, Russia.
³ IST Austria, Am Campus 1, Klosterneuburg, 3400, Austria.
⁴ Department of Computer Science, Johns Hopkins University, 3400 N. Charles st., Baltimore, 21210, MD, USA.
⁵ Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, 11724, NY, USA.

PMID: 31842730
PMCID: PMC6915857
DOI: 10.1186/s12859-019-3208-4

Recovering rearranged cancer chromosomes from karyotype graphs

Sergey Aganezov et al. BMC Bioinformatics. 2019.

. 2019 Dec 17;20(Suppl 20):641.

doi: 10.1186/s12859-019-3208-4.

Authors

Sergey Aganezov¹, Ilya Zban², Vitaly Aksenov^{2

3}, Nikita Alexeev², Michael C Schatz^{4

5}

Affiliations

¹ Department of Computer Science, Johns Hopkins University, 3400 N. Charles st., Baltimore, 21210, MD, USA. aganezov@cs.jhu.edu.
² Computer Technologies Laboratory "Computer technology", ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg, 197101, Russia.
³ IST Austria, Am Campus 1, Klosterneuburg, 3400, Austria.
⁴ Department of Computer Science, Johns Hopkins University, 3400 N. Charles st., Baltimore, 21210, MD, USA.
⁵ Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, 11724, NY, USA.

PMID: 31842730
PMCID: PMC6915857
DOI: 10.1186/s12859-019-3208-4

Abstract

Background: Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale variations into a karyotype graph representation of the rearranged cancer genomes. Such a representation, however, does not directly describe the linear and/or circular structure of the underlying rearranged cancer chromosomes, thus limiting possible analysis of cancer genomes somatic evolutionary process as well as functional genomic changes brought by the large-scale genome rearrangements.

Results: Here we address the aforementioned limitation by introducing a novel methodological framework for recovering rearranged cancer chromosomes from karyotype graphs. For a cancer karyotype graph we formulate an Eulerian Decomposition Problem (EDP) of finding a collection of linear and/or circular rearranged cancer chromosomes that are determined by the graph. We derive and prove computational complexities for several variations of the EDP. We then demonstrate that Eulerian decomposition of the cancer karyotype graphs is not always unique and present the Consistent Contig Covering Problem (CCCP) of recovering unambiguous cancer contigs from the cancer karyotype graph, and describe a novel algorithm CCR capable of solving CCCP in polynomial time. We apply CCR on a prostate cancer dataset and demonstrate that it is capable of consistently recovering large cancer contigs even when underlying cancer genomes are highly rearranged.

Conclusions: CCR can recover rearranged cancer contigs from karyotype graphs thereby addressing existing limitation in inferring chromosomal structures of rearranged cancer genomes and advancing our understanding of both patient/cancer-specific as well as the overall genetic instability in cancer.

Keywords: Cancer genomics; Computational biology; Contigs; Genome rearrangements; Graph decomposition; Structural variations.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Rearranged genomes and Interval Adjacency Graphs. a Reference and derived genomes R and Q both depicted as sequences of oriented segments and a weighted Interval Adjacency Graph determined by a derived genome Q. In the IAG segment edges are shown as solid, with colors corresponding to distinct chromosomes, and adjacency edges are shown as dashed, with reference adjacencies shown in black, and novel adjacencies depicted in red. Edges with copy number 0 are shown as faded, copy numbers of 1 are omitted for clarity. b Rearranged linear chromosomes in the mutated genome Q depicted as both sequences of oriented segments and as segment/adjacency edge-alternating paths in the IAG’s connected components

**Fig. 2**
Eulerian Decompositions of Interval Adjacency Graphs. Distinct Eulerian decompositions of different cardinalities of IAG connected components where vertices have a non-zero copy number excess (left) and with all vertices being copy number balanced (right)

**Fig. 3**
Reduction of K₃ covering problem to the instance of max-EDP. Transformation of the regular undirected graph G into IAG G^′ and back with K₃ decomposition in G corresponding to segment-adjacency edge-alternating cycles in G^′

**Fig. 4**
Eulerian decompositions and contigs covering inference for Interval Adjacency Graphs. a Two distinct contig coverings corresponding to minimal Eulerian decompositions of the same IAG. b Transformation of IAG G with vertices with copy number excess (i.e., telomere vertices) into a an IAG G^′ with no telomere vertices. Added supplemental segment and adjacency edges are shown in grey

**Fig. 5**
CCR workflow on IAG presented in Fig. 4b. a Iterative Step 3 in CCR workflow: processing of pairs of consecutive adjacency edges in the min-ED, contraction of pairs that are present in all min-EDs, updates of the solution. Supplemental (both segment and adjacency) edges are shown in grey, contracted adjacency edges are shown as dotted orange. b Processing of the terminal graph obtained after iterative Step 3 execution with removal of contracted adjacency edges, incident to them segment edges, and dangling edges after that. Search for a collection L of adjacency edges in which no pair {e,f} share the same copy of a segment

**Fig. 6**
Contigs obtained with both CCR (red) and the baseline “naive” contig inference approach (blue) and on haplotype-specific cancer karyotype graphs. a Number of contigs recovered with CCR and the “naive” approach from input karyotype graphs obtained with RCK on HATCHet CNA input. b Number of contigs recovered with CCR and the “naive” approach from input karyotype graphs obtained with RCK on Battenberg CNA input. c N50 metric for contigs recovered with CCR and “naive” approach from input karyotype graphs obtained with RCK on HATCHet CNA input. d N50 metric for contigs recovered with CCR and “naive” approach from input karyotype graphs obtained with RCK on Battenberg CNA input

See this image and copyright information in PMC

References

1. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. doi: 10.1093/bioinformatics/btp352. - DOI - PMC - PubMed
1. Metzker ML. Sequencing technologies — the next generation. Nat Rev Genet. 2010;11(1):31–46. doi: 10.1038/nrg2626. - DOI - PubMed
1. Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, Ding L. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 2009;25(17):2283–5. doi: 10.1093/bioinformatics/btp373. - DOI - PMC - PubMed
1. Ritz A, Bashir A, Sindi S, Hsu D, Hajirasouliha I, Raphael BJ. Characterization of Structural variants with single molecule and hybrid sequencing approaches. Bioinformatics. 2014;30(24):3458–66. doi: 10.1093/bioinformatics/btu714. - DOI - PMC - PubMed
1. Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 2014;15(6):84. doi: 10.1186/gb-2014-15-6-r84. - DOI - PMC - PubMed

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Grants and funding

P30 CA045508/CA/NCI NIH HHS/United States

LinkOut - more resources

Full Text Sources

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Recovering rearranged cancer chromosomes from karyotype graphs

Affiliations

Recovering rearranged cancer chromosomes from karyotype graphs

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

References

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources