Genome Biol. 2022 Nov 14;23(1):241.
doi: 10.1186/s13059-022-02794-9.

MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution

Tom L Kaufmann et al. Genome Biol.

Abstract

Aneuploidy, chromosomal instability, somatic copy-number alterations, and whole-genome doubling (WGD) play key roles in cancer evolution and provide information for the complex task of phylogenetic inference. We present MEDICC2, a method for inferring evolutionary trees and WGD using haplotype-specific somatic copy-number alterations from single-cell or bulk data. MEDICC2 eschews simplifications such as the infinite sites assumption, allowing multiple mutations and parallel evolution, and does not treat adjacent loci as independent, allowing overlapping copy-number events. Using simulations and multiple data types from 2780 tumors, we use MEDICC2 to demonstrate accurate inference of phylogenies, clonal and subclonal WGD, and ancestral copy-number states.

Keywords: Aneuploidy; Cancer evolution; Chromosomal instability; Intratumor heterogeneity; Phylogenetic reconstruction; Single-cell sequencing; Somatic copy-number alterations; Whole-genome doubling.

Conflict of interest statement

C.S. acknowledges grant support from Pfizer, AstraZeneca, Bristol Myers Squibb, Roche-Ventana, Boehringer-Ingelheim, Archer Dx Inc. (collaboration in minimal residual disease sequencing technologies), and Ono Pharmaceutical; is an AstraZeneca Advisory Board Member and Chief Investigator for the MeRmaiD1 clinical trial; has consulted for Amgen, Pfizer, Novartis, GlaxoSmithKline, MSD, Bristol Myers Squibb, AstraZeneca, Illumina, Genentech, Roche Ventana, GRAIL, Medicxi, Bicycle Therapeutics, Metabomed and the Sarah Cannon Research Institute; has stock options in Apogen Biotechnologies, Epic Bioscience, and GRAIL; and has stock options and is co-founder of Achilles Therapeutics. C.S. holds patents relating to assay technology to detect tumor recurrence (PCT/GB2017/053289); to targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA LOH (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221), to treating cancer by targeting insertion/deletion mutations (PCT/GB2018/051893), identifying insertion/deletion mutation targets (PCT/GB2018/051892); methods for lung cancer detection (PCT/US2017/028013), identifying responders to cancer treatment (PCT/GB2018/051912); and a patent application to determine methods and systems for tumor monitoring (GB2114434.0).

Figures

Fig. 1
MEDICC2 algorithm. a MEDICC2 infers cancer phylogenies from SCNA data from single cells or bulk sequencing using a minimum-event distance (MED) and infers the ancestral genomes. It allows for backmutations, obeys biological constraints, and solves the phylogeny problem where ancestral genomes are not sampled. b Computing distances with WGD. Copy-number profiles are represented as vectors of positive integer copy numbers across chromosomes (here: two chromosomes with four segments each). To infer the correct MED, LOH events are considered first as lost segments cannot be re-gained by later events. WGD events span the full copy-number profile, whereas gain and loss events can affect an arbitrary number of segments within a chromosome. c Symmetric distance calculation. The MED from an ancestral state to a sample is asymmetric due to biological constraints. The final symmetric distance between two samples is computed as the sum of distances from an ancestral genome to both samples, while minimizing over all possible ancestors. d Schematic overview of the MEDICC2 workflow. Haplotype-specific copy-number profiles are either pre-phased or undergo evolutionary phasing (see e). Pairwise MEDs are computed between all genomes, followed by tree inference and ancestral reconstruction which determines the final branch lengths of the tree. Results are reported to the user as a patient summary and plot. e Evolutionary phasing. Copy-number profiles for both alleles are jointly encoded as an unweighted phasing FST P where both possible allele configurations are encoded at each position in the sequence. Evolutionary phasing then determines the optimal configuration (bold arrows) and extracts final haplotypes (orange and blue) by computing the MED between the phasing FST and two reference haplotypes. An example of major/minor copy number, phased copy number, and the MED from the diploid is shown at the bottom. 
Abbreviations: FST: Finite-state transducer, MED: Minimum-event distance, LOH: Loss-of-heterozygosity, WGD: Whole-genome doubling
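The symmetric distance in panel c can be illustrated with a toy brute-force search. This is only a sketch under simplifying assumptions and not the FST-based MEDICC2 implementation: a single chromosome, no WGD event, copy numbers capped at a hypothetical `MAX_CN`, and unit-cost gains or losses over contiguous runs of segments. The names `neighbours`, `distances_from`, and `symmetric_distance` are invented for illustration.

```python
from collections import deque
from itertools import product

MAX_CN = 3  # toy cap on copy number (assumption, keeps the state space small)


def neighbours(profile):
    """Profiles reachable by one gain or loss over a contiguous run of segments."""
    n = len(profile)
    out = set()
    for start in range(n):
        for end in range(start, n):
            for delta in (+1, -1):
                new = list(profile)
                for i in range(start, end + 1):
                    if new[i] == 0:
                        continue  # lost segments cannot be re-gained
                    new[i] += delta
                if max(new) <= MAX_CN and tuple(new) != profile:
                    out.add(tuple(new))
    return out


def distances_from(ancestor):
    """Asymmetric minimum-event distances from one ancestor (breadth-first search)."""
    dist = {ancestor: 0}
    queue = deque([ancestor])
    while queue:
        cur = queue.popleft()
        for nxt in neighbours(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def symmetric_distance(x, y):
    """Minimise d(ancestor, x) + d(ancestor, y) over all candidate ancestors."""
    best = None
    for ancestor in product(range(MAX_CN + 1), repeat=len(x)):
        d = distances_from(ancestor)
        if x in d and y in d:
            total = d[x] + d[y]
            if best is None or total < best:
                best = total
    return best


# Asymmetry: a loss is one event, but the lost segment cannot come back.
print((0, 1, 1) in distances_from((1, 1, 1)))  # True
print((1, 1, 1) in distances_from((0, 1, 1)))  # False
# Two samples with different losses share the unaltered profile as ancestor.
print(symmetric_distance((0, 1, 1), (1, 1, 0)))  # 2
```

Enumerating every ancestor is only feasible for such tiny profiles; the sketch is meant to make the definition concrete, not to suggest how the distance is computed in practice.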
Fig. 2
Algorithm performance and validation. a Runtime of different composition strategies. Copy-number profiles were simulated with increasing lengths from 20 to 200 segments. Computation time of the MEDs is linear with respect to the length of the input sequences. While MED-WGD took significantly longer to compute than the MED without WGD, the new lazy composition strategy reduced runtime by orders of magnitude. Shaded areas correspond to standard errors. b Performance on simulated data. Using an independent simulation routine, we benchmarked MEDICC2 against a range of other methods. The reconstructed trees were compared to the simulated trees using the generalized Robinson-Foulds (GRF) distance. As expected, the GRF distance rises with increasing tree size. MEDICC2 outperforms all other methods for all tree sizes. c Validation of MEDICC2 events with structural variants (SVs). Pairs of MEDICC2 events and SVs were chosen based on an overlap of the starting segment. We assume MEDICC2 events to be supported by the SV if the ends also overlap. Shown here are the results using only duplications and deletions larger than 10 Mbp. d The MEDICC2 WGD score for 2778 cancer genomes. Individual cancers are plotted based on their average ploidy and fraction of genome with LOH. The original separating line between WGD and non-WGD tumors was estimated by Dentro et al. as y = 2.9 − 2x. Correct “WGD” and “no WGD” predictions from MEDICC2 were marked in orange and blue while false predictions were marked in black and gray (latter if the PCAWG WGD status was “uncertain”).
Abbreviations: NJ: Neighbor joining, Min. Ev.: Minimum Evolution
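As a worked example of the separating line in panel d, the sketch below applies y = 2.9 − 2x to individual tumors. The caption does not state which quantity is on which axis, so this assumes x is the fraction of the genome with LOH, y is the average ploidy, and tumors above the line are called WGD. The function name `is_wgd_by_line` is hypothetical, and this is the baseline rule attributed to Dentro et al., not the MEDICC2 WGD score.

```python
def is_wgd_by_line(ploidy, loh_fraction):
    """Baseline call from the separating line y = 2.9 - 2x.

    Assumption: x = fraction of genome with LOH, y = average ploidy,
    and a tumor above the line is called WGD.
    """
    if not 0.0 <= loh_fraction <= 1.0:
        raise ValueError("loh_fraction must be between 0 and 1")
    threshold = 2.9 - 2.0 * loh_fraction
    return ploidy > threshold


# A near-triploid tumor with 30% LOH lies above the line (threshold about 2.3).
print(is_wgd_by_line(ploidy=3.5, loh_fraction=0.3))  # True
# A diploid tumor with 10% LOH lies below it (threshold about 2.7).
print(is_wgd_by_line(ploidy=2.0, loh_fraction=0.1))  # False
```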
Fig. 3
Evolutionary history of tumor subclones from patient A31. a SNV-based phylogeny. Reproduction of the SNV-based phylogeny as described in Gundem et al. [40] for the multi-sample prostate cancer tumor case with one sample (C) from the primary tumor and four samples (A, D, E, and F) from distinct metastatic sites. Original reconstruction was performed using an n-dimensional Bayesian Dirichlet process to cluster estimated cancer cell fractions of the single-nucleotide variants (SNV) identified in the WGS across samples. Only the major subclone of each sample is shown (“Methods”). b MEDICC2 phylogeny. Using multi-sample phased copy-number profiles, MEDICC2 detected the presence of WGD in the metastatic samples and its absence in the primary sample from A31. The MEDICC2 analysis identifies multiple MSAI events as well as parallel LOH on 6 and 13 (purple arrows). Individual events are marked in the copy-number track where they occur: gains (red) and losses (blue). The gray number in each branch corresponds to its bootstrap-confidence score while the WGD events from the MEDICC2 event detection are marked in green
Fig. 4
Event detection for the Gundem et al. [40] cohort. a WGD detection. In the 10 patients, a total of 4 WGDs were detected, two of which were clonal, one subclonal and one in a terminal branch. b Distribution of arm-level events. Using the MEDICC2 event detection routine, we counted the number of times a whole chromosome arm was either gained or lost in a single branch. The gains and losses were aggregated over all patients and samples into a single score. This score was compared against the oncogene-tumor suppressor gene (OG-TSG) score derived by Davoli et al. [41]. A clear correlation between the gains/losses and the OG-TSG score (which is not based on copy numbers) is visible. c Distribution of gene-level events. The analysis was repeated on the basis of all 1729 individual genes present in the Davoli et al. dataset. On the x-axis, we plotted the base-10 logarithm of the genes’ p-values and flipped the sign for the oncogenes to create a single, continuous x-axis for both gene sets. A weak correlation is visible, which becomes more pronounced when only considering the top 100 genes. Names are given for genes with p < 10^−20.
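The x-axis transformation described for panel c can be written out as a small helper. This is a minimal sketch assuming each gene comes with a p-value and an oncogene/tumor-suppressor label; the function name `signed_log10_p` is hypothetical and not taken from the MEDICC2 code.

```python
import math


def signed_log10_p(p_value, is_oncogene):
    """Place a gene on the shared axis: log10(p) for tumor suppressors,
    -log10(p) (sign flipped) for oncogenes."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    log_p = math.log10(p_value)
    return -log_p if is_oncogene else log_p


# Highly significant tumor suppressors land far to the left,
# highly significant oncogenes far to the right.
tsg_x = signed_log10_p(1e-20, is_oncogene=False)
og_x = signed_log10_p(1e-20, is_oncogene=True)
print(tsg_x < 0 < og_x)  # True
```

With this convention, the labeling threshold p < 10^−20 corresponds to positions beyond ±20 on the axis.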
Fig. 5
Inferred phylogeny for single-cell data with 1023 cells. Inferred phylogeny and allele-specific copy-number profiles for patient TN2 from Minussi et al. [46]. The diploid and most recent common ancestor to all cells are marked with green and blue circles, respectively. We manually selected clades from the phylogeny to match the superclones and subclones of the original publication. These are marked next to the tree in the colors of the original publication and with horizontal lines. The structure of the tree corresponds very clearly with distinct features of the copy-number profiles and matches the clonal structure derived in the original publication. Selected synapomorphies of the clone structure are highlighted with a yellow border and annotated on the figure

References

    1. McGranahan N, Burrell RA, Endesfelder D, Novelli MR, Swanton C. Cancer chromosomal instability: therapeutic and diagnostic challenges. EMBO Rep. 2012;13:528–538.
    2. Sansregret L, Vanhaesebroeck B, Swanton C. Determinants and clinical implications of chromosomal instability in cancer. Nat Rev Clin Oncol. 2018;15:139–150.
    3. Watkins TBK, Lim EL, Petkovic M, Elizalde S, Birkbak NJ, Wilson GA, et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature. 2020. doi: 10.1038/s41586-020-2698-6.
    4. Jamal-Hanjani M, Wilson GA, McGranahan N, Birkbak NJ, Watkins TBK, Veeriah S, et al. Tracking the evolution of non–small-cell lung cancer. N Engl J Med. 2017;376:2109–2121.
    5. Lee AJX, Endesfelder D, Rowan AJ, Walther A, Birkbak NJ, Futreal PA, et al. Chromosomal instability confers intrinsic multidrug resistance. Cancer Res. 2011;71:1858–1870.
