Proc Natl Acad Sci U S A. 2010 Oct 12;107(41):17604-9.
doi: 10.1073/pnas.1009117107. Epub 2010 Sep 23.

A mathematical framework to determine the temporal sequence of somatic genetic events in cancer

Camille Stephan-Otto Attolini et al. Proc Natl Acad Sci U S A.

Abstract

Human cancer is caused by the accumulation of genetic alterations in cells. Of special importance are changes that occur early during malignant transformation because they may result in oncogene addiction and represent promising targets for therapeutic intervention. Here we describe a computational approach, called Retracing the Evolutionary Steps in Cancer (RESIC), to deduce the temporal sequence of genetic events during tumorigenesis from cross-sectional genomic data of tumors at their fully transformed stage. When applied to a dataset of 70 advanced colorectal cancers, our algorithm accurately predicts the sequence of APC, KRAS, and TP53 mutations previously defined by analyzing tumors at different stages of colon cancer formation. We further validate the method with glioblastoma and leukemia sample data and then apply it to complex integrated genomics databases, finding that high-level EGFR amplification appears to be a late event in primary glioblastomas. RESIC represents the first evolutionary mathematical approach to identify the temporal sequence of mutations driving tumorigenesis and may be useful to guide the validation of candidate genes emerging from cancer genome surveys.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Schematic diagram of RESIC. For cancer types with clinicopathologically defined stages (e.g., colorectal cancer), the temporal sequence in which genetic alterations arise during tumorigenesis can be identified by genotyping samples from patients at different stages of disease progression. For cancer types that are diagnosed de novo without detectable precursor lesions (e.g., primary GBM), the order of alterations cannot be identified with a similar approach. We present an evolutionary computational algorithm (RESIC) to identify the temporal sequence of events arising during tumorigenesis utilizing genomic data from a large number of samples (one per patient) of a particular histological type. In step 1, we use an algorithm such as GISTIC (4) to identify recurrent genetic aberrations in the genomics dataset. In step 2, these aberrations are ranked according to their pairwise association (statistically significant correlation, e.g., Fisher’s exact test). In step 3, the most likely sequence of these associated events is identified using RESIC. The results generated by RESIC are used to reconstruct the order in which alterations arise during development of a particular cancer type (step 4). Our methodology is applicable to large-scale datasets and can be used to identify the temporal sequence of many genetic alterations.
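Step 2 of the caption above ranks aberrations by pairwise association, naming Fisher's exact test as an example. The following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 co-occurrence table, written with the standard library only. The function name and the sample counts are hypothetical illustrations, not values or code from the paper.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative reading of the cells (an assumption, not from the source):
    a = samples with both alterations, b = first alteration only,
    c = second alteration only, d = neither alteration.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts for two alterations across 16 samples.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

A small p-value here would only indicate that the two alterations co-occur (or exclude each other) more often than expected by chance; it says nothing about their order, which is what step 3 addresses.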
Fig. 2.
Evolutionary dynamics of genetic alterations leading to cancer. (A) Transition between mutational states and schematic representation of different evolutionary trajectories toward cancer. Initially, the population consists of $N$ cells with genotype $i$ and fitness (i.e., growth rate) $r_i$ (detail). During each time step, a cell is chosen at random proportional to fitness to divide, and its daughter cell replaces another randomly chosen cell. During each cell division, a mutation arises with probability $u$. The mutated daughter has genotype $j$ and fitness $r_j$. If $r_j > r_i$, the mutated daughter cell is advantageous as compared to the mother cell; if $r_j < r_i$, it is disadvantageous, and if $r_j = r_i$, it is selectively neutral. The probability that a mutated cell takes over the population is given by its fixation probability, $\rho(r_i, r_j) = \frac{1 - 1/(r_j/r_i)}{1 - 1/(r_j/r_i)^N}$. If $r_j = r_i$, then $\rho(r_i, r_j) = 1/N$. The transition rate between states $i$ and $j$ is given by $N u \rho(r_i, r_j)$ in small populations. A population of wild-type cells may accumulate mutations in different orders; an example path from the unmutated population to a state with three mutations is highlighted in green. (B) Population dynamics. The dynamics of patients accumulating mutations is represented in this network where nodes (i.e., mutational states) are populated according to the transition rates from one mutational state to the next. In the example shown here, cells can accumulate two mutations. The number of patients harboring cells with no mutations is denoted by $X_0$, whereas the numbers of those harboring mutations are denoted by $X_1$, $X_2$, and $X_3$. There is a constant influx of cases into the initial node. Cells in these patients accumulate mutations and populate the mutational states. The outflow from the fully mutated state eventually drives the system into steady state.
An optimization algorithm is used to identify the transition rates for which the number of patients in each node at steady state coincides with the observations in a cross-sectional genomics dataset. The optimized parameter values of the evolutionary process serve to identify the most likely trajectory through the network.
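The quantities in this caption can be sketched in a few lines of Python. The first two functions follow the fixation probability and the small-population transition rate as stated in panel A. The third is a deliberately simplified assumption: it treats the network as a single linear chain of states with constant influx, whereas the paper's networks are branched and their rates are fitted by optimization. All function and parameter names are hypothetical.

```python
def fixation_probability(r_i: float, r_j: float, n_cells: int) -> float:
    """Probability that one mutant of fitness r_j takes over n_cells residents of fitness r_i."""
    if r_i <= 0 or r_j <= 0 or n_cells < 1:
        raise ValueError("fitness values must be positive and n_cells >= 1")
    rel = r_j / r_i  # relative fitness of the mutant
    if rel == 1.0:
        return 1.0 / n_cells  # neutral limit given in the caption
    try:
        denom = 1.0 - rel ** (-n_cells)
    except OverflowError:
        # Deleterious mutant in a very large population: fixation is negligible.
        return 0.0
    return (1.0 - 1.0 / rel) / denom


def transition_rate(n_cells: int, u: float, r_i: float, r_j: float) -> float:
    """Small-population transition rate N * u * rho(r_i, r_j) between two states."""
    return n_cells * u * fixation_probability(r_i, r_j, n_cells)


def steady_state_linear_chain(influx: float, exit_rates: list[float]) -> list[float]:
    """Steady-state occupancy of a linear chain X_0 -> X_1 -> ... -> out.

    Simplifying assumption (not the paper's branched network): each state k
    has one exit rate, so at steady state inflow equals outflow and
    X_k = influx / exit_rates[k].
    """
    if any(rate <= 0 for rate in exit_rates):
        raise ValueError("all exit rates must be positive")
    return [influx / rate for rate in exit_rates]


# Illustrative values only.
print(fixation_probability(1.0, 2.0, 10))   # advantageous mutant
print(transition_rate(10, 1e-7, 1.0, 1.0))  # neutral mutant
print(steady_state_linear_chain(2.0, [1.0, 0.5, 4.0]))
```

In the linear-chain simplification, slow transitions show up as heavily populated states, which is the intuition behind inferring rates from the observed numbers of patients in each node.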
Fig. 3.
Validation of RESIC utilizing colorectal cancer and glioblastoma data. We tested the predictions of RESIC in colorectal cancer and glioblastoma because the order of some events leading to those tumor types has been identified (10, 14). In the schematics of the networks, nodes represent the numbers of patients with a particular genotype, whereas black arrows represent transitions between mutational states. (A) The order of APC, KRAS, and TP53 in colorectal cancer. APC is shown in red, KRAS in green, and TP53 in blue. All mutation rates are $10^{-7}$ per allele per cell division. We apply a pseudocount of 1 to the entire system to prevent states with zero observations. Schematics of the networks are shown at Left. We display the numbers and frequencies of patients in each mutational node in the network and show the most frequent paths through the network in the histograms at Right. Detailed results are listed in Table S2. (First Row) The APC–KRAS network. RESIC predicts that biallelic inactivation of APC likely occurs before any KRAS alteration. (Second Row) The APC–TP53 network. RESIC predicts that biallelic inactivation of APC likely occurs before TP53 inactivation. (Third Row) The KRAS–TP53 network. RESIC predicts that an alteration of KRAS likely occurs first. (B) The order of NF1 and TP53 in glioblastoma. NF1 is shown in orange. We study the mutational network of heterozygous alterations only since all NF1 and most TP53 mutations in the dataset are heterozygous (Table S1). A schematic of the network is shown at Left. Detailed results are listed in Table S2. We show the number of samples with each genotype observed in the complete set of 91 The Cancer Genome Atlas samples (black) and in the restricted set of 72 untreated samples (blue) (3). For both the unrestricted and the restricted sets, RESIC predicts that a TP53 point mutation likely occurs before NF1 is altered (Right).
Fig. 4.
Application of RESIC to secondary AML and primary GBM. (A) We analyzed a dataset of 57 patients with AML, including samples from two different disease states (MPN and post-MPN AML) from 14 different patients (19, 24). The data of AML patients were analyzed with RESIC (see B), whereas data from both MPN and AML patients were sequenced for JAK2 and TET2 (see C). (B) When applying RESIC to a set of secondary AML samples for which JAK2 and TET2 mutational status was known for both AML and MPN disease states, we found that JAK2 mutations likely precede TET2 mutations in this sample set. (C) Analysis of 14 patients for whom samples were available from the MPN and AML disease states showed that, in 5 patients, TET2 mutations were present in the AML sample but not in the preceding MPN sample. (D) Genetic alterations in primary glioblastoma. The statistical significance of correlations between genetic alterations was calculated with Fisher’s exact test. Color codes range from $10^{-20}$ (red) to $10^{0}$ (white). Significance after Bonferroni correction is marked at $\sim 3 \times 10^{-7}$ (orange). Note that lesions colocalized on the same chromosome have stronger correlations likely caused by large-region amplifications or deletions and thus cannot be considered as independent genetic events. (E) Prediction of RESIC for the PTEN-p16-EGFR network in primary glioblastoma. We show the frequencies of the initiating and final mutational events of this network. RESIC predicts that p16 deletion or EGFR low-level amplification are the most common initiating events with frequencies of about 35–39% each, whereas high-level amplification of EGFR is the most frequent last event of this network with a frequency of 56.4%.
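Panel D refers to a Bonferroni-corrected significance threshold. As a minimal sketch of how such a threshold is obtained in general, the code below divides a family-wise error rate by the number of tests. The caption does not state the error rate or the number of pairwise tests used in the paper, so the function names and every number below are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag each p-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five pairwise association tests.
p_values = [0.001, 0.02, 0.004, 0.5, 0.0001]
print(bonferroni_threshold(0.05, len(p_values)))
print(significant_after_bonferroni(p_values))
```

Because the threshold shrinks in proportion to the number of pairs tested, a genome-wide set of alterations leads to a very small cutoff, which is consistent in spirit with the small value marked in the figure.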

References

    1. Bamford S, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91:355–358.
    2. Sjöblom T, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–274.
    3. TCGA. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068.
    4. Beroukhim R, et al. Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma. Proc Natl Acad Sci USA. 2007;104:20007–20012.
    5. Taylor BS, et al. Functional copy-number alterations in cancer. PLoS ONE. 2008;3:e3179.
