PLoS One. 2012;7(4):e34637.
doi: 10.1371/journal.pone.0034637. Epub 2012 Apr 27.

A stochastic Markov chain model to describe lung cancer growth and metastasis

Paul K Newton et al. PLoS One. 2012.

Abstract

A stochastic Markov chain model for metastatic progression is developed for primary lung cancer based on a network construction of metastatic sites with dynamics modeled as an ensemble of random walkers on the network. We calculate a transition matrix, with entries (transition probabilities) interpreted as random variables, and use it to construct a circular bi-directional network of primary and metastatic locations based on postmortem tissue analysis of 3827 autopsies on untreated patients documenting all primary tumor locations and metastatic sites from this population. The resulting 50 potential metastatic sites are connected by directed edges with distributed weightings, where the site connections and weightings are obtained by calculating the entries of an ensemble of transition matrices so that the steady-state distribution obtained from the long-time limit of the Markov chain dynamical system corresponds to the ensemble metastatic distribution obtained from the autopsy data set. We condition our search for a transition matrix on an initial distribution of metastatic tumors obtained from the data set. Through an iterative numerical search procedure, we adjust the entries of a sequence of approximations until a transition matrix with the correct steady-state is found (up to a numerical threshold). Since this constrained linear optimization problem is underdetermined, we characterize the statistical variance of the ensemble of transition matrices calculated using the means and variances of their singular value distributions as a diagnostic tool. We interpret the ensemble averaged transition probabilities as (approximately) normally distributed random variables. The model allows us to simulate and quantify disease progression pathways and timescales of progression from the lung position to other sites and we highlight several key findings based on the model.
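
The long-time limit described in the abstract can be illustrated on a toy chain. The sketch below is not the authors' code or data. It assumes a row-stochastic transition matrix acting on a row state vector (v_{k+1} = v_k A), and the 3-site matrix `A`, the starting vector `v0`, and the helper names `step` and `steady_state` are all hypothetical. It advances the state vector until successive iterates agree to within a tolerance, which is one simple way to approximate a steady-state distribution.

```python
def step(v, A):
    """One Markov step: return the row vector v A (A is row-stochastic)."""
    n = len(A)
    return [sum(v[i] * A[i][j] for i in range(n)) for j in range(n)]


def steady_state(v0, A, tol=1e-12, max_steps=10_000):
    """Iterate v <- v A until the largest entry-wise change drops below tol.

    Returns the approximate steady state and the number of steps taken.
    """
    v = list(v0)
    for k in range(1, max_steps + 1):
        w = step(v, A)
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w, k
        v = w
    raise RuntimeError("state vector did not converge within max_steps")


# Hypothetical 3-site network (illustrative numbers, NOT from the paper).
# Each row sums to 1: row i holds the transition probabilities out of site i.
A = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

# All probability mass starts at site 0, playing the role of the primary site.
v0 = [1.0, 0.0, 0.0]

pi, steps = steady_state(v0, A)
print("approximate steady state:", [round(p, 4) for p in pi])
print("steps to converge:", steps)
```

Here `pi` approximates the steady state of the toy matrix and `steps` records how many iterations were needed. In the setting of the abstract the matrix would have 50 sites and the walker would start at the lung.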

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic diagram of human circulatory system showing circulating tumor cells (CTCs) detaching from primary tumor and getting trapped in capillary beds and other potential future metastatic locations as outlined by the ‘seed-and-soil’ framework.
Figure 2. Metastatic distributions from the autopsy data set extracted from 3827 patients.
The y-axis in each graph represents a proportion between 0 and 1, and the heights sum to 1. These are the two key probability distributions used to ‘train’ our lung cancer progression model. (a) Overall metastatic distribution including all primaries. We call this the ‘generic’ distribution because it includes all primary cancer types. (b) Distribution of metastases associated with primary lung cancer. We call this the ‘target’ distribution, which we label [formula].
Figure 3. The converged lung cancer network shown as a circular, bi-directional, weighted graph.
We use sample mean values for all edges connecting sites in the target distribution. The disease progresses from site 23 (lung) as a ‘random walker’ on this network. Arrowheads placed on the end or ends of the edges denote the direction of the connections. Edge weightings are not shown. There are 50 sites (defined in Table 1) obtained from the full data set, with ‘Lung’ (site 23) placed on top. The 27 sites that are connected by edges are those from the target vector for lung cancer defined in Table 1.
Figure 4. Weight of outgoing edges from the lung (using sample mean values from ensemble) as compared with the ‘target’ distribution.
Figure 5. Histogram of edge values from lung to lymph nodes (reg) for 1000 trained matrices, showing that edge values (transition probabilities) are best thought of as random variables that are (approximately) normally distributed.
The dashed vertical line shows the edge value in the initial matrix. A normal distribution with the sample mean (0.15115) and variance (0.01821) is shown as an overlay.
Figure 6. Histogram of edge values from lung to adrenal for 1000 trained matrices, showing that edge values (transition probabilities) are best thought of as random variables that are (approximately) normally distributed.
The dashed vertical line shows the edge value in the initial matrix. A normal distribution with the sample mean (0.13165) and variance (0.01953) is shown as an overlay.
Figure 7. Panel showing progression of state vector for lung cancer primary using the ensemble averaged lung cancer matrix.
Filled rectangles show the long-time metastatic distribution from the autopsy data in Figure 2(b), unfilled rectangles show the distribution at step k using the Markov chain model. (a) k = 0; (b) k = 2.
Figure 8. Panel showing progression of state vector for lung cancer primary using the ensemble averaged lung cancer matrix.
Filled rectangles show the long-time metastatic distribution from the autopsy data in Figure 2(b), unfilled rectangles show the distribution at step k using the Markov chain model. (a) k = 5; (b) k = ∞.
Figure 9. Probabilistic decomposition of pathways from lung to liver.
First transition probability is directly from lung to liver (0.08028±0.00946). Paths from the first-order sites to liver are shown as solid arrows. Paths from second-order sites to liver are shown as dashed arrows.
Figure 10. Mean first-passage time histogram for Monte Carlo computed random walks all starting from lung.
Error bars show one standard deviation. Values are normalized so that lymph node (reg) has value 1, and all others are in these relative time units.
Figure 11. Ensemble convergence to the trained matrix, starting from the initial matrix. The y-axis is z; the x-axis is step j.
We use an ensemble of 1000 trained matrices, each conditioned on the same initial matrix. The average convergence curve is shown, along with standard deviations marked along each decade showing the spread associated with the convergence rates.
Figure 12. Average distribution of the 27 non-zero singular values associated with the ensemble of 1000 matrices, all obtained using the same initial matrix. The x-axis is the index n; the y-axis is [formula].
Data points (open circles) indicate the sample average, with error bars showing the sample standard deviations. The line is a least-squares curve fit through [formula] through [formula], showing linear decrease with exponent [formula]. The 27 non-zero singular values reflect the fact that there are 27 entries in the steady-state target distribution for primary lung cancer. The two diamond-shaped data points are the two singular values associated with the initial matrix. The 27 asterisk data points are those obtained from a trained matrix using a perturbed initial matrix with a rank-2 perturbation. See text for details.
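
The Monte Carlo mean first-passage times summarized in Figure 10 can be illustrated with a small simulation. The sketch below is not the authors' implementation. The 3-site matrix `A` is hypothetical, site 0 stands in for the lung, and the function names `first_passage_steps` and `mean_first_passage` are introduced only for illustration. Each walk starts at the same site and moves according to the current row of the matrix until it first reaches the target. The mean and sample standard deviation over many walks are reported, and one mean is divided by another to mimic the relative time units of Figure 10.

```python
import random
import statistics


def first_passage_steps(A, start, target, rng, max_steps=1_000_000):
    """Walk from `start` until `target` is first reached; return the step count."""
    sites = range(len(A))
    site, steps = start, 0
    while site != target:
        if steps >= max_steps:
            raise RuntimeError("target not reached; is it reachable from start?")
        # Draw the next site using the current row as transition probabilities.
        site = rng.choices(sites, weights=A[site])[0]
        steps += 1
    return steps


def mean_first_passage(A, start, target, n_walks=20_000, seed=0):
    """Monte Carlo estimate: (mean, sample standard deviation) of passage times."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    times = [first_passage_steps(A, start, target, rng) for _ in range(n_walks)]
    return statistics.mean(times), statistics.stdev(times)


# Hypothetical 3-site network (illustrative numbers, NOT from the paper).
A = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

mean_1, sd_1 = mean_first_passage(A, start=0, target=1)
mean_2, sd_2 = mean_first_passage(A, start=0, target=2)

# Normalize by a reference site, as Figure 10 does with lymph node (reg).
relative_2 = mean_2 / mean_1
print(f"site 1: mean {mean_1:.2f} steps, sd {sd_1:.2f}")
print(f"site 2: mean {mean_2:.2f} steps, sd {sd_2:.2f}, relative {relative_2:.2f}")
```

For this toy chain the exact mean first-passage times from site 0 are 10/3 steps to site 1 and 5 steps to site 2, so the estimates should land close to those values. A step here is one transition of the chain, not a calibrated clinical time.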
