J R Stat Soc Ser C Appl Stat. 2016 Aug;65(4):547-563.
doi: 10.1111/rssc.12136. Epub 2016 Jan 12.

Bayesian inference for intratumour heterogeneity in mutations and copy number variation


Juhee Lee et al. J R Stat Soc Ser C Appl Stat. 2016 Aug.

Abstract

Tumor samples are heterogeneous. They consist of different subclones that are characterized by differences in DNA nucleotide sequences and copy numbers at multiple loci. Heterogeneity can be measured by identifying the subclonal copy number and sequence at a selected set of loci. Because accurate identification of variant allele fractions depends heavily on precise determination of copy numbers, we develop a Bayesian feature allocation model that jointly calls subclonal copy numbers and the corresponding allele sequences at the same loci. The proposed method uses three random matrices, L, Z and w, to represent subclonal copy numbers (L), numbers of subclonal variant alleles (Z) and cellular fractions of subclones in samples (w), respectively. Because the number of subclones is unknown, the number of columns of these matrices is random. We use next-generation sequencing data to estimate the subclonal structure through inference on these three matrices. Using simulation studies and a real data analysis, we demonstrate that jointly modeling both structural and sequence variants on subclonal genomes improves posterior inference on the subclonal structure. Software is available at http://compgenome.org/BayClone2.
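The abstract notes that an unknown number of subclones gives the matrices a random number of columns. As an illustration only, the sketch below draws a binary feature matrix from the standard Indian buffet process, a prior under which the number of columns is itself random. This is a simplified stand-in we chose for clarity: the paper's keywords name a categorical Indian buffet process, whose entries can take more than two values, and the function names, the concentration parameter `alpha` and the seed handling here are hypothetical.

```python
import math
import random


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    # Keep multiplying uniforms until the running product drops to exp(-rate) or below.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_rows: int, alpha: float, seed: int = 0) -> list[list[int]]:
    """Sample a binary matrix from the (binary) Indian buffet process.

    Row i (1-based) reuses existing column k with probability m_k / i, where m_k
    counts the earlier rows that use column k, then opens Poisson(alpha / i) new columns.
    """
    rng = random.Random(seed)
    column_counts: list[int] = []  # m_k for every column opened so far
    rows: list[list[int]] = []
    for i in range(1, num_rows + 1):
        # Reuse existing columns in proportion to their popularity.
        row = [1 if rng.random() < m / i else 0 for m in column_counts]
        for k, value in enumerate(row):
            column_counts[k] += value
        # Open a random number of brand-new columns for this row.
        new_columns = sample_poisson(alpha / i, rng)
        row.extend([1] * new_columns)
        column_counts.extend([1] * new_columns)
        rows.append(row)
    # Pad earlier rows with zeros so the matrix is rectangular.
    width = len(column_counts)
    return [row + [0] * (width - len(row)) for row in rows]


if __name__ == "__main__":
    z = sample_ibp(num_rows=10, alpha=2.0, seed=1)
    print(len(z[0]))  # number of sampled columns; varies with seed and alpha
```

Under this prior the expected number of columns grows with `alpha` and with the number of rows, which is the property that lets a feature allocation model leave the number of subclones unfixed.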

Keywords: Categorical Indian buffet process; Feature allocation models; Markov chain Monte Carlo; Next-generation sequencing; Random matrices; Subclone; Variant calling.


Figures

Figure 1
(a) Tumor heterogeneity caused by clonal expansion. On days 90, 180, and 360, four somatic mutations (represented by red letters) and three somatic copy number gains (represented by brown letters) result in three tumor subclones. (b) Observed short reads (some with variants) result from heterogeneous subclonal genomes. In particular, the formula at the bottom shows how subclonal alleles are mixed in certain proportions to produce short reads, which are mapped to different loci.
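The mixing in panel (b) can be written as a formula under simplifying assumptions that are ours and not necessarily the paper's exact model: reads at a locus are drawn in proportion to the DNA copies present, there is no sequencing error, and there is no separate background (normal-cell) term. Writing $w_{sc}$ for the fraction of cells in sample $s$ that belong to subclone $c$, and $l_{tc}$ and $z_{tc}$ for the copy number and the number of variant alleles of subclone $c$ at locus $t$, the expected variant allele fraction at locus $t$ in sample $s$ is

$$
p_{st} = \frac{\sum_{c=1}^{C} w_{sc}\, z_{tc}}{\sum_{c=1}^{C} w_{sc}\, l_{tc}},
\qquad 0 \le z_{tc} \le l_{tc},
\qquad \sum_{c=1}^{C} w_{sc} = 1 .
$$

For example, a sample that is 60% subclone 1 (two copies, one variant allele) and 40% subclone 2 (three copies, two variant alleles) gives $p_{st} = (0.6 \cdot 1 + 0.4 \cdot 2)/(0.6 \cdot 2 + 0.4 \cdot 3) = 1.4/2.4 \approx 0.583$. The copy-number gain in subclone 2 enters both numerator and denominator, which is why the variant fraction cannot be interpreted without the copy numbers.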
Figure 2
The three matrices inferred to describe the subclonal structure in Figure 1. L describes the subclonal copy numbers, Z describes the numbers of subclonal variant alleles, and w describes the cellular fractions of subclones.
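To make the three matrices concrete, the sketch below stores them as nested lists, with loci as rows of `L` and `Z` and samples as rows of `w`, and checks the basic constraints: a subclone cannot carry more variant alleles than copies, and each sample's cellular fractions are nonnegative and sum to one. It converts the matrices into expected variant allele fractions $p_{st} = \sum_c w_{sc} z_{tc} / \sum_c w_{sc} l_{tc}$ and scores observed counts, `n[s][t]` variant reads out of `N[s][t]` total reads, under a binomial model. The binomial likelihood, the omission of a normal-cell background term and of sequencing error, and all function names are assumptions made for illustration; the paper's actual sampling model and MCMC updates are not reproduced here.

```python
import math


def check_structure(L, Z, w, tol=1e-9):
    """Raise ValueError unless L, Z (loci x subclones) and w (samples x subclones) are consistent."""
    num_subclones = len(w[0])
    if len(L) != len(Z):
        raise ValueError("L and Z need one row per locus")
    for l_row, z_row in zip(L, Z):
        if len(l_row) != num_subclones or len(z_row) != num_subclones:
            raise ValueError("every matrix needs one column per subclone")
        for l_tc, z_tc in zip(l_row, z_row):
            # A subclone cannot carry more variant alleles than copies of the locus.
            if not 0 <= z_tc <= l_tc:
                raise ValueError("need 0 <= z_tc <= l_tc")
    for w_row in w:
        # Cellular fractions in each sample are nonnegative and sum to one.
        if len(w_row) != num_subclones or min(w_row) < 0 or abs(sum(w_row) - 1) > tol:
            raise ValueError("each row of w must be nonnegative and sum to 1")


def expected_vaf(L, Z, w):
    """Return p[s][t] = sum_c w_sc z_tc / sum_c w_sc l_tc (no background term, no read error)."""
    check_structure(L, Z, w)
    p = []
    for w_row in w:
        p_row = []
        for l_row, z_row in zip(L, Z):
            variant_copies = sum(w_c * z_c for w_c, z_c in zip(w_row, z_row))
            total_copies = sum(w_c * l_c for w_c, l_c in zip(w_row, l_row))
            if total_copies == 0:
                raise ValueError("locus has no DNA copies in this sample")
            p_row.append(variant_copies / total_copies)
        p.append(p_row)
    return p


def _xlog(count, prob):
    """count * log(prob), with the convention 0 * log(0) = 0."""
    if count == 0:
        return 0.0
    if prob == 0:
        return -math.inf
    return count * math.log(prob)


def binomial_log_likelihood(n, N, p):
    """Sum over samples s and loci t of log Binomial(n[s][t] | N[s][t], p[s][t])."""
    total = 0.0
    for n_row, N_row, p_row in zip(n, N, p):
        for n_st, N_st, p_st in zip(n_row, N_row, p_row):
            total += math.log(math.comb(N_st, n_st))
            total += _xlog(n_st, p_st) + _xlog(N_st - n_st, 1 - p_st)
    return total


if __name__ == "__main__":
    # Two loci (rows of L and Z), two subclones (columns), two samples (rows of w).
    L = [[2, 3], [2, 2]]
    Z = [[1, 2], [0, 1]]
    w = [[0.6, 0.4], [0.2, 0.8]]
    p = expected_vaf(L, Z, w)
    print(p)
    print(binomial_log_likelihood([[6, 2], [7, 4]], [[10, 10], [10, 10]], p))
```

In a sampler, a function like `binomial_log_likelihood` would be evaluated for each proposed change to `L`, `Z` or `w`; keeping the constraint checks separate makes it easy to reject invalid proposals before scoring them.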
Figure 3
Simulation 1: simulation truth.
Figure 4
Posterior inference for Simulation 1.
Figure 5
Heatmaps of estimated cellular prevalences from PyClone (a) and $p_{st}^{\mathrm{TRUE}}$ (b) for Simulation 1.
Figure 6
Simulation 2: simulation truth.
Figure 7
Posterior inference for Simulation 2.
Figure 8
Heatmaps of estimated cellular prevalences from PyClone (a) and $p_{st}^{\mathrm{TRUE}}$ (b) for Simulation 2.
Figure 9
Histograms of the Lung Cancer Dataset.
Figure 10
Posterior inference for the Lung Cancer Dataset.
Figure 11
Heatmaps of estimated cellular prevalences from PyClone (a) and $n_{st}/N_{st}$ (b) for the Lung Cancer Dataset.

