Genome Biol. 2014 Aug 30;15(8):437.
doi: 10.1186/s13059-014-0437-8.

Deep sequencing of the X chromosome reveals the proliferation history of colorectal adenomas

Anna De Grassi et al. Genome Biol.

Abstract

Background: Mismatch repair deficient colorectal adenomas are composed of transformed cells that descend from a common founder and progressively accumulate genomic alterations. The proliferation history of these tumors is still largely unknown. Here we present a novel approach to rebuild the proliferation trees that recapitulate the history of individual colorectal adenomas by mapping the progressive acquisition of somatic point mutations during tumor growth.

Results: Using our approach, we called high and low frequency mutations acquired in the X chromosome of four mismatch repair deficient colorectal adenomas deriving from male individuals. We clustered these mutations according to their frequencies and rebuilt the proliferation trees directly from the mutation clusters using a recursive algorithm. The trees of all four lesions were formed of a dominant subclone that co-existed with other genetically heterogeneous subpopulations of cells. However, despite this similar hierarchical organization, the growth dynamics varied among and within tumors, likely depending on a combination of tumor-specific genetic and environmental factors.

Conclusions: Our study provides insights into the biological properties of individual mismatch repair deficient colorectal adenomas that may influence their growth and also the response to therapy. Extended to other solid tumors, our novel approach could inform on the mechanisms of cancer progression and on the best treatment choice.

Figures

Figure 1
Evolutionary model of tumor clonal expansion. (A) Expansion of a monoclonal adenoma represented as a rooted binary tree [2]. Colored dots indicate somatic mutations that progressively occur during tumor development and are inherited by the surviving progeny. (B) Tree recapitulating the proliferation as inferred from the mutation profile. The combination of nodes in the tree reveals the occurrence of selection and cell death during tumor proliferation.
Figure 2
Sequencing throughput and strategy for variant calling. (A) Uniformity of coverage in the exonic and intergenic targeted regions. The mean coverage in each sample is highlighted in red. More than 50% of targeted regions were sequenced at 300× coverage or higher in all four tumors. (B) Pipeline for variant calling. First, filters for quality scores and for propensity to accumulate errors were applied. Second, statistical tests were applied to account for the coverage and quality score of the variant site (Bernoulli distribution and Chernoff bound) and for the error accumulation of the surrounding region (Binomial distribution). Each test was performed on forward and reverse reads independently, and the resulting four P values were adjusted using Bonferroni correction. Candidate variants were retained if they passed all filters on mismatches and all statistical tests of the variant calling. The resulting ensemble of all somatic mutations at various frequencies constituted the adenoma mutation profile. (C) Variation of the quality score at different positions along the read. In sample A1, mismatches were evenly distributed along the read, with a slight decrease towards the end. In the other samples, mismatches occurred more often at the beginning and at the end of the read, indicating that these positions were prone to accumulate errors. (D) Cumulative percentage of mismatches at each read position for base calls with quality score ≥30. In each sample, we only considered the portion of the reads where a linear correlation was observed. This corresponded to positions 20 to 76 for sample A1 and to positions 20 to 60 for samples A2, A3, and A4.
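The statistical step in panel (B) can be illustrated with a minimal sketch. It is not the authors' pipeline: it shows only a binomial tail probability and the Bonferroni adjustment of four strand-wise P values, and it omits the Chernoff bound test. The function names and the significance threshold `alpha=0.05` are assumptions made for illustration; the caption does not state the threshold used.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing at least k
    mismatching reads out of n if every mismatch were an error of rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def passes_all(p_values, alpha=0.05):
    """Keep a candidate only if every adjusted P value is below alpha.
    alpha is a hypothetical threshold, not taken from the paper."""
    return all(p < alpha for p in bonferroni(p_values))


# Hypothetical candidate: 12 mismatches in 150 forward reads and 10 in 140
# reverse reads, with an assumed per-base error rate of 1%.
site_fwd = binomial_tail(12, 150, 0.01)
site_rev = binomial_tail(10, 140, 0.01)
# Placeholder P values standing in for the two region-level tests.
region_fwd, region_rev = 0.002, 0.004
print(passes_all([site_fwd, site_rev, region_fwd, region_rev]))
```

Testing each strand separately means a variant supported by reads of only one orientation, a common signature of sequencing artifacts, fails at least one of the four tests.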
Figure 3
Mutation profiles of the four adenomas. (A) From the outer to the inner circles, plots display the entire X chromosome, the intergenic (blue, approximately 15 Mbp) and genic (orange, approximately 2 Mbp) targeted sites, and the sequenced sites (green) in the four samples. Genomic regions were divided into 500 kbp bins, and the color gradient represents target (blue and orange) and coverage (green) density. The six concentric circles in each sample represent decreasing mutation frequencies, from 60% to 10%. Dots depict intergenic (blue), non-coding (yellow), synonymous (orange), and non-synonymous (red) somatic mutations. The three cancer genes [27-29] with non-synonymous mutations are also highlighted. (B) Mutation pattern of somatic mutations, SNPs, and the remaining mismatches in the four samples. The mutation profile of each adenoma shows the typical pattern of MMR-deficient colorectal tumors [30]. (C) Schematic representation of the functional domains of the GPR112 protein, with the non-synonymous mutation in sample A4 (red line) and the missense mutation previously reported in colorectal cancer (CRC) (R54X, red circle) [28]. (D) Electropherogram of the R48C mutation occurring in the pentaxin domain of GPR112 in sample A4.
Figure 4
Accuracy of Illumina frequency estimation and mutation clustering in the four tumors. (A) Distribution of somatic mutations according to their frequency. Green bars represent clonal mutations. (B) Linear regression curve of the mutation frequency of 10 proportions of mutated allele measured with qPCR and Illumina sequencing. (C) Pipeline for mutation clustering. First, a 95% confidence interval was assigned to each mutation. Second, mutations with non-overlapping confidence intervals or, in case of overlap, with the smallest confidence interval, were identified as cluster seeds. Third, mutations unambiguously overlapping with only one seed were assigned to that seed. Finally, all mutations overlapping with more than one seed were assigned to a given cluster according to the highest binomial probability. (D) Clusters of mutations in the four samples. In all samples, clusters are highlighted in yellow and numbered progressively. For each cluster, the maximum, the minimum, and the number of mutations are shown. Green clusters contain clonal mutations. (E) Expected and observed somatic mutations for each cluster. The expected number of mutations per cluster was calculated as the number of observed mutations divided by the fraction of positions with coverage equal to or higher than the minimum coverage for those positions. The number of observed mutations matched the expected number, except for low frequency mutations, which were fewer than expected. These mutations were under-represented in our datasets, likely because they are more difficult to identify and to distinguish from random errors. (F) Clustering performance. Shown are the distributions of the number of clusters obtained from 1,000 simulations. At each iteration, the frequency of 40% random mutations was varied within 95% confidence intervals, and mutations were re-clustered with our method. In all samples, the median of the distribution is equal to the observed number of clusters. With the exception of sample A3, the clustering is robust even when a higher percentage of mutations is modified (Figure S4 in Additional file 2).
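The four clustering steps in panel (C) can be sketched as follows. This is one possible reading of the caption, not the authors' code. Assumptions made for illustration: the confidence interval is a normal-approximation (Wald) interval, since the caption does not name the method; seeds are chosen greedily from the narrowest interval upward; and all function names are hypothetical.

```python
from math import lgamma, log, sqrt


def wald_ci(k, n, z=1.96):
    """Approximate 95% confidence interval for the frequency k/n (assumed method)."""
    f = k / n
    half = z * sqrt(f * (1 - f) / n)
    return max(0.0, f - half), min(1.0, f + half)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def binom_logpmf(k, n, p):
    """Log of the binomial probability of k mutant reads out of n at frequency p."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite at the boundaries
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def cluster(mutations):
    """mutations: list of (mutant_reads, total_reads); returns one cluster label each."""
    cis = [wald_ci(k, n) for k, n in mutations]
    # Step 2: visit mutations from narrowest to widest interval; a mutation
    # becomes a seed if its interval overlaps no seed chosen so far.
    by_width = sorted(range(len(mutations)), key=lambda i: cis[i][1] - cis[i][0])
    seeds = []
    for i in by_width:
        if not any(overlaps(cis[i], cis[s]) for s in seeds):
            seeds.append(i)
    # Steps 3 and 4: assign each mutation to the overlapping seed whose
    # frequency gives the highest binomial probability (trivial if only one).
    labels = []
    for i, (k, n) in enumerate(mutations):
        candidates = [s for s in seeds if overlaps(cis[i], cis[s])]
        best = max(candidates,
                   key=lambda s: binom_logpmf(k, n, mutations[s][0] / mutations[s][1]))
        labels.append(seeds.index(best))
    return labels


# Hypothetical read counts: two mutations near 50% and three near 10%.
print(cluster([(500, 1000), (490, 1000), (100, 1000), (110, 1000), (95, 1000)]))
```

Because every non-seed mutation was rejected as a seed only for overlapping an existing seed, each mutation always has at least one candidate cluster under this reading.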
Figure 5
Proliferation trees. (A) Pipeline for tree reconstruction. Mean cluster frequencies were used to identify the root and to enumerate the external nodes (leaves) descending from each cluster. Once the combination of nodes (Ntot) was identified for each tumor, the tree was rebuilt using a recursive algorithm. As explained in the text, the algorithm was based on the parent-descent relationship between nodes of a full binary tree, which resembles the parent-descent relationship between cells and implies that each parent node leads to two descending nodes. The algorithm started from the root of the tree and progressed down to the leaves by generating pairs of nodes according to the combination found in Ntot. In the example shown, the first two nodes that directly descend from root A are node B, which leads to two leaves, and node C, which leads to one leaf E and to node D. Node D, in turn, leads to two leaves E. (B) Proliferation trees of the four samples. Each circle represents one node of the tree. In the dominant branch, mutations can be assigned to a given node (red) and the circle size is proportional to the number of mutations. Filled circles identify nodes supported by the gold sets (mutations with frequency ≥4% and in >6 different read positions). Of the four highly similar trees of sample A3 that were compatible with the obtained combination of nodes (Figure S6 in Additional file 2), only the one that makes no a priori assumption on the proliferation history is shown.
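The recursive reconstruction in panel (A) can be illustrated with a small sketch. It is an interpretation of the caption rather than the published algorithm. It assumes that the combination of nodes Ntot can be written as a list giving, for each internal node, the number of leaves descending from it; the names `trees` and `rebuild` and the nested-tuple output are hypothetical.

```python
from collections import Counter


def trees(size, pool):
    """Yield (tree, unused_pool) for every full binary tree with `size` leaves
    whose internal nodes are drawn from `pool` (a Counter of leaf counts)."""
    if size == 1:
        yield "leaf", pool
        return
    if pool[size] == 0:  # no internal node of this size is available
        return
    rest = pool - Counter({size: 1})  # consume this node
    # Each parent leads to exactly two descendants whose leaf counts sum to size.
    for left in range(1, size // 2 + 1):
        for left_tree, after_left in trees(left, rest):
            for right_tree, after_right in trees(size - left, after_left):
                yield (size, left_tree, right_tree), after_right


def rebuild(root_size, internal_sizes):
    """Return the first tree that uses every listed internal node, or None."""
    for tree, leftover in trees(root_size, Counter(internal_sizes)):
        if not leftover:
            return tree
    return None


# The caption's example: root A (5 leaves), node B (2), node C (3), node D (2).
print(rebuild(5, [5, 2, 3, 2]))
```

Run on the caption's example, the sketch returns a root with five leaves whose two children are a two-leaf node and a three-leaf node, the latter containing one leaf and another two-leaf node. When several trees fit the same combination of nodes, as the caption reports for sample A3, the generator `trees` enumerates all of them.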

Cited by

  • Cancer genomics just got personal.
    Marszalek RT. Genome Biol. 2014;15(9):464. doi: 10.1186/s13059-014-0464-5. PMID: 25315058.

References

    1. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976;194:23–28. doi: 10.1126/science.959840.
    2. Otter R. The number of trees. Ann Math. 1948;49:583–599. doi: 10.2307/1969046.
    3. Campbell PJ, Pleasance ED, Stephens PJ, Dicks E, Rance R, Goodhead I, Follows GA, Green AR, Futreal PA, Stratton MR. Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proc Natl Acad Sci U S A. 2008;105:13081–13086. doi: 10.1073/pnas.0801523105.
    4. Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, Delaney A, Gelmon K, Guliany R, Senz J, Steidl C, Holt RA, Jones S, Sun M, Leung G, Moore R, Severson T, Taylor GA, Teschendorff AE, Tse K, Turashvili G, Varhol R, Warren RL, Watson P, Zhao Y, Caldas C, Huntsman D, Hirst M, Marra MA, Aparicio S. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature. 2009;461:809–813. doi: 10.1038/nature08489.
    5. De Grassi A, Segala C, Iannelli F, Volorio S, Bertario L, Radice P, Bernard L, Ciccarelli FD. Ultradeep sequencing of a human ultraconserved region reveals somatic and constitutional genomic instability. PLoS Biol. 2010;8:e1000275. doi: 10.1371/journal.pbio.1000275.
