Review
Nat Rev Genet. 2014 Aug;15(8):556-70.
doi: 10.1038/nrg3767. Epub 2014 Jul 8.

Expanding the computational toolbox for mining cancer genomes

Li Ding et al. Nat Rev Genet. 2014 Aug.

Abstract

High-throughput DNA sequencing has revolutionized the study of cancer genomics, yielding numerous discoveries that are relevant to cancer diagnosis and treatment. The latest sequencing and analysis methods have successfully identified somatic alterations, including single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions. Additional computational techniques have proved useful for identifying the mutations, genes and molecular networks that drive diverse cancer phenotypes, and for determining the clonal architecture of tumour samples. Collectively, these tools have advanced the study of genomic, transcriptomic and epigenomic alterations in cancer and their association with clinical properties. Here, we review cancer genomics software and the insights that have been gained from its application.

Figures

Box 1 Figure. Data requirements for capturing heterozygous variants
Identifying a single-nucleotide variant (SNV) requires observing it in multiple reads, usually at least 3. The accrual of these reads is governed by the random dynamics of sampling and coverage, which Eq. (1) quantifies for various tumour mass fractions in the ideal case (pure samples, perfect data and no sequence bias). Subclones that make up smaller fractions of the tumour mass push data requirements appreciably higher. The red triangle marks the redundancy (340X) needed for a 99% probability of observing ≥3 reads in a 5% subclone.
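Eq. (1) itself is not reproduced here. As a minimal sketch, assuming the simplest binomial read model (a diploid, pure sample with no sequencing error or bias, in which each read at a heterozygous site independently carries the variant with probability equal to half the subclone's fraction of the tumour mass), the detection probability and the coverage needed to reach a target can be computed as follows. The function names are ours, and this model may differ in detail from the review's equation.

```python
from math import comb


def prob_at_least_k(depth: int, tumour_fraction: float, k: int = 3) -> float:
    """P(>= k variant reads) at a heterozygous site under a binomial read model.

    Assumptions (ours, not necessarily the review's Eq. (1)): pure diploid sample,
    no sequencing error or bias, and each read independently carries the variant
    allele with probability tumour_fraction / 2.
    """
    p = tumour_fraction / 2.0  # heterozygous: half the copies in the subclone carry the variant
    below_k = sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k))
    return 1.0 - below_k


def min_depth(tumour_fraction: float, target: float = 0.99, k: int = 3,
              max_depth: int = 100_000) -> int:
    """Smallest depth whose detection probability reaches target (linear scan).

    The scan is valid because P(X >= k) for X ~ Binomial(depth, p) never
    decreases as depth grows.
    """
    for depth in range(k, max_depth + 1):
        if prob_at_least_k(depth, tumour_fraction, k) >= target:
            return depth
    raise ValueError("target not reached within max_depth")
```

Under these assumptions, `prob_at_least_k(340, 0.05)` is just above 0.99, consistent with the 340X point marked for a 5% subclone, whereas a fully clonal heterozygous SNV (`tumour_fraction=1.0`) reaches the same 99% target at only 14X.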
Box 2 Figure. Environmental factors contributing to cancer risk
Smoking, viruses, and radiation can strongly affect mutation rates across the cancer genome and mutation profiles across cancer types and human populations. Signatures of these effects can often be detected in tumour genome sequences.
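One simple way to look for such signatures is to tally somatic SNVs into the six substitution classes, each written from the pyrimidine (C or T) strand. The sketch below assumes SNVs are supplied as (reference, alternate) base pairs; the helper names are hypothetical, and real signature analyses typically also use the flanking trinucleotide context and matrix factorization across many tumours.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# The six substitution classes, written with a pyrimidine (C or T) reference base.
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Collapse an SNV onto its pyrimidine-reference strand, e.g. G>T becomes C>A."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: report the complementary strand instead
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs) -> dict:
    """Fraction of SNVs in each of the six classes; snvs is an iterable of (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no SNVs supplied")
    return {cls: counts[cls] / total for cls in CLASSES}
```

A tumour with a strong tobacco-smoking component would be expected to show an elevated C>A fraction, and a UV-exposed one an elevated C>T fraction; this toy tally only exposes such skews and does not assign signatures.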
Figure 1. Sample procurement, sequencing, and analysis roadmap
(A) Sequencing strategy: most cancer genomics investigations sequence a tumour sample from a primary or metastatic lesion, starting with a non-specific ‘global’ sample pooled from a biopsy or resection. Because the spatial distribution of any resident subclones is not known a priori, it will become increasingly common to sequence specific regions of a tumour section separately. In the limit, single-cell sequencing can also be performed on flow-sorted nuclei to assess cellular diversity.
(B) Overview of the sequencing and analysis process: tumour and adjacent healthy tissue samples are sequenced using high-throughput instruments to obtain genome, exome, RNA and other types of data. After alignment, a battery of detection tools identifies both small (SNV, indel) and large (copy number, structural variation, gene fusion) alterations, which are then annotated and analyzed individually (Level I), for example to assess likely functional implications, and collectively (Level II), for example to identify relevant gene pathways and networks.
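The tumour-versus-normal comparison at the heart of small-variant detection can be illustrated with a deliberately naive site classifier. This is a sketch under stated assumptions, not how any particular caller works: the `SiteCounts` input format and all thresholds (`min_alt_reads`, `min_depth`, `max_normal_vaf`) are hypothetical, and real tools model sequencing error, tumour purity and mapping quality statistically.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Reference and alternate read counts at one genomic position (hypothetical format)."""
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def vaf(self) -> float:
        # variant allele fraction; 0.0 when the site has no coverage
        return self.alt_reads / self.depth if self.depth else 0.0


def classify_site(tumour: SiteCounts, normal: SiteCounts, min_alt_reads: int = 3,
                  min_depth: int = 10, max_normal_vaf: float = 0.02) -> str:
    """Label a site 'no_call', 'reference', 'germline' or 'somatic' using naive thresholds."""
    if tumour.depth < min_depth or normal.depth < min_depth:
        return "no_call"  # too little coverage in either sample to decide
    if tumour.alt_reads < min_alt_reads:
        return "reference"  # not enough variant support in the tumour
    if normal.vaf > max_normal_vaf:
        return "germline"  # variant also present in the matched normal
    return "somatic"
```

For example, 12 variant reads out of 52 in the tumour with none in a 35-read normal is labelled somatic, whereas a roughly 45% variant fraction in both samples is labelled germline.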
Figure 2. Biological factors relevant to assessing significant genes in cancer
Genomic analysis establishes the mutation frequencies of genes and helps characterize background mutation rates. Specific mutation hot spots have been found in various cancer types. Other factors have also been shown to affect the background mutation rate of a gene, including gene length, expression level and replication timing. State-of-the-art tools such as MuSiC and MutSig account for these and many other factors (for example, transition versus transversion frequency) when determining the significantly mutated genes that contribute substantively to cancer initiation and progression.
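To illustrate why gene length matters, here is a minimal single-gene test, assuming a uniform per-base background mutation rate mu and independent positions, so that a gene with L covered bases is mutated in a given sample with probability 1 - (1 - mu)^L. This is not the MuSiC or MutSig model, which incorporate many more covariates; the function names, and the choice of a binomial tail with Benjamini-Hochberg correction across genes, are our assumptions.

```python
from math import comb


def prob_sample_mutated(covered_bases: int, background_rate: float) -> float:
    """Chance that one sample carries >= 1 background mutation in a gene.

    Assumes independent positions with a uniform per-base rate, so longer genes
    have a higher null probability.
    """
    return 1.0 - (1.0 - background_rate) ** covered_bases


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the single-gene p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues: dict) -> dict:
    """Benjamini-Hochberg (false discovery rate) adjusted p-values, keyed like the input."""
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    adjusted, running_min = {}, 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted
```

With a rate of 1e-6 per base, a 9,000-base gene has a null probability roughly six times that of a 1,500-base gene, so the same number of mutated samples is far less surprising in the longer gene.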
Figure 3. Significantly mutated genes, pathways and networks
Given the mutational status of genes across multiple patients, one can distinguish driver from passenger mutations using several strategies. Single-gene tests determine whether the observed number of samples with a mutation in the gene is significantly greater than expected under an appropriate null model. Pathway or gene set approaches examine whether multiple genes in predefined sets, obtained for example from a curated database such as KEGG, GO or MSigDB, carry more mutations than expected. These tests are biased toward the prior knowledge of gene cascades captured in these databases, but the number of tests is relatively small, so the risk of Type I error [G] tends to be manageable. Conversely, network approaches rely only on known protein-protein or protein-DNA interactions, examining combinations of mutations on whole-genome interaction networks, for example using an analogue of heat diffusion. Because these approaches do not depend on predefined gene sets, they make it possible to infer novel combinations of genes relevant to cancer, but the larger number of hypothesis tests means that greater care must be taken with multiple-testing correction.
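The heat-diffusion analogy can be made concrete with a random walk with restart, one common form of network propagation: mutated genes start with heat, which repeatedly spreads to interaction partners while a fixed fraction returns to the original sources. The sketch below assumes an undirected, unweighted network given as an edge list; the restart value of 0.4 and the function name are illustrative, and we do not claim this is the exact diffusion kernel of any particular tool.

```python
def random_walk_with_restart(edges, seed_heat, restart=0.4, tol=1e-10, max_iter=10_000):
    """Diffuse heat from mutated genes across an undirected interaction network.

    edges: iterable of (gene, gene) pairs.
    seed_heat: gene -> initial heat (e.g. mutation frequency), normalized to sum to 1.
    Iterates h <- (1 - restart) * W h + restart * h0, where W passes each gene's
    heat in equal shares to its neighbours.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = sorted(set(neighbours) | set(seed_heat))
    total = sum(seed_heat.values())
    h0 = {n: seed_heat.get(n, 0.0) / total for n in nodes}
    h = dict(h0)
    for _ in range(max_iter):
        spread = {n: 0.0 for n in nodes}
        for n, nbrs in neighbours.items():
            share = h[n] / len(nbrs)  # equal split among interaction partners
            for m in nbrs:
                spread[m] += share
        new_h = {n: (1 - restart) * spread[n] + restart * h0[n] for n in nodes}
        if max(abs(new_h[n] - h[n]) for n in nodes) < tol:
            return new_h  # converged to the steady-state heat distribution
        h = new_h
    return h
```

On a chain A-B-C-D seeded only at A, heat decays with distance (A > B > C > D). In practice, genes that accumulate heat from several mutated neighbours become candidates, and the significance of the resulting subnetworks must still be assessed with correction for the many tests involved.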
Figure 4. Conceptual example of clonal evolution model and clonality analysis
(A) The founding clone (yellow) persists throughout the course of the disease. Another clone (green), present at time point 1, becomes extinct before time point 2, while new subclones (blue at time point 2 and orange at time point 3) emerge during disease progression. (B) The SciClone algorithm detects the three mutation clusters present at time point 3.
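Grouping mutations by their variant allele fractions (VAFs) is the core idea behind this kind of clonality analysis. As a crude stand-in, assuming copy-neutral regions and a known number of clusters, one-dimensional k-means can separate well-spaced VAF groups. SciClone itself fits Bayesian mixture models and infers the number of clusters, which this sketch does not attempt; the function name and the deterministic quantile initialization are our choices.

```python
def cluster_vafs(vafs, k=3, max_iter=100):
    """Group variant allele fractions into k clusters with one-dimensional k-means.

    A simplified illustration only: no copy-number correction, no model selection.
    Returns (cluster centres, one cluster label per input VAF).
    """
    if len(vafs) < k:
        raise ValueError("need at least k variants")
    ordered = sorted(vafs)
    # Deterministic start: centres at evenly spaced quantiles of the data.
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in vafs:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Recompute each centre as its group mean; keep the old centre if a group empties.
        new_centres = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if new_centres == centres:
            break  # assignments are stable
        centres = new_centres
    labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in vafs]
    return centres, labels
```

For well-separated clusters (for instance VAFs near 0.5, 0.25 and 0.1) this recovers three groups; overlapping clusters, copy-number changes and tumour purity are where dedicated tools such as SciClone earn their complexity.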
