Review. Mol Oncol. 2021 Nov;15(11):2823-2840.
doi: 10.1002/1878-0261.13056. Epub 2021 Jul 20.

Global mapping of cancers: The Cancer Genome Atlas and beyond


Carlo Ganini et al. Mol Oncol. 2021 Nov.

Abstract

Cancer genomes have been explored since the early 2000s through massive exome sequencing efforts, leading to the publication of The Cancer Genome Atlas in 2013. Sequencing techniques have been developed alongside this project and have allowed scientists to overcome the cost limitations of whole-genome sequencing (WGS) of single specimens, enabling more accurate and extensive cancer sequencing projects, such as deep sequencing of whole genomes and transcriptomic analysis. The Pan-Cancer Analysis of Whole Genomes recently published WGS data from more than 2600 human cancers together with almost 1200 related transcriptomes. The application of WGS to a large database allowed, for the first time, a global analysis of features such as molecular signatures, large structural variations and noncoding regions of the genome, as well as the evaluation of RNA alterations in the absence of underlying DNA mutations. The vast amount of data generated still needs to be thoroughly deciphered, and the advent of machine-learning approaches will be the next step towards personalized approaches to cancer medicine. This manuscript aims to give a broad perspective on some of the biological evidence derived from the largest sequencing efforts on human cancers so far, discussing the advantages and limitations of this approach and its power in the era of machine learning.

Keywords: artificial intelligence; cancer; molecular signature; omics; whole-genome sequencing.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1
Global cancer genomics approaches. (A) Multiomics approach in The Cancer Genome Atlas (TCGA), the first international project to catalogue the mutational landscape of human cancers. Data from more than 10 000 patients worldwide have been analysed in terms of gene expression, copy number alterations (CNAs), DNA methylation and mutations in the coding regions of the genome, providing the mutational landscape of 12 common cancers. (B) The Pan‐Cancer Analysis of Whole Genomes (PCAWG) project analysed more than 2600 whole‐cancer genomes from the International Cancer Genome Consortium (ICGC), building upon previous data from TCGA. The alteration burden of each cancer type has been evaluated for mutations (single base substitutions, double base substitutions, small insertions and deletions), CNAs, structural variants (SVs) and RNA expression (heatmap). Genomic alterations have been catalogued according to the site of occurrence (coding region of a gene; regulatory regions such as promoters, 5′ or 3′ untranslated regions (5′UTR and 3′UTR) or intron splicing variants) or as CNAs and SVs, providing a specific global profile for each gene alteration. Within each class of alterations, driver mutations have been recognized; coding point mutations, together with somatic CNAs (SCNAs), represent the highest number of driver events in cancers (bar chart) [80]. (C) ARGO, Accelerating Research in Genomic Oncology, is the ongoing phase of the global‐scale omics approach of the ICGC, aimed at collecting omics and clinical data from more than 80 000 patients, with the goal of addressing key biological and clinical questions for each cancer type. This would allow the development of personalized medicine approaches for each cancer patient.
Fig. 2
Mutational signatures of cancers. (A) Global genomics data obtained in the Pan‐Cancer Analysis of Whole Genomes (PCAWG) have been processed with fitting algorithms to recognize mutational signatures in each cancer type. (B) The mutational profile of each cancer type can be dissected into multiple signatures according to the distribution of single base substitutions (SBS), double base substitutions (DBS) or small insertions/deletions (InDels); this approach allows researchers to correlate a specific signature with alterations of biological programmes in cancers (the APOBEC signature is an example) or with the clinical history of the patient (smoking‐associated signatures) [54]. (C) The role of signatures can be further investigated in cellular or animal models using CRISPR‐Cas9 technology and single‐guide RNA screening platforms.
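The idea of dissecting a mutational profile into signatures (panel B) can be illustrated with a minimal sketch. It assumes a fixed set of known signatures and estimates non-negative exposures by multiplicative updates, a generic least-squares technique chosen here for illustration; it is not claimed to be the fitting method used by PCAWG. The six-channel signatures `sig_A` and `sig_B` are hypothetical toy profiles, not published signature values, and the function name `fit_exposures` is likewise an assumption.

```python
# Toy illustration: six substitution classes instead of the usual
# 96 trinucleotide contexts.
CHANNELS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# Hypothetical signature profiles (each sums to 1); NOT real published values.
SIGNATURES = {
    "sig_A": [0.05, 0.05, 0.70, 0.05, 0.10, 0.05],
    "sig_B": [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],
}


def fit_exposures(counts, signatures, n_iter=500):
    """Estimate non-negative exposures e so that sum_k e_k * sig_k ~ counts.

    Uses multiplicative updates for the least-squares objective, which keep
    every exposure non-negative throughout.
    """
    names = list(signatures)
    exposures = [1.0] * len(names)
    n_channels = len(counts)
    for _ in range(n_iter):
        # Profile reconstructed from the current exposures.
        recon = [
            sum(exposures[k] * signatures[name][c] for k, name in enumerate(names))
            for c in range(n_channels)
        ]
        for k, name in enumerate(names):
            sig = signatures[name]
            num = sum(sig[c] * counts[c] for c in range(n_channels))
            den = sum(sig[c] * recon[c] for c in range(n_channels))
            if den > 0:
                exposures[k] *= num / den
    return dict(zip(names, exposures))


# Synthetic tumour: 30 mutations from sig_A plus 70 from sig_B.
observed = [
    30 * a + 70 * b for a, b in zip(SIGNATURES["sig_A"], SIGNATURES["sig_B"])
]

if __name__ == "__main__":
    for name, value in fit_exposures(observed, SIGNATURES).items():
        print(f"{name}: {value:.2f} mutations attributed")
```

On real data the counts are noisy and many signatures are candidates at once, so dedicated tools add sparsity constraints and uncertainty estimates that this sketch omits.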
Fig. 3
Mutational hotspots of cancers in noncoding regions of the genome. Mutational hotspots in cancer are frequently localized in known mutated genes and can act as drivers. Their frequency in noncoding regions has been recently evaluated [48]. Apart from known hotspots in coding regions, 25% of cancers show clusters of mutations that are localized at the 5′UTR or 3′UTR of genes, as well as on long noncoding RNAs and on their promoters. These hotspots can also be linked to specific signatures, such as those of UV exposure, activation‐induced cytidine deaminase and APOBEC enzyme activity [112].
Fig. 4
Evolutionary history of cancers, molecular timing and early detection. (A) The mutational history of each cancer can be evaluated from a single biopsy by considering the evolution of tumour heterogeneity. (B) The clonal allelic status of point mutations can be used as a model to classify mutations as preferentially early, variable, constant, late or subclonal. The first two classes usually harbour driver mutations in many genes, whereas the late and subclonal classes usually do not contain driver mutations. (C) The classification of mutations according to their type [driver, CNAs, mutational signatures (Sigs)] and their allelic burden allows the reconstruction of a timeline for the development of each tumour [49], potentially widening the window for early diagnosis. MRCA, most recent common ancestor.
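As a rough illustration of how allelic status can inform timing (panel B), the sketch below converts a variant allele fraction (VAF) into an estimated number of mutated copies per tumour cell, using the standard relation VAF = purity × copies / (purity × tumour copy number + (1 − purity) × normal copy number). A mutation carried on more than one copy of a gained segment must predate the gain, whereas one carried by only a fraction of cells is subclonal. The function names and the cut-offs 0.75 and 1.5 are assumptions made for illustration; this is not the classification procedure of reference [49], which models sampling noise statistically and classifies at a different level of detail.

```python
def copies_per_cell(vaf, purity, tumour_cn, normal_cn=2):
    """Estimated mutated copies per tumour cell (multiplicity x cell fraction)."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the mixed sample,
    # divided by purity to express the result per tumour cell.
    return vaf * (purity * tumour_cn + (1 - purity) * normal_cn) / purity


def classify_timing(vaf, purity, tumour_cn, gained):
    """Assign a coarse timing label; thresholds are illustrative assumptions."""
    n = copies_per_cell(vaf, purity, tumour_cn)
    if n < 0.75:
        return "subclonal"
    if gained and n >= 1.5:
        return "early (clonal, before the gain)"
    if gained:
        return "late (clonal, after the gain)"
    return "clonal (timing unresolved)"


if __name__ == "__main__":
    # Hypothetical mutations in a segment gained to 4 copies, purity 80%.
    for vaf in (0.44, 0.22, 0.08):
        print(vaf, classify_timing(vaf, purity=0.8, tumour_cn=4, gained=True))
```

Without a copy-number gain there is no internal clock to compare against, which is why the sketch leaves clonal mutations in unaltered regions untimed.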
Fig. 5
Executable cancer models. (A) Experimental data from a global omics approach can be used as a source matrix for mechanistic computational models that can be continuously processed and refined using data from different cancer types and patients. This will provide data‐based mechanistic hypotheses on each cancer sample. (B) The data obtained through a global omics approach can be further integrated into a machine‐learning system, which can refine its ability to highlight the mechanistic processes at the root of each cancer sample. The system can then be combined with patient‐derived omics and clinical data to yield more precise information on cancer stage and development, ultimately allowing precise personalized medicine interventions.
