Mol Cancer. 2016 Nov 25;15(1):76.
doi: 10.1186/s12943-016-0560-0.

Cancer somatic mutations cluster in a subset of regulatory sites predicted from the ENCODE data

Nisar A Shar et al. Mol Cancer.

Abstract

Background: Transcriptional regulation of gene expression is essential for cellular differentiation and function, and defects in this process are associated with cancer. The ENCODE project has mapped potential regulatory sites across the entire genome in many cell types, and these regions have been shown to harbour many of the somatic mutations that occur in cancer cells, suggesting that such mutations may drive cancer initiation and development. The ENCODE data suggest a very large number of regulatory sites, so methods are needed to identify the most relevant ones and to connect them to the genes they control.

Methods: Predictive models of gene expression were developed by integrating ENCODE regulatory data, including transcription factor binding and DNase I hypersensitivity, with RNA-seq gene expression data. A penalized regression method was used to identify the most predictive potential regulatory sites for each transcript. Known cancer somatic mutations from the COSMIC database were mapped to potential regulatory sites, and we compared mapping frequencies between the sites chosen in regulatory models and the other (rejected) sites. The effects of potential confounders, such as replication timing, were considered.
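The abstract names only "a penalized regression method"; Fig. 1 identifies it as the LASSO. To illustrate how such a model keeps a small subset of candidate regions for one transcript, here is a minimal pure-Python coordinate-descent sketch. The data layout (cell types as rows, DNase I signal of each candidate region as columns), the function names `lasso`, `soft_threshold` and `standardise`, and the toy data are all hypothetical; this is not the authors' implementation, and real analyses would normally use an established package.

```python
# Hypothetical sketch: LASSO by cyclic coordinate descent, pure Python (no numpy/sklearn).
# Assumed layout: X[i][j] = DNase I signal of candidate region j in cell type i,
#                 y[i]    = expression of one transcript in cell type i.
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator S(z, gamma) used in the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardise(column: List[float]) -> List[float]:
    """Centre a column and scale it to unit (population) variance; constant columns become 0."""
    n = len(column)
    mu = sum(column) / n
    sd = (sum((v - mu) ** 2 for v in column) / n) ** 0.5
    return [(v - mu) / sd if sd > 0 else 0.0 for v in column]


def lasso(X: List[List[float]], y: List[float], lam: float,
          max_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 on standardised X and centred y.

    Returns coefficients on the standardised scale; non-zero entries mark the
    candidate regions kept in the expression model.
    """
    n, p = len(X), len(X[0])
    cols = [standardise([row[j] for row in X]) for j in range(p)]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual while every coefficient is zero
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j, col in enumerate(cols):
            # mean(col**2) == 1, so the partial-residual fit for b_j is x_j.r/n + b_j
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy data: 8 cell types, 3 candidate regions; expression depends on regions 0 and 1 only.
    c0 = [1, 1, 1, 1, -1, -1, -1, -1]
    c1 = [1, 1, -1, -1, 1, 1, -1, -1]
    c2 = [1, -1, 1, -1, 1, -1, 1, -1]
    X = [[a, b, c] for a, b, c in zip(c0, c1, c2)]
    y = [3 * a - 2 * b for a, b in zip(c0, c1)]
    beta = lasso(X, y, lam=0.5)
    print(beta)                                          # [2.5, -1.5, 0.0]
    print([j for j, b in enumerate(beta) if b != 0.0])   # [0, 1]
```

Standardising each column makes the single penalty `lam` act comparably on regions whose signals have different scales, which is also glmnet's default behaviour. Raising `lam` drives more coefficients to exactly zero, i.e. rejects more candidate regions.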

Results: Cancer somatic mutations preferentially occupy those regulatory regions chosen in our models as most predictive of gene expression.
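One simple way to test a claim of this kind is to count how many chosen and rejected regions contain at least one COSMIC mutation and apply a one-sided Fisher exact test to the resulting 2x2 table. The sketch below is a hypothetical illustration only: the abstract does not specify the statistic, the 1-based inclusive coordinate convention and all function names are assumptions, and it makes no adjustment for region length or for confounders such as replication timing, which the authors state they considered.

```python
# Hypothetical sketch: do mutations fall in chosen CRRs more often than in rejected ones?
# Assumed coordinates: (chromosome, start, end) with 1-based inclusive ends.
from bisect import bisect_left, bisect_right
from math import comb
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]
Mutation = Tuple[str, int]


def index_mutations(mutations: List[Mutation]) -> Dict[str, List[int]]:
    """Group mutation positions by chromosome and sort them for binary search."""
    by_chrom: Dict[str, List[int]] = {}
    for chrom, pos in mutations:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom


def count_in_region(region: Region, by_chrom: Dict[str, List[int]]) -> int:
    """Number of mutation positions p with start <= p <= end on the region's chromosome."""
    chrom, start, end = region
    positions = by_chrom.get(chrom, [])
    return bisect_right(positions, end) - bisect_left(positions, start)


def mutated_table(chosen: List[Region], rejected: List[Region],
                  mutations: List[Mutation]) -> Tuple[int, int, int, int]:
    """2x2 table: (chosen mutated, chosen clean, rejected mutated, rejected clean)."""
    by_chrom = index_mutations(mutations)
    a = sum(1 for r in chosen if count_in_region(r, by_chrom) > 0)
    c = sum(1 for r in rejected if count_in_region(r, by_chrom) > 0)
    return a, len(chosen) - a, c, len(rejected) - c


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value P(X >= a) under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    hi = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, hi + 1))
    return tail / comb(n, row1)


if __name__ == "__main__":
    chosen = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)]
    rejected = [("chr1", 500, 600), ("chr1", 700, 800), ("chr2", 100, 150), ("chr2", 200, 260)]
    mutations = [("chr1", 150), ("chr1", 350), ("chr2", 60), ("chr2", 120)]
    table = mutated_table(chosen, rejected, mutations)
    print(table)                    # (3, 0, 1, 3)
    print(fisher_greater(*table))   # 4/35, about 0.114
```

Sorting positions once per chromosome lets each region be counted with two binary searches rather than a scan over all mutations, which matters at COSMIC scale.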

Conclusion: Our methods identify a much smaller set of regulatory sites that are enriched in cancer somatic mutations and more predictive of gene expression than the rejected sites. This has implications for the mechanistic interpretation of cancer mutations and for the understanding of genetic regulation.

Keywords: Cancer mutations; Cis regulation; Gene regulation; Modelling; Regulatory regions.

Figures

Fig. 1
Building an expression model for CNN3 (ENST00000370206.4). a shows the mean squared error against the log(λ) LASSO penalty parameter, with numbers above the graph indicating the number of predictive variables (non-zero coefficients) in the corresponding LASSO model. Dotted lines show possible choices of λ: at the minimum mean squared error (λmin) and, more conservatively, at the largest λ whose error lies within 1 standard error of that minimum. This identifies models with 2 predictive variables as optimal. b shows the correlation between observed expression and the expression predicted by the model. c and d show the correlation between DNaseI signal intensity and expression for the two candidate regulatory elements (CRRs) chosen by the LASSO method. e shows the same correlation for an example rejected CRR. f shows the genomic locations of the two chosen CRRs and the example rejected CRR.
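The two dotted lines in panel a correspond to a widely used convention for cross-validated LASSO, exposed for example as `lambda.min` and `lambda.1se` by `cv.glmnet` in R's glmnet package. The sketch below computes both from per-fold held-out errors; it assumes those errors are already available, the function names are hypothetical, and it does not claim to reproduce the authors' pipeline or say which choice they finally used.

```python
# Hypothetical sketch of the two lambda choices marked in Fig. 1a.
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def summarise_folds(fold_errors: List[List[float]]) -> Tuple[List[float], List[float]]:
    """fold_errors[i][k] is the held-out MSE of lambda i on fold k; return per-lambda mean and SE."""
    means = [mean(errs) for errs in fold_errors]
    ses = [stdev(errs) / sqrt(len(errs)) for errs in fold_errors]
    return means, ses


def choose_lambdas(lambdas: List[float], cv_mean: List[float],
                   cv_se: List[float]) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se)."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # 1-SE rule: the largest (most heavily penalised, hence sparsest) lambda whose
    # mean CV error is still within one standard error of the minimum.
    lam_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambdas[i_min], lam_1se


if __name__ == "__main__":
    lambdas = [1.0, 0.5, 0.25, 0.125, 0.0625]
    cv_mean = [2.0, 1.2, 1.0, 0.9, 0.95]
    cv_se = [0.1, 0.1, 0.1, 0.15, 0.1]
    print(choose_lambdas(lambdas, cv_mean, cv_se))   # (0.125, 0.25)
```

Because a larger λ zeroes more coefficients, the 1-SE choice trades a statistically negligible rise in error for a model with fewer candidate regions.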
Fig. 2
The chosen CRRs for NAB2 (ENST00000342556.5), STAT6 (ENST00000300134.2) and LRP1 (ENST00000243077.2). Black arrows link CRRs to the transcripts for which they were chosen in expression models; note that one CRR was chosen for both STAT6 and NAB2. Details of the chosen CRRs are given in boxes, including the bound transcription factors, CRR sizes and mutations mapped from the COSMIC database. CRRs are labelled as enhancers if their signal correlates positively with expression and as repressors if it correlates negatively. A CRR's box is red if at least one mutation has been reported in it, and black otherwise.
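The labelling rules in this caption can be stated compactly in code. The sketch assumes that each CRR's DNase I signal and the transcript's expression are paired by cell type and uses Pearson correlation, which the caption does not name; the function names and the "unclassified" label for zero correlation are hypothetical additions.

```python
# Hypothetical sketch of the Fig. 2 labelling rules.
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def crr_label(signal: List[float], expression: List[float]) -> str:
    """'enhancer' for positive correlation with expression, 'repressor' for negative."""
    r = pearson(signal, expression)
    if r > 0:
        return "enhancer"
    if r < 0:
        return "repressor"
    return "unclassified"  # zero correlation is not covered by the caption (assumption)


def box_colour(n_mutations: int) -> str:
    """Red if at least one COSMIC mutation maps to the CRR, black otherwise."""
    return "red" if n_mutations >= 1 else "black"


if __name__ == "__main__":
    print(crr_label([1, 2, 3, 4], [2, 4, 6, 8]))   # enhancer
    print(crr_label([1, 2, 3, 4], [8, 6, 4, 2]))   # repressor
    print(box_colour(0))                           # black
```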
Fig. 3
Graphical summary of the methodology employed