Sci Rep. 2016 May 26:6:26483.
doi: 10.1038/srep26483.

Systematic analysis of mutation distribution in three dimensional protein structures identifies cancer driver genes

Akihiro Fujimoto et al. Sci Rep.

Abstract

Protein tertiary structure determines the molecular function, interactions, and stability of a protein, so the distribution of mutations within the tertiary structure can help identify new cancer driver genes. To analyze mutation distribution in protein tertiary structures, we applied a novel three-dimensional (3D) permutation test to the mutation positions. We analyzed somatic mutation datasets for 21 cancer types, obtained from exome sequencing conducted by the TCGA project. Of the 3,622 genes that had ≥3 mutations in regions with tertiary structure data, 106 showed a significant skew in mutation distribution. Known tumor suppressors and oncogenes were significantly enriched among the identified genes. Physical distances between mutations were significantly smaller in known oncogenes than in tumor suppressors. Twenty-three genes were detected in multiple cancers. Candidate genes with a significant skew in 3D mutation distribution included kinases (MAPK1, EPHA5, ERBB3, and ERBB4), an apoptosis-related gene (APP), an RNA splicing factor (SF1), a miRNA processing factor (DICER1), an E3 ubiquitin ligase (CUL1), and transcription factors (KLF5 and EEF1B2). Our study suggests that systematic analysis of mutation distribution in the tertiary protein structure can help identify cancer driver genes.

Figures

Figure 1. Explanation of the 3D permutation test and significant genes.
(a) 3D permutation test procedure. The average distance between mutations in the 3D structure is calculated, and its significance is assessed by a permutation test. (b) Significant genes in multiple cancers. Twenty-three genes were significant in multiple cancers. Annotation was based on Vogelstein et al. Two genes (CHEK2 and NR1H2) were identified in multiple cancers because of a single hotspot in each gene. However, the mutant alleles of these two hotspots were found in the dbSNP database (rs146546850 in CHEK2 and rs55817866 in NR1H2) and their population allele frequencies were not low (0.7% for rs146546850 and 9.7% for rs55817866), so we removed these genes from this table.
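
The procedure in panel (a) can be illustrated with a minimal Python sketch. The details here are assumptions made for illustration and are not taken from the paper: the function names, uniform sampling of residues with replacement as the null model, a one-sided test for clustering, and the add-one p-value estimate are all hypothetical choices.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(coords):
    """Average Euclidean distance over all pairs of 3D points (needs >= 2 points)."""
    pairs = list(combinations(coords, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def permutation_test_3d(residue_coords, mutated_positions, n_perm=2000, seed=0):
    """Hypothetical sketch of a 3D permutation test for mutation clustering.

    residue_coords: dict mapping residue index -> (x, y, z)
    mutated_positions: residue indices, one entry per mutation (repeats allowed)
    Returns (observed mean distance, one-sided p-value for clustering).
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([residue_coords[p] for p in mutated_positions])
    residues = list(residue_coords)
    k = len(mutated_positions)
    hits = 0
    for _ in range(n_perm):
        # Assumed null model: mutations fall uniformly on resolved residues.
        sampled = rng.choices(residues, k=k)
        null = mean_pairwise_distance([residue_coords[p] for p in sampled])
        if null <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy structure: 100 residues spaced 3.8 units apart on a straight line.
toy_coords = {i: (3.8 * i, 0.0, 0.0) for i in range(100)}
clustered_obs, clustered_p = permutation_test_3d(toy_coords, [10, 10, 11, 12])
dispersed_obs, dispersed_p = permutation_test_3d(toy_coords, [0, 33, 66, 99])
```

In this toy example the clustered mutations give a small p-value and the evenly spread mutations do not. A real analysis would take residue coordinates from a structure file and would need a correction for multiple testing across genes.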
Figure 2. Analysis of candidate driver genes.
(a) Proportion of oncogenes. Oncogenes were enriched in the significant gene set (Fisher’s exact test; p-value = 7.4 × 10^−18; odds ratio = 29.5). Sig gene: significant gene. NS gene: not significant gene. (b) Proportion of TSGs. TSGs were enriched in the significant gene set (Fisher’s exact test; p-value = 7.1 × 10^−10; odds ratio = 11.8). (c) Proportion of genes in the COSMIC cancer gene census. COSMIC cancer genes were enriched in the significant gene set (Fisher’s exact test; p-value = 5.9 × 10^−19; odds ratio = 9.0). (d) Frequency of mutated samples for significant oncogenes, significant TSGs, the other significant genes, and not significant genes. The frequency of mutated samples for TSGs was significantly higher than that for oncogenes, the other significant genes, and not significant genes. P-values were obtained by Wilcoxon’s rank sum test. Unadjusted p-values are shown. (e) Adjusted average distance between mutations in significant oncogenes, significant TSGs, the other significant genes, and not significant genes. The average distance between mutations was adjusted by the length of the coding region. The adjusted average distance for TSGs was significantly larger than that for oncogenes and the other significant genes. The adjusted average distance for oncogenes and the other significant genes was significantly smaller than that for not significant genes. P-values were obtained by Wilcoxon’s rank sum test. Unadjusted p-values are shown.
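
The enrichment tests in panels (a) to (c) compare a 2×2 table of gene counts. As a minimal sketch, a one-sided Fisher's exact test can be computed from the hypergeometric distribution as below. The function name and the toy counts are hypothetical, the source does not state whether its tests were one- or two-sided, and the sample odds ratio computed here may differ from the conditional estimate that some statistical packages report.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

                          in gene class   not in class
        significant             a               b
        not significant         c               d

    Returns (sample odds ratio, P(X >= a)) under the hypergeometric null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    max_a = min(row1, col1)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, max_a + 1))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, tail / denom


# Toy counts for illustration only (not data from the paper).
toy_odds, toy_p = fisher_enrichment(3, 1, 1, 3)
```

For the toy table the sample odds ratio is 9 and the one-sided p-value is 17/70. The same function accepts the large gene counts of a real analysis, because Python integers do not overflow.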
Figure 3. Examples of the significant genes identified by the 3D permutation.
The amino acid sequence (primary structure), 3D structure, and mutations are shown. Recurrently mutated codons are indicated by squares in the amino acid sequence, and the positions of these codons are shown in the 3D structures. Figures of the 3D structures were generated with VMD software. (a) BRAF. q-value of 3D permutation (q-value3D) <10^−6. q-value of 1D permutation (q-value1D) <10^−6. Codon 600 (hotspot) is shown in blue in the 3D structure. (b) RAC1. HNSC: q-value3D <10^−6. q-value1D = n.s. (not significant). SKCM: q-value3D <10^−6. q-value1D = 0.0067. Mutations in HNSC and SKCM are shown in blue and orange, respectively. The common mutation is shown in red. (c) DICER1. UCEC: q-value3D = 0.0044. q-value1D = n.s. The sequence in the structure corresponds to the RNase III domain of DICER1. (d) FAS. CESC: q-value3D = 0.025. q-value1D = n.s. A change at codon 240 causes loss of interaction with FADD. Two other mutations are close to codon 240 in the 3D structure, suggesting that these positions are functionally important as well. (e) KRAS. BLCA: q-value3D = 0.0020. q-value1D = n.s. Codon 12 (hotspot) is shown in blue in the 3D structure.

References

    1. Wheeler D. A. & Wang L. From human genome to cancer genome: the first decade. Genome Res 23, 1054–1062, 10.1101/gr.157602.113 (2013).
    2. Vogelstein B. & Kinzler K. W. Cancer genes and the pathways they control. Nat Med 10, 789–799, 10.1038/nm1087 (2004).
    3. Lawrence M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218, 10.1038/nature12213 (2013).
    4. Tamborero D. et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci Rep 3, 2650, 10.1038/srep02650 (2013).
    5. Olivier M., Hollstein M. & Hainaut P. TP53 mutations in human cancers: origins, consequences, and clinical use. Cold Spring Harb Perspect Biol 2, a001008 (2010).
