BMC Bioinformatics. 2014 Mar 26:15:86.
doi: 10.1186/1471-2105-15-86.

A graph theoretic approach to utilizing protein structure to identify non-random somatic mutations

Gregory A Ryslik et al. BMC Bioinformatics.

Abstract

Background: It is well known that the development of cancer is caused by the accumulation of somatic mutations within the genome. For oncogenes specifically, current research suggests that there is a small set of "driver" mutations that are primarily responsible for tumorigenesis. Further, due to recent pharmacological successes in treating these driver mutations and their resulting tumors, a variety of approaches have been developed to identify potential driver mutations using methods such as machine learning and mutational clustering. We propose a novel methodology that increases our power to identify mutational clusters by taking into account protein tertiary structure via a graph theoretical approach.

Results: We have designed and implemented GraphPAC (Graph Protein Amino acid Clustering) to identify mutational clustering while considering protein spatial structure. Using GraphPAC, we detect novel clusters in proteins that are known to exhibit mutation clustering, and we identify clusters in proteins for which current methods show no evidence of clustering. Specifically, by combining the spatial information available in the Protein Data Bank (PDB) with the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), GraphPAC identifies new mutational clusters in well-known oncogenes such as EGFR and KRAS. Further, by using graph theory to account for tertiary structure, GraphPAC discovers clusters in DPP4, NRP1 and other proteins that existing methods do not identify. The R package is available at: http://bioconductor.org/packages/release/bioc/html/GraphPAC.html.

Conclusion: GraphPAC provides an alternative to iPAC and extends current methodology for identifying potential activating driver mutations by using a graph theoretic approach to account for protein tertiary structure.
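
The abstract does not spell out how a graph approach turns a 3D structure into an ordering of residues, but the figure captions name a "cheapest insertion" method. As a rough illustration only, the sketch below applies a generic cheapest-insertion heuristic to a handful of made-up 3D coordinates and returns a visiting order. The function name, the open-path variant, the closest-pair starting rule and the toy coordinates are all assumptions of this sketch; it is not the GraphPAC implementation.

```python
import math


def cheapest_insertion_order(coords):
    """Order points with a cheapest-insertion heuristic (open path).

    coords: list of (x, y, z) tuples, one per residue (hypothetical input).
    Returns a list of indices into coords giving the visiting order.
    """
    n = len(coords)
    if n <= 2:
        return list(range(n))

    def dist(a, b):
        return math.dist(coords[a], coords[b])

    # Assumption: seed the path with the closest pair of points.
    pairs = ((a, b) for a in range(n) for b in range(a + 1, n))
    path = list(min(pairs, key=lambda p: dist(*p)))
    remaining = set(range(n)) - set(path)

    while remaining:
        best = None  # (added_cost, point, insert_position)
        for r in sorted(remaining):  # sorted for deterministic tie-breaking
            for pos in range(len(path) + 1):
                if pos == 0:
                    cost = dist(r, path[0])          # prepend
                elif pos == len(path):
                    cost = dist(path[-1], r)         # append
                else:
                    a, b = path[pos - 1], path[pos]  # splice between a and b
                    cost = dist(a, r) + dist(r, b) - dist(a, b)
                if best is None or cost < best[0]:
                    best = (cost, r, pos)
        _, r, pos = best
        path.insert(pos, r)
        remaining.remove(r)
    return path


# Toy example: four points on a line, listed out of spatial order.
demo_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
demo_order = cheapest_insertion_order(demo_coords)
print(demo_order)
```

In this toy case the returned order walks the points from one end of the line to the other, which is the kind of spatially informed one-dimensional ordering a clustering method could then be run on.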

Figures

Figure 1
An example protein with three different domains. Under iPAC, the Domain A residues will influence the final positions of Domain C residues and vice versa, a result that is undesirable if the three domains are independent of each other. The residues in Domain A and Domain C will have no effect on each other’s final position via the graph theoretic approach.
Figure 2
The amount of rearrangement performed under each of the three insertion methods described, as well as under MDS. Each column on the x-axis represents one of the 1100 structures considered, with structures from the same protein adjacent to one another and the protein order determined lexicographically by protein name. The y-axis shows the Kendall Tau distance, which is equivalent to the number of swaps required to sort the protein back into {1, 2, 3, …} order using bubble sort. The proteins with at least one rearrangement higher than 150,000 are, from left to right, DPP4, F5, IDE, MET, PIK3Cα, SEC23A and TF.
Figure 3
An example constructing order statistics over 3 samples with 7 total mutations. The number inside the box indicates the residue number. A "*" above a residue signifies a non-synonymous missense substitution mutation for that residue. Figure from Ryslik et al.[9].
Figure 4
A comparison of GraphPAC, iPAC and NMC over all the structures that were found to be significant. Each of the 3D methods is considered: all three GraphPAC insertion methods and iPAC. The size of each colored block represents the number of structures with the relationship described. For instance, out of the 223 structures with significant clusters found under the cheapest insertion method of GraphPAC (top left), 94 structures had more clusters identified under the GraphPAC approach than under the NMC approach. Green designates structures where the 3D and NMC methods identified 1 cluster, while purple designates structures where the 3D and NMC methods identified more than 1 cluster.
Figure 5
The EGFR ectodomain fragment structure (PDB ID 2ITX) where the 719–768 cluster is colored in blue. The three mutations, 719, 751 and 768, are displayed as purple spheres.
Figure 6
The NRP-1 structure (PDB ID 2QQI) where the 277–432 cluster is colored in red. The mutations that disrupt VEGF binding, 297 and 320, are shown as orange spheres, while the end-points of the cluster, 277 and 432, are shown as purple spheres.
Figure 7
The KRAS structure (PDB ID 3GFT) color coded by region: amino acids 13–22 are blue, 24–60 are red and 62–145 are yellow. Residues 12 and 13, which make up the most significant cluster, are shown as purple spheres, while residues 23, 61, 117 and 146 are shown as brown spheres.
Figure 8
The BRAF structure (PDB ID 4E26) color coded by segment: 1) amino acids 464–599 are orange and 2) amino acids 601–671 are green. The α-carbons of the mutated residues 464, 466, 469, 581, 596, 597, 601 and 671 are shown as purple spheres. Residue 600 is shown as a red sphere.
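
The Figure 2 caption describes the Kendall Tau distance as the number of swaps bubble sort needs to restore {1, 2, 3, …} order. That equivalence can be checked with a minimal sketch: one function counts inversions (pairs that appear out of order) with a merge sort, the other counts adjacent swaps in a plain bubble sort. The function names and the sample reordering are illustrative assumptions, not taken from the paper or the GraphPAC package.

```python
def kendall_tau_distance(order):
    """Count inversions: pairs (i, j) with i < j but order[i] > order[j]."""

    def sort_and_count(seq):
        if len(seq) <= 1:
            return list(seq), 0
        mid = len(seq) // 2
        left, inv_left = sort_and_count(seq[:mid])
        right, inv_right = sort_and_count(seq[mid:])
        merged = []
        inversions = inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes every element still waiting in left
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_and_count(order)[1]


def bubble_sort_swaps(order):
    """Count the adjacent swaps performed by a plain bubble sort."""
    seq = list(order)
    swaps = 0
    for end in range(len(seq) - 1, 0, -1):
        for k in range(end):
            if seq[k] > seq[k + 1]:
                seq[k], seq[k + 1] = seq[k + 1], seq[k]
                swaps += 1
    return swaps


# Hypothetical reordering of five residues.
reordered = [3, 1, 2, 5, 4]
print(kendall_tau_distance(reordered), bubble_sort_swaps(reordered))
```

The merge-sort version runs in O(n log n) time, which matters for long proteins; the bubble-sort version is there only to make the equivalence stated in the caption concrete.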

References

    1. Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat Med. 2004;10(8):789–799. doi: 10.1038/nm1087.
    2. Weinstein IB, Joe AK. Mechanisms of disease: Oncogene addiction–a rationale for molecular targeting in cancer therapy. Nat Clin Pract Oncol. 2006;3(8):448–457. doi: 10.1038/ncponc0558.
    3. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O’Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446(7132):153–158. doi: 10.1038/nature05610.
    4. Wang T. Prevalence of somatic alterations in the colorectal cancer cell genome. Proc Natl Acad Sci. 2002;99(5):3076–3080. doi: 10.1073/pnas.261714699.
    5. Bardelli A, Parsons DW, Silliman N, Ptak J, Szabo S, Saha S, Markowitz S, Willson JKV, Parmigiani G, Kinzler KW, Vogelstein B, Velculescu VE. Mutational analysis of the tyrosine kinome in colorectal cancers. Science. 2003;300(5621):949. doi: 10.1126/science.1082596.