Bioinformatics. 2021 Jun 16;37(10):1367-1375.
doi: 10.1093/bioinformatics/btaa972.

Structural bioinformatics enhances mechanistic interpretation of genomic variation, demonstrated through the analyses of 935 distinct RAS family mutations

Swarnendu Tripathi et al. Bioinformatics.

Abstract

Motivation: Protein-coding genetic alterations are frequently observed in Clinical Genetics, but the high yield of variants of uncertain significance remains a limitation in decision making. RAS-family GTPases are cancer drivers, but only 54 variants, across all family members, fall within well-known hotspots. However, extensive sequencing has identified 881 non-hotspot variants for which significance remains to be investigated.

Results: Here, we evaluate 935 missense variants from seven RAS genes, observed in cancer, RASopathies and the healthy adult population. We characterized hotspot variants, previously studied experimentally, using 63 sequence- and 3D structure-based scores, chosen for the breadth of biophysical properties they cover. Applying the scores that correlate best with experimental measures, we report valuable new mechanistic inferences for both hotspot and non-hotspot variants. Moreover, we demonstrate that 3D scores have little-to-no correlation with those based on DNA sequence, which are commonly used in Clinical Genetics. Thus, taken together, this new knowledge bears significant relevance.

Availability and implementation: All genomic and 3D scores, and markdown for generating figures, are provided in our supplemental data.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Our process for integrating computational scores from multiple molecular levels with experimental data to interpret hotspot molecular mechanisms of genomic variants. (A) Multiple distinct molecules carry relevant information for directly interpreting the effects of genomic variants: the DNA itself, the encoded mRNA, the linear protein and the 3D folded protein. (B) Schematics of GTP hydrolysis kinetics and nucleotide exchange kinetics are shown for RAS GTPase. (C) Our long-term goal is to predict altered mechanisms. In this study, we take the first step of aggregating multiple and diverse scores across molecular levels and correlating them with experimental measures of activity. (D) We have assembled 63 computational scores for how genomic variants may alter sequence or structure and analyzed their interrelationships, with the number of scores from each molecular level (DNA sequence, protein sequence and 3D structure) shown in parentheses. (E) Measurements of the intrinsic GTP hydrolysis rate of RAS hotspot variants, shown as a heatmap in descending order, indicating low and high rates relative to WT RAS (left panel) (Hunter et al., 2015). Assessment of the underlying molecular mechanism of RAS hotspot variants, shown as a heatmap correlating experimental measurements (or scores) with computational scores (right panel) for Spearman correlation |R_Spearman| ≥ 0.6 (Supplementary Table S5), indicated below the dendrogram. The heatmap colors correspond to the z-scores, high (red) and low (blue), while the blue and light red dendrogram at the top of the heatmap represents positive and negative R_Spearman, respectively.
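The correlation filter described in panel (E) can be illustrated with a minimal sketch. It assumes that each computational score and the experimental measurement are available as equal-length lists over the same variants. The function names, the score names and the toy numbers are hypothetical and are not taken from the paper or its supplemental data. Only the |R_Spearman| ≥ 0.6 cutoff comes from the caption.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if either input is constant

def correlated_scores(scores, measurement, threshold=0.6):
    """Keep scores whose |R_Spearman| with the measurement meets the threshold."""
    kept = {}
    for name, values in scores.items():
        r = spearman(values, measurement)
        if abs(r) >= threshold:
            kept[name] = r
    return kept

# Toy, made-up numbers for five hypothetical variants (not data from the paper).
hydrolysis_rate = [0.9, 0.1, 0.4, 0.7, 0.2]
toy_scores = {
    "score_a": [5.0, 1.0, 3.0, 4.0, 2.0],  # same ordering as the rates
    "score_b": [1.0, 9.0, 4.0, 2.0, 8.0],  # reversed ordering
    "score_c": [2.0, 1.0, 5.0, 3.0, 4.0],  # only weakly related
}
selected = correlated_scores(toy_scores, hydrolysis_rate)
print(selected)
```

Using the absolute value keeps both positively and negatively correlated scores, which matches the caption's use of separate dendrogram colors for the two signs.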
Fig. 2.
KRAS non-hotspot variants computationally prioritized for effects on intrinsic and GAP-stimulated hydrolysis rates. We used the correlated computational scores (see Supplementary Table S4) to assess all non-hotspot variants for their potential to alter intrinsic and GAP-stimulated K_hydrolysis in ways similar to hotspot variants. Because we are specifically interested in global patterns among the variants, we used PHATE for dimensionality reduction. (A, B) 2D PHATE analysis was performed on 935 variants from 7 RAS genes in (A) and 493 variants from HRAS, KRAS and NRAS in (B), using the five computational scores that correlate with the intrinsic K_hydrolysis of the KRAS hotspot variants (Fig. 1E). Eight hotspot somatic variants of KRAS are colored based on the intrinsic K_hydrolysis measurements relative to the WT in the 2D PHATE plots in both (A) and (B). (C, D) Similar to (A) and (B), 2D PHATE analysis was performed on 935 (from 7 RAS genes) and 493 (from HRAS, KRAS and NRAS) variants in (C) and (D), respectively, using the seven computational scores that correlate with the GAP-stimulated K_hydrolysis (Supplementary Fig. S4C). In both (C) and (D), eight hotspot somatic variants of KRAS are colored based on the GAP-stimulated K_hydrolysis measurements relative to the WT. High (red) and low (blue) hydrolysis rates are indicated in the color bar. (E) 3D structure of KRAS (PDB: 4OBE) showing the sensitive regions: phosphate-binding loop (P-loop) (amino acids 10-17), switch-I (amino acids 30-40) and switch-II (amino acids 60-76) (Johnson et al., 2017). (F) Amino-acid residues of the KRAS variants that are near the eight hotspot variants in the 2D PHATE space in (B) are shown projected onto the 3D structure. The amino-acid residues are colored according to the sensitive regions in (E).
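Panel (F) relies on finding non-hotspot variants that lie near hotspot variants in the 2D embedding. The caption does not say how "near" was defined, so the sketch below assumes a simple Euclidean distance cutoff applied to coordinates that some embedding step has already produced. The function name, the labels, the radius and the coordinates are all hypothetical.

```python
from math import dist

def near_hotspots(embedding, hotspots, radius):
    """Map each non-hotspot variant within `radius` of a hotspot to its closest hotspot.

    `embedding` maps variant label -> (x, y) coordinates from a 2D embedding.
    The distance cutoff is an illustrative assumption, not the paper's criterion.
    """
    result = {}
    for name, point in embedding.items():
        if name in hotspots:
            continue
        # Closest hotspot by Euclidean distance in the embedding plane.
        distance, closest = min((dist(point, embedding[h]), h) for h in hotspots)
        if distance <= radius:
            result[name] = closest
    return result

# Toy, made-up coordinates with placeholder labels (not data from the paper).
toy_embedding = {
    "hotspot_1": (0.0, 0.0),
    "hotspot_2": (4.0, 0.0),
    "variant_a": (0.3, 0.4),  # 0.5 away from hotspot_1
    "variant_b": (4.0, 0.5),  # 0.5 away from hotspot_2
    "variant_c": (2.0, 5.0),  # far from both hotspots
}
near = near_hotspots(toy_embedding, ["hotspot_1", "hotspot_2"], radius=1.0)
print(near)
```

Distances in a nonlinear embedding are only a rough guide to similarity in the original score space, so a cutoff like this is best treated as a prioritization aid.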
Fig. 3.
All RAS variants assessed using our integrated scoring approach identify functional groupings among hotspot variants and demonstrate differences among non-hotspot variants. Having assessed global patterns by PHATE, we next characterized more nuanced local patterns among the variants using 2D t-SNE. Combinations of scores convey mechanistic differences in the effect of different hotspot variants. (A–C) Somatic hotspot variants of G12 (A), G13 (B) and Q61 (C) from HRAS, KRAS and NRAS are shown in the 2D t-SNE space of the variants from all 7 RAS genes. Hotspot variants are labeled and colored by RAS protein. It is visually apparent that some non-hotspot variants are near hotspot variants in t-SNE space, indicating that they may have effects similar to those of hotspot variants, while other non-hotspot variants are far from hotspot variants in the t-SNE space, indicating that they either have no effect or a different effect from hotspot variants. (D) Heatmap showing patterns of scores for the somatic hotspot variants shown in the 2D t-SNE plots. The top ten scores (out of 31) are selected based on median absolute deviation (MAD ≥ 0.2). We separated the G12, G13 and Q61 variants into five clusters in the heatmap, denoted as (i)–(v). A heatmap with all 31 scores is shown in Supplementary Figure S6.
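The MAD-based selection in panel (D) can be sketched as follows. This assumes the scores are already on a comparable scale (for example, normalized) and uses the plain, unscaled median absolute deviation. The caption gives only the MAD ≥ 0.2 cutoff and the count of ten, so the function names, the ranking by descending MAD and the toy values are assumptions.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: median of |value - median(values)|."""
    center = median(values)
    return median([abs(v - center) for v in values])

def top_variable_scores(scores, min_mad=0.2, top_n=10):
    """Return up to `top_n` score names with MAD >= `min_mad`, most variable first."""
    mads = {name: mad(values) for name, values in scores.items()}
    passing = [name for name in mads if mads[name] >= min_mad]
    passing.sort(key=lambda name: mads[name], reverse=True)
    return passing[:top_n]

# Toy, made-up values for five hypothetical variants (not data from the paper).
toy_scores = {
    "score_a": [0.0, 0.5, 1.0, 1.5, 2.0],    # MAD = 0.5
    "score_b": [1.0, 1.1, 1.0, 0.9, 1.0],    # MAD = 0.0, nearly constant
    "score_c": [0.0, 0.25, 0.5, 0.75, 1.0],  # MAD = 0.25
}
chosen = top_variable_scores(toy_scores)
print(chosen)
```

MAD is less sensitive to outlying variants than the standard deviation, which is one plausible reason to prefer it when picking the scores that vary most across a small set of hotspot variants.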
Fig. 4.
Correlation among computational scores from multiple molecular levels for the 935 RAS variants demonstrates the distinct and underutilized value of 3D structure-based scores. (A–C) Spearman correlation (R_Spearman) between pairs of computational scores indicating negative (A), neutral (B) and positive correlation (C). These three examples were chosen as exemplars for relationships between the scores from different molecular levels. (D) We used a total of 63 individual scores for the 935 protein variants to assess the differences among variants from 7 RAS proteins (KRAS, HRAS, NRAS, MRAS, RRAS, RRAS2 and RERG) based on DNA sequence, protein sequence and 3D structure of protein. Larger and labeled versions of the correlation matrices are available in Supplementary Figure S2. We highlight two sections of structure-based scores that have nearly no overlap with one another or with information available in DNA annotations. (E) Using correlation patterns among scores, we reduced the number of individual scores to the 31 that are most distinct and therefore most efficiently cover the broadest diversity of properties. The locations of the same three scores specifically named in (D) are indicated by arrows. In (D) and (E) the size of each small square in the correlation matrix is proportional to the value of absolute Spearman correlation |R_Spearman|. All 63 scores for 935 variants from 7 RAS genes are provided as Supporting Data (Supplementary Table S2).
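Panel (E) describes reducing 63 scores to 31 using correlation patterns, but the caption does not state the procedure. One common way to do this kind of reduction is greedy pruning: walk through the scores in a fixed order and drop any score that is highly correlated with one already kept. The sketch below illustrates that idea only and is not the authors' method. The cutoff of 0.8, the score names and the correlation values are hypothetical.

```python
def prune_redundant(names, corr, cutoff=0.8):
    """Greedily keep scores whose |correlation| with every kept score is below `cutoff`.

    `corr` maps frozenset({name_1, name_2}) -> pairwise correlation, so the
    lookup does not depend on the order of the pair.
    """
    kept = []
    for name in names:
        if all(abs(corr[frozenset((name, other))]) < cutoff for other in kept):
            kept.append(name)
    return kept

# Toy, made-up pairwise correlations for three hypothetical scores.
toy_corr = {
    frozenset(("dna_score", "protein_score")): 0.92,
    frozenset(("dna_score", "structure_score")): 0.05,
    frozenset(("protein_score", "structure_score")): -0.10,
}
names = ["dna_score", "protein_score", "structure_score"]
kept = prune_redundant(names, toy_corr)
print(kept)
```

The result of greedy pruning depends on the order of `names`, so in practice the scores would need a deliberate ordering, such as by interpretability or by coverage of variants.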

References

    1. Andreoletti G. et al. (2019) Reports from the fifth edition of CAGI: the critical assessment of genome interpretation. Hum. Mutat., 40, 1197–1201.
    2. Angeles A.K.J. et al. (2019) Phenotypic characterization of the novel, non-hotspot oncogenic KRAS mutants E31D and E63K. Oncol. Lett., 18, 420–432.
    3. Bandaru P. et al. (2017) Deconstruction of the Ras switching cycle through saturation mutagenesis. Elife, 6, e27810.
    4. Berliner N. et al. (2014) Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation. PLoS One, 9, e107353.
    5. Berman H.M. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.