BMC Genomics. 2016 Jun 23;17 Suppl 2(Suppl 2):396.
doi: 10.1186/s12864-016-2723-1.

KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily

Tirso Pons et al. BMC Genomics.

Abstract

Background: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established a long time ago. However, understanding the link between sequence variants in the protein kinase superfamily and complex traits at the mechanistic, molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction of them disrupt molecular function sufficiently to drive disease.

Results: KinMutRF is a novel random-forest method for automatically identifying pathogenic variants in human kinases. Twenty-six decision trees, implemented as a random forest, weigh a battery of features that characterize each variant: a) at the gene level, membership in a KinBase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic; all unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in training and testing: Kin-Driver (643 variants) and Pon-BTK (1495 variants). We also provide predictions for the 848 protein kinase variants in UniProt that remained unclassified. A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under the GPL version 3 license and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2.
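The performance figures quoted above follow the standard confusion-matrix definitions, with "pathogenic" as the positive class. The short Python sketch below (the function names are illustrative, not part of KinMutRF) computes them and checks that the reported F-score is the harmonic mean of the reported precision and recall: 2 × 0.82 × 0.75 / (0.82 + 0.75) ≈ 0.78. The confusion-matrix counts in the usage lines are hypothetical.

```python
import math


def f_score(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall, F1 and MCC, treating 'pathogenic' as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "prec": precision,
        "rec": recall,
        "f": f_score(precision, recall),
        "mcc": mcc(tp, fp, tn, fn),
    }


# Consistency check against the reported precision and recall.
print(round(f_score(0.82, 0.75), 2))  # 0.78

# Hypothetical confusion-matrix counts, for illustration only.
print(classification_metrics(tp=8, fp=2, tn=9, fn=1))
```

MCC is included because, unlike accuracy, it stays informative when pathogenic and neutral variants are unevenly represented.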

Conclusions: KinMutRF classifies kinase variants with good performance. In a benchmark, its predictions compare favorably with those of other state-of-the-art methods (SIFT, PolyPhen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM and VEST). Kinase-specific features rank as the most informative in terms of information gain and likely account for the improvement in prediction performance. This argues for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families.
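Information gain, the criterion used above to rank features, measures how much knowing a feature reduces the entropy of the pathogenic/neutral label. A minimal Python sketch on hypothetical toy data (the feature values and labels are invented for illustration and are not KinMutRF training data):

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels) -> float:
    """IG = H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # Weighted entropy of the labels within each feature value.
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 4 pathogenic and 4 neutral variants.
labels = ["pathogenic"] * 4 + ["neutral"] * 4
# A feature that perfectly separates the classes (e.g. "hits a catalytic site").
informative = [True] * 4 + [False] * 4
# A feature unrelated to the label.
uninformative = ["A", "B"] * 4

print(information_gain(informative, labels))    # 1.0
print(information_gain(uninformative, labels))  # 0.0
```

With balanced classes the maximum gain is 1 bit; real features fall between the two extremes shown here.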

Keywords: Functional impact; Pathogenicity prediction; Protein kinases; Variant prioritization; X-linked agammaglobulinemia.

Figures

Fig. 1
Performance and classification features. a Performance of the classifier with respect to the number of trees in the random forest; b idem, close-up on the region around the performance values; c Number of variants in each kinase group; d log odds-ratio of the number of variants in each kinase group; e Number of variants in each kinase domain; f log odds-ratio of the number of variants in each kinase domain; g changes in Cβ-branching caused by pathogenic and neutral variants; h number of pathogenic and neutral variants affecting catalytic sites as defined by UniProt, FireDB and Phospho.ELM; i Distribution of SIFT scores; j Changes in volume caused by disease-associated and neutral variants; k Changes in hydrophobicity caused by disease-associated and neutral variants; l Accumulated Gene Ontology (GO) log odds-ratio. Where relevant, disease-associated variants are shown in dark red and neutral variants in ochre.
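Several panels above use log odds-ratios of pathogenic versus neutral variants per category (kinase group, PFAM domain, GO term). The exact formula used by KinMutRF is not given here, so the Python sketch below assumes one common definition: the log of the pathogenic:neutral ratio within a category minus the log of that ratio over the whole set, with a pseudocount of 1 to avoid zero counts. The accumulated GO score is likewise assumed to be the sum of the log odds-ratios of a gene's GO terms. The variant data and GO identifiers are hypothetical placeholders.

```python
import math
from collections import Counter


def log_odds_ratios(categories, labels, pseudocount=1.0) -> dict:
    """Per-category log odds-ratio of pathogenic vs neutral variants.

    ASSUMED definition (not taken from KinMutRF): log of the pathogenic:neutral
    ratio within the category minus log of that ratio over the whole set.
    Requires at least one pathogenic and one neutral variant overall.
    """
    pathogenic = Counter(c for c, y in zip(categories, labels) if y == "pathogenic")
    neutral = Counter(c for c, y in zip(categories, labels) if y == "neutral")
    background = math.log(sum(pathogenic.values()) / sum(neutral.values()))
    return {
        c: math.log((pathogenic[c] + pseudocount) / (neutral[c] + pseudocount)) - background
        for c in set(categories)
    }


def sum_go_lor(gene_go_terms, go_lor: dict) -> float:
    """ASSUMED sumGOlor: add up the log odds-ratios of a gene's GO terms.

    Terms without an estimated log odds-ratio contribute 0.
    """
    return sum(go_lor.get(term, 0.0) for term in gene_go_terms)


# Hypothetical toy variants annotated with their kinase group.
groups = ["TK", "TK", "TK", "AGC", "AGC", "AGC"]
labels = ["pathogenic", "pathogenic", "neutral", "neutral", "neutral", "pathogenic"]
lor = log_odds_ratios(groups, labels)
print({g: round(v, 3) for g, v in sorted(lor.items())})  # {'AGC': -0.405, 'TK': 0.405}

# Placeholder GO identifiers and log odds-ratios.
print(sum_go_lor(["GO:X", "GO:Y", "GO:Z"], {"GO:X": 0.5, "GO:Y": -0.2}))
```

A positive value marks a category enriched in pathogenic variants relative to the background, a negative value one enriched in neutral variants.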
Fig. 2
Prediction of pathogenicity for variants uncharacterised in UniProt. a Distribution of predictions of pathogenicity in the different protein kinases; b Fraction of predictions as disease-associated and neutral; c Distribution of predictions of pathogenicity in the different groups in the taxonomy of protein kinases; d Distribution of predictions of pathogenicity with respect to PFAM domains; e Distribution of the PFAM domain log odds-ratios for neutral and disease-associated variants; f Distribution of the accumulated Gene Ontology log odds-ratios (sumGOlor) for neutral and disease-associated variants
