PLoS Comput Biol. 2007 Feb 16;3(2):e26.
doi: 10.1371/journal.pcbi.0030026. Epub 2006 Dec 28.

Functional impact of missense variants in BRCA1 predicted by supervised learning

Rachel Karchin et al. PLoS Comput Biol.

Abstract

Many individuals tested for inherited cancer susceptibility at the BRCA1 gene locus are discovered to have variants of unknown clinical significance (UCVs). Most UCVs cause a single amino acid residue (missense) change in the BRCA1 protein. They can be biochemically assayed, but such evaluations are time-consuming and labor-intensive. Computational methods that classify and suggest explanations for UCV impact on protein function can complement functional tests. Here we describe a supervised learning approach to classification of BRCA1 UCVs. Using a novel combination of 16 predictive features, the algorithms were applied to retrospectively classify the impact of 36 BRCA1 C-terminal (BRCT) domain UCVs biochemically assayed to measure transactivation function and to blindly classify 54 documented UCVs. Majority vote of three supervised learning algorithms is in agreement with the assay for more than 94% of the UCVs. Two UCVs found deleterious by both the assay and the classifiers reveal a previously uncharacterized putative binding site. Clinicians may soon be able to use computational classifiers such as those described here to better inform patients. These classifiers can be adapted to other cancer susceptibility genes and systematically applied to prioritize the growing number of potential causative loci and variants found by large-scale disease association studies.
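The majority-vote step mentioned in the abstract can be illustrated with a minimal sketch. The label strings, the "unclassified" fallback when no label wins a strict majority, and the example votes are all assumptions made for illustration; they are not taken from the paper's implementation or data.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by more than half of the classifiers.

    Falls back to "unclassified" when no label has a strict majority
    (an assumption for this sketch, not the paper's stated rule).
    """
    counts = Counter(predictions)
    label, n = counts.most_common(1)[0]
    return label if n > len(predictions) / 2 else "unclassified"


# Hypothetical calls from three classifiers for one variant (not real data).
votes = {
    "naive_bayes": "deleterious",
    "svm": "deleterious",
    "random_forest": "neutral",
}
consensus = majority_vote(list(votes.values()))
print(consensus)
```

With three classifiers that each return one of two labels, a strict majority always exists; the fallback only matters if a classifier abstains or emits a third label.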

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Computational Classifications of 36 BRCA1 BRCT Variants Functionally Characterized by the Transactivation Assay
For each variant, the local protein structure environment is represented by secondary structure type and whether the amino acid residue is buried (normalized solvent accessibility < 0.2) or exposed (normalized solvent accessibility ≥ 0.2). Labels (“1655 S->F”) are colored according to whether the variant was functional in the assays (blue) or nonfunctional (red). Computational classifications in agreement with the assay are indicated by filled circles. Computational classifications not in agreement with the assay are indicated by outlined circles. Computational classifications yielding “unclassified” are indicated by an outlined black circle. The variant D1692N is fully functional as a transcriptional activator but results in incorrect splicing in vivo. Results from variant M1775K are unpublished (Foulkes et al.). A, Ancestral Sequence; B, Rule-based decision tree; D, Decision Tree; F, SIFT; MCC, Matthews correlation coefficient; N, Naïve Bayes; R, Random Forest; S, Support Vector Machine; T, Align-GVGD Tnig; U, Align-GVGD Spur.
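The Matthews correlation coefficient (MCC) listed among the abbreviations summarizes agreement between a classifier and the assay in a single number. The sketch below computes it from confusion-matrix counts. Treating "nonfunctional" as the positive class and returning 0.0 for an undefined denominator are assumptions of this example, and the counts used in the usage note are hypothetical.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts.

    tp/fn: assay-nonfunctional variants called deleterious/neutral;
    tn/fp: assay-functional variants called neutral/deleterious
    (the choice of positive class is an assumption of this sketch).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention used here: report 0.0 when MCC is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, `matthews_corrcoef(10, 10, 0, 0)` gives 1.0 (perfect agreement), while a classifier whose calls are evenly split between right and wrong, such as `matthews_corrcoef(5, 5, 5, 5)`, gives 0.0.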
Figure 2. Sensitivity versus 1-Specificity of Classifiers That Use a Numerical Score to Predict the Functional Impact of 34 BRCA1 BRCT UCVs
Comparison of four supervised machine learning methods, trained on 618 biochemically characterized missense variants in the human transcription factor TP53, and two sequence analysis methods that consider evolutionary conservation and physicochemical properties of amino acids (SIFT and Align-GVGD Tnig, based on an alignment of eight placental mammals, marsupial, chicken, frog, and pufferfish). Align-GVGD Spur, using an alignment that includes these species plus sea urchin, performs slightly worse than Align-GVGD Tnig in terms of ROC analysis and is not shown. Plot created with ROCR [86]. DT, decision tree; NB, Naïve Bayes; RF, random forest; SVM, support vector machine.
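A sensitivity versus 1-specificity curve of this kind is obtained by sweeping a decision threshold over each classifier's numerical score. The following is a minimal pure-Python sketch of that sweep, not the ROCR procedure used for the figure. It assumes that higher scores mean "more likely deleterious", that label 1 marks deleterious variants, and that both classes are present.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs for a threshold sweep.

    scores: numerical classifier outputs (higher = more deleterious, assumed).
    labels: 1 for deleterious (positive), 0 for neutral (negative).
    Requires at least one positive and one negative label.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical scores for four variants (not data from the paper).
curve = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(curve)
```

Each point corresponds to calling every variant at or above the threshold deleterious; a classifier that separates the classes perfectly reaches sensitivity 1.0 before any false positives appear.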
Figure 3. Computational Classifications of 54 Uncharacterized Variants Found in BIC
For each variant, the local protein structure environment is represented by secondary structure type and whether the amino acid residue is buried (normalized solvent accessibility < 0.2) or exposed (normalized solvent accessibility ≥ 0.2). For the 54 uncharacterized variants, labels (“1652 M->T”) are colored according to consensus prediction from Naïve Bayes, Support Vector Machine, and Random Forest. Predictions of each method are indicated by filled circles (blue, neutral; red, deleterious). N, Naïve Bayes; R, Random Forest; S, Support Vector Machine.
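The buried/exposed rule stated in the caption reduces to a single threshold comparison. The sketch below only makes the boundary case explicit (a value of exactly 0.2 counts as exposed); the function and constant names are hypothetical.

```python
# Threshold on normalized solvent accessibility, as given in the caption.
BURIED_THRESHOLD = 0.2


def burial_state(normalized_accessibility):
    """Classify a residue as 'buried' (< 0.2) or 'exposed' (>= 0.2)."""
    if normalized_accessibility < BURIED_THRESHOLD:
        return "buried"
    return "exposed"
```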
Figure 4. Spatial Distribution of Predicted Deleterious and Neutral Missense Variants in the BRCA1 BRCT Domains
(A) Ribbon representation of the two domains with labeled helices (α1, α2, etc.) and strands (β1, β2, etc.). Recreation of Figure 1A [64]. (B) BRCA1 BRCT missense variants reported as neutral (blue) and deleterious (red) in the mammalian transactivation assay shown mapped onto the BRCA1 BRCT X-ray crystal structure (1t29). (C) Consensus predictions of Random Forest, Naïve Bayes, and Support Vector Machine for 54 BRCA1 BRCT VUS in the Breast Information Core database (http://research.nhgri.nih.gov/bic/BIC/) mapped onto the same structure, with predicted neutral shown in blue and predicted deleterious in red.
Figure 5. Identification of a Putative Novel Binding Site in BRCA1 BRCT Domains
Two surface variants found to be deleterious to BRCA1 activity in our companion paper (R1753T and T1685I) [14] lie at a highly conserved patch of amino acid residues, forming a groove on the protein surface, possibly a heretofore uncharacterized binding site of BRCA1 with a protein partner or nucleotide ligand. (A) Surface representation of BRCA1 BRCT domains colored by conservation in our multiple sequence alignment of orthologs. Red, 100% conserved; white, 39% conserved; blue, 0% conserved. (B) Two hydrogen-bonding networks are shown in ball-and-stick format. (C) Changes in the electrostatic surface potential of the putative binding site upon mutation of R1753 to T1753. The electrostatic surface potential of the groove changes from primarily positive (greater than 10 kT) and neutral (0 kT), depicted as blue and white, to negative (less than −10 kT), depicted as red. This change may weaken the binding of protein partner(s) or nucleic acid ligand(s) necessary for BRCA1's transactivation activity. Electrostatic surface potential calculated by DELPHI, visualized by CHIMERA in GRASP format [39,40,43]. (D) Multiple sequence alignment of BRCT domains in BRCA1 orthologs. Primary groove residues are shaded in black, and their hydrogen-bonding partners are shaded in gray.

References

    1. Gudmundsdottir K, Ashworth A. The roles of BRCA1 and BRCA2 and associated proteins in the maintenance of genomic stability. Oncogene. 2006;25:5864–5874.
    2. Starita LM, Parvin JD. Substrates of the BRCA1-dependent ubiquitin ligase. Cancer Biol Ther. 2006;5:137–141.
    3. Venkitaraman AR. Cancer susceptibility and the functions of BRCA1 and BRCA2. Cell. 2002;108:171–182.
    4. Nathanson KN, Wooster R, Weber BL. Breast cancer genetics: What we know and what we need. Nat Med. 2001;7:552–556.
    5. Szabo CI, Worley T, Monteiro AN. Understanding germ-line mutations in BRCA1. Cancer Biol Ther. 2004;3:515–520.