Front Bioinform. 2022 Jun 23:2:912112.
doi: 10.3389/fbinf.2022.912112. eCollection 2022.

A Machine Learning Framework Predicts the Clinical Severity of Hemophilia B Caused by Point-Mutations

Tiago J S Lopes et al. Front Bioinform.

Abstract

Blood coagulation is a vital physiological mechanism to stop blood loss following an injury to a blood vessel. This process starts immediately upon damage to the endothelium lining a blood vessel, and results in the formation of a platelet plug that closes the site of injury. An essential component of this repair operation is coagulation factor IX (FIX), a serine protease encoded by the F9 gene, whose deficiency causes hemophilia B (HB). If not treated by prophylaxis or gene therapy, patients with this condition are at risk of life-threatening bleeding episodes. A deep understanding of the FIX protein and its activated form (FIXa) is therefore essential to develop efficient therapeutics. In this study, we used well-studied structural analysis techniques to create a residue interaction network of the FIXa protein. Here, the nodes are the amino acids of FIXa, and two nodes are connected by an edge if the two residues are in close proximity in the FIXa 3D structure. This representation accurately captured fundamental properties of each amino acid of the FIXa structure, as we found by validating our findings against hundreds of clinical reports about the severity of HB. Finally, we established a machine learning framework named HemB-Class to predict the effect of mutations of all FIXa residues to all other amino acids and used it to disambiguate several conflicting medical reports. Together, these methods provide a comprehensive map of the FIXa protein architecture and establish a robust platform for the rational design of FIX therapeutics.
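
The network construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the residue names, the coordinates, and the helper names `min_atom_distance` and `build_rin` are all hypothetical, and the 5 Å cutoff is taken from the "close proximity (∼5 Å)" wording of the Figure 1 caption. The sketch connects two residues when any pair of their atoms lies within the cutoff.

```python
import math
from itertools import combinations


def min_atom_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def build_rin(residues, cutoff=5.0):
    """Build an undirected residue network as an adjacency dict.

    residues: dict mapping a residue name to a list of (x, y, z) atom
              coordinates in angstroms (hypothetical input format).
    cutoff:   distance threshold in angstroms (assumed value, ~5 A).
    """
    adjacency = {name: set() for name in residues}
    for a, b in combinations(residues, 2):
        if min_atom_distance(residues[a], residues[b]) <= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Toy coordinates, invented for illustration only (not FIXa data).
toy_residues = {
    "R1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "R2": [(4.0, 0.0, 0.0), (5.5, 0.0, 0.0)],
    "R3": [(9.0, 0.0, 0.0)],
    "R4": [(30.0, 0.0, 0.0)],
}

rin = build_rin(toy_residues)
for name in sorted(rin):
    print(name, sorted(rin[name]))
```

In this toy input R1 and R2 are linked, R2 and R3 are linked, and R4 stays isolated because none of its atoms is within the cutoff of any other residue. A real analysis would read atom coordinates from a structure file and would likely use a spatial index rather than the all-pairs loop shown here.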

Keywords: FIX; FIXa; bioinformatics; hemophilia B; machine learning; protein structure; residue network.

Conflict of interest statement

TL received consulting fees from Pola Chemical Industries, Japan, and speaker honoraria from Sanofi Japan. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Representation of the FIXa structure as a residue interaction network (RIN). (A) In the FIXa RIN, each node represents an amino acid, and two nodes are connected by an edge if their atoms are in close proximity (∼5 Å). (B) The degree quantifies the number of connections a residue has, the betweenness indicates how many times a node serves as a bridge on the shortest path between two other amino acids, and Burt’s constraint, derived from social science studies, quantifies the position of advantage of individuals within an organization (Burt, 2009). Nodes with high degree participate in multiple molecular interactions, and those with high betweenness and low Burt’s constraint serve as intermediaries between different groups of amino acids. In contrast, residues with low degree, low betweenness and high Burt’s constraint usually do not have many connections to other residues and are located at the periphery of the network. (C) Properties derived from the FIXa structure or from the RIN are good indicators of the severity of HB. Depicted are the solvent-accessible (areaSAS) and the solvent-excluded (areaSES) surface areas, the relative exposure of amino acids (Rel. Exposure Area), the conservation of the FIXa residues (smaller values indicate higher conservation), and the RIN centrality measures. Also depicted are measures derived from SIFT (Sim et al., 2012), Provean (2 Scores, −2.5 and 0.05) (Choi and Chan, 2015), and from Polyphen-2 (PPH2-Prob, dScore, Score 1, Score 2, MinDJxn, IdPmax, IdQmin) (Adzhubei et al., 2010). The boxplots show the median (center line), the first and third quartiles (lower and upper bounds), and 1.5 times the inter-quartile range (lower and upper whiskers). Each dot in the plot is an amino acid mutation (i.e., a clinical case report). Unpaired, two-sided Wilcoxon test (***p-value < 0.001; **p-value < 0.01; *p-value < 0.05).
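
The three centrality measures named in panel (B) can be sketched on a toy graph. The graph below (two triangles joined through a single bridge node "D") is invented for illustration and is not FIXa data. The function names are hypothetical, betweenness is computed with the standard Brandes algorithm for unweighted graphs, and Burt's constraint uses the usual unweighted definition in which a node invests equally in each of its neighbors. Whether the authors used the same normalizations is not stated in this excerpt.

```python
from collections import deque


def degree(adj):
    """Number of neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes, undirected graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        while order:  # accumulate dependencies, farthest nodes first
            w = order.pop()
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


def burt_constraint(adj):
    """Burt's constraint with equal investment 1/degree in each neighbor."""
    result = {}
    for i, nbrs in adj.items():
        total = 0.0
        for j in nbrs:
            direct = 1 / len(nbrs)
            indirect = sum(
                (1 / len(nbrs)) * (1 / len(adj[q]))
                for q in nbrs
                if q != j and j in adj[q]
            )
            total += (direct + indirect) ** 2
        result[i] = total
    return result


# Toy graph: triangles A-B-C and E-F-G, bridged by node D.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F", "G"}, "F": {"E", "G"}, "G": {"E", "F"},
}

deg, btw, con = degree(adj), betweenness(adj), burt_constraint(adj)
for v in sorted(adj):
    print(v, deg[v], btw[v], round(con[v], 3))
```

The bridge node "D" has a low degree but the highest betweenness and the lowest constraint, while the triangle corners "A", "B", "F" and "G" have zero betweenness and high constraint. This matches the qualitative pattern the caption describes for intermediary versus peripheral residues.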
FIGURE 2
Centrality measures from the FIXa RIN and the important residues. (A) The Spearman correlation between all measures considered in this study. (B) The degree and betweenness of all residues of the FIXa RIN. Each dot represents an amino acid, and groups of residues with different characteristics are highlighted. (C) The location of the residues highlighted in panel (B) in the FIXa protein structure. (D) The boxplot displays the betweenness and the Burt’s constraint values of the nodes taking part in atomic interactions with residues in other domains of FIXa (link nodes), compared to the nodes interacting only with residues from the same domain. (E) The residues with the highest degree, the highest betweenness and the lowest Burt’s constraint; these residues are most likely the most central of the whole FIXa protein. The boxplots show the median (center line), the first and third quartiles (lower and upper bounds), and 1.5 times the inter-quartile range (lower and upper whiskers). Unpaired, two-sided Wilcoxon test (***p-value < 0.001; **p-value < 0.01; *p-value < 0.05).
FIGURE 3
The HemB-Class machine learning framework. (A) Our machine learning classifiers received as input the properties from the FIXa structure, from the FIXa RIN, the conservation score of each amino acid and measures derived from other variant prediction algorithms [SIFT (Sim et al., 2012), Provean (Choi and Chan, 2015), and Polyphen-2 (Adzhubei et al., 2010)]. The output of our classifiers is the severity of HB, derived from clinical reports from the EAHAD FIX mutation database (Rallapalli et al., 2013). (B) Comparative performance of six classifiers and a combination of the best classifiers (Ensemble, which we named HemB-Class). The bars depict the mean values of 10 repetitions of 10-fold cross-validation and the error bars are the standard deviation values. (C) Spearman correlation of the predicted probabilities output by the classifiers. (D) The trade-off between the number of instances classified and the accuracy. Each dot is the classification performance of an individual classifier or of the ensembles when we vary the classification threshold to create an “exclusion area” to disregard instances with ambiguous classifications.
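
The trade-off in panel (D) can be sketched as follows. This is an illustration under stated assumptions, not the HemB-Class implementation: the excerpt does not say how the ensemble combines its members, so the sketch assumes a plain average of predicted probabilities, an exclusion area centered on 0.5, and a binary label where 1 means severe and 0 means not severe. The probabilities, labels and function names are invented.

```python
def ensemble_probability(member_probs):
    """Average the probabilities of the member classifiers (assumed rule)."""
    return sum(member_probs) / len(member_probs)


def classify_with_exclusion(prob_rows, labels, half_width):
    """Return (coverage, accuracy) when ambiguous instances are skipped.

    prob_rows:  one list of member-classifier probabilities per instance.
    labels:     true class per instance (1 = severe, 0 = not severe; assumed).
    half_width: instances whose averaged probability lies strictly within
                0.5 +/- half_width fall in the exclusion area.
    """
    kept = 0
    correct = 0
    for member_probs, label in zip(prob_rows, labels):
        p = ensemble_probability(member_probs)
        if abs(p - 0.5) < half_width:
            continue  # ambiguous prediction, not classified
        kept += 1
        predicted = 1 if p > 0.5 else 0
        correct += int(predicted == label)
    coverage = kept / len(labels)
    accuracy = correct / kept if kept else float("nan")
    return coverage, accuracy


# Hypothetical probabilities from three classifiers for six mutations.
prob_rows = [
    [0.90, 0.80, 0.95],
    [0.55, 0.45, 0.60],
    [0.10, 0.20, 0.15],
    [0.48, 0.58, 0.50],
    [0.70, 0.75, 0.65],
    [0.30, 0.20, 0.25],
]
labels = [1, 0, 0, 1, 1, 0]

cov_all, acc_all = classify_with_exclusion(prob_rows, labels, half_width=0.0)
cov_cut, acc_cut = classify_with_exclusion(prob_rows, labels, half_width=0.1)
print(cov_all, acc_all)
print(cov_cut, acc_cut)
```

With no exclusion area every instance is classified and one of the six is wrong. Widening the exclusion area to 0.1 drops the two instances whose averaged probability is close to 0.5, so fewer instances are classified but the remaining ones are all correct in this toy data.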
FIGURE 4
Severity Score of all possible FIXa mutations. (A) The Severity Score of mutations not used during the training phase because they had conflicting symptoms reported in the medical literature. Our predictions agreed with the majority class of each mutation (Supplementary Table S5). (B) The Severity Score predicted by the HemB-Class framework for the mutations of each FIXa residue to the 19 remaining amino acids. (C) The location of the residues with the highest Severity Scores. These residues, located at the core of each FIXa domain, are unlikely to accept any amino acid substitution (Supplementary Table S6). (D) The most buried residues (less than 25% relative surface exposure) have significantly higher Severity Scores than the most exposed residues. The boxplots show the median (center line), the first and third quartiles (lower and upper bounds), and 1.5 times the inter-quartile range (lower and upper whiskers). Each dot in the plot is an amino acid mutation (i.e., a clinical case report). Unpaired, two-sided Wilcoxon test (***p-value < 0.001; **p-value < 0.01; *p-value < 0.05).
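
The unpaired, two-sided Wilcoxon test used in the figure panels is the rank-sum (Mann-Whitney) test. A minimal sketch follows, using the normal approximation without tie or continuity corrections, so its p-values are approximate and may differ from those of a statistics package. The scores for "buried" and "exposed" residues are invented and the function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, approximate two-sided p-value).
    Ties receive average ranks; no tie correction is applied to the variance.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p_value


# Hypothetical Severity Scores, invented for illustration only.
buried = [0.90, 0.80, 0.85, 0.95, 0.70, 0.88, 0.92, 0.75]
exposed = [0.20, 0.35, 0.10, 0.40, 0.30, 0.25, 0.45, 0.15]

u_stat, p_value = rank_sum_test(buried, exposed)
print(u_stat, p_value)
```

Because every invented "buried" score exceeds every "exposed" score, the U statistic reaches its maximum of 8 × 8 = 64 and the approximate p-value falls below 0.01.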

References

    1. Adzhubei I. A., Schmidt S., Peshkin L., Ramensky V. E., Gerasimova A., Bork P., et al. (2010). A Method and Server for Predicting Damaging Missense Mutations. Nat. Methods 7 (4), 248–249. doi: 10.1038/nmeth0410-248
    2. Anson D. S., Choo K. H., Rees D. J., Giannelli F., Gould K., Huddleston J. A., et al. (1984). The Gene Structure of Human Anti-Haemophilic Factor IX. EMBO J. 3 (5), 1053–1060. doi: 10.1002/j.1460-2075.1984.tb01926.x
    3. Bajaj S. P., Rapaport S. I., Russell W. A. (1983). Redetermination of the Rate-Limiting Step in the Activation of Factor IX by Factor XIa and by Factor VIIa/Tissue Factor. Explanation for Different Electrophoretic Radioactivity Profiles Obtained on Activation of 3H- and 125I-Labeled Factor IX. Biochemistry 22 (17), 4047–4053. doi: 10.1021/bi00286a009
    4. Bajaj S. P., Schmidt A. E., Mathur A., Padmanabhan K., Zhong D., Mastri M., et al. (2001). Factor IXa:factor VIIIa Interaction. Helix 330-338 of Factor IXa Interacts with Residues 558-565 and Spatially Adjacent Regions of the A2 Subunit of Factor VIIIa. J. Biol. Chem. 276 (19), 16302–16309. doi: 10.1074/jbc.M011680200
    5. Ben Chorin A., Masrati G., Kessel A., Narunsky A., Sprinzak J., Lahav S., et al. (2020). ConSurf-DB: An Accessible Repository for the Evolutionary Conservation Patterns of the Majority of PDB Proteins. Protein Sci. 29 (1), 258–267. doi: 10.1002/pro.3779