Front Bioinform. 2023 May 10:3:1152039.
doi: 10.3389/fbinf.2023.1152039. eCollection 2023.

A graph-based machine learning framework identifies critical properties of FVIII that lead to hemophilia A

Marcos V Ferreira et al. Front Bioinform.

Abstract

Introduction: Blood coagulation is an essential process that stops bleeding in humans and other species. This mechanism is a molecular cascade of more than a dozen components that are activated after an injury to a blood vessel. In this process, coagulation factor VIII (FVIII) is a master regulator, enhancing the activity of other components by thousands of times. It is therefore unsurprising that even single amino acid substitutions result in hemophilia A (HA), a disease marked by uncontrolled bleeding that leaves patients at permanent risk of hemorrhagic complications.

Methods: Despite recent advances in the diagnosis and treatment of HA, the precise role of each residue of the FVIII protein remains unclear. In this study, we developed a graph-based machine learning framework that explores in detail the network formed by the residues of the FVIII protein, in which each residue is a node and two nodes are connected if they are in close proximity in the FVIII 3D structure.

Results: Using this system, we identified the properties that lead to severe and mild forms of the disease. Finally, in an effort to advance the development of novel recombinant therapeutic FVIII proteins, we adapted our framework to predict the activity and expression of more than 300 in vitro alanine mutations, once more observing a close agreement between the in silico and the in vitro results.

Discussion: Together, the results of this study demonstrate how graph-based classifiers can support the diagnosis and treatment of a rare disease.
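The residue network described in the Methods can be illustrated with a minimal sketch. The abstract only states that two residues are connected when they are "in close proximity", so the 8.0 Å cutoff, the use of one representative coordinate per residue, and the function name `build_residue_network` are assumptions made for illustration, not the authors' settings. The coordinates are toy values, not taken from the FVIII structure.

```python
import math
from itertools import combinations


def build_residue_network(coords, cutoff=8.0):
    """Return an adjacency dict: residue id -> set of neighbouring residue ids.

    coords maps a residue id to one (x, y, z) point in angstroms.
    cutoff is a hypothetical distance threshold; the source does not state one.
    """
    graph = {res_id: set() for res_id in coords}
    for (a, xyz_a), (b, xyz_b) in combinations(coords.items(), 2):
        # Connect two residues when their points lie within the cutoff.
        if math.dist(xyz_a, xyz_b) <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Toy coordinates (hypothetical): residues 1-3 are close, residue 4 is far away.
toy_coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.8, 0.0, 0.0),
    3: (7.6, 0.0, 0.0),
    4: (30.0, 0.0, 0.0),
}
network = build_residue_network(toy_coords)
print(network)
```

The all-pairs loop is quadratic in the number of residues, which is acceptable for a single protein; a spatial index would be the usual choice for larger structures.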

Keywords: FVIII; FVIIIa; bioinformatics; graph neural network; machine learning; protein structure; residue network.

Conflict of interest statement

TL received consulting fees from Pola Chemical Industries, Yokohama, Japan for projects unrelated to the current study, and speaker honoraria from Sanofi Japan. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Design of the GNN-HemA. (A) From the pre-processed FVIII structure, we generated a residue network and obtained structural measures, such as solvent accessible area, as well as a conservation score for each residue. These served as input for GNN classifiers that were trained to predict the disease severity of 626 patients with HA, as well as the coagulation activity of more than 300 alanine mutant FVIII constructs (Pellequer et al., 2011; Plantier et al., 2012). (B) In detail, the training process of the GNN algorithms starts by extracting sub-graphs from the residue network obtained from the pre-processed FVIII-RIN. Next, the sub-graphs are used to train a Graph Attention Network (GAT) with four attention heads. After computing the attention scores, the GAT uses a Multilayer Perceptron (MLP) to classify the graph nodes according to the severity of hemophilia A or the coagulation activity of the FVIII alanine mutants.
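The sub-graph extraction step in panel (B) can be sketched as follows. The caption does not say how sub-graphs are selected; the sketch assumes a k-hop neighbourhood around a central residue, which is only inferred from the "GAT 3 HOP" label used in Figure 3. The function name `k_hop_subgraph` and the toy graph are hypothetical.

```python
from collections import deque


def k_hop_subgraph(graph, center, k):
    """Return the nodes within k edges of `center` and the edges among them.

    graph is an adjacency dict: node -> set of neighbouring nodes.
    """
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    nodes = set(distance)
    # Keep each undirected edge once, as an ordered pair (u, v) with u < v.
    edges = {(u, v) for u in nodes for v in graph[u] if v in nodes and u < v}
    return nodes, edges


# Toy path graph 1-2-3-4-5 (hypothetical, not the FVIII residue network).
toy_graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
nodes, edges = k_hop_subgraph(toy_graph, center=1, k=3)
print(sorted(nodes), sorted(edges))
```

In a full pipeline, each extracted sub-graph would then be passed to the attention layers; that part is omitted here because the caption gives no further architectural detail beyond the four attention heads and the MLP.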
FIGURE 2
Predicting the severity of HA. (A) After careful data sanitation, our dataset had 626 unique cases of HA (Supplementary Table S2) caused by single-point, non-synonymous mutations. We merged the mild and moderate cases into a single class, reducing the problem to a two-class classification. (B–C) Mutations at residues that are buried in the core of FVIII (i.e., low solvent accessible area) and conserved during evolution (i.e., low conservation score) result in severe HA, most likely due to the disruption of the FVIII protein conformation. (D–E) Comparing different classifier architectures, we obtained a classification accuracy of 0.69 and an F1 value of 0.44. These values highlight the difficulty of predicting the severity of HA from clinical data, although the classifiers remain useful for anticipating the effects of single-point mutations.
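The gap between accuracy and F1 reported in panels (D–E) is typical of class-imbalanced data, and a small worked example may make it concrete. The sketch below merges mild and moderate cases into one class, as the caption describes, and computes both metrics from scratch. Treating "severe" as the positive class, the name `accuracy_and_f1`, and the ten toy cases are assumptions for illustration; they are not the study's data.

```python
# Mild and moderate are merged into one class, as in panel (A).
# Encoding "severe" as the positive class (1) is an assumption.
SEVERITY_TO_CLASS = {"mild": 0, "moderate": 0, "severe": 1}


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Ten hypothetical cases: three severe, seven mild or moderate.
true_severity = ["severe", "severe", "severe", "mild", "moderate",
                 "mild", "mild", "moderate", "mild", "mild"]
y_true = [SEVERITY_TO_CLASS[s] for s in true_severity]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # hypothetical classifier output

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"{accuracy:.2f} {f1:.2f}")  # prints "0.70 0.40"
```

Here seven of ten predictions are correct, yet only one of three severe cases is recovered, so F1 falls well below accuracy.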
FIGURE 3
Predicting the reduction of coagulation activity in alanine mutants. (A) We considered 344 alanine mutations in the A2 and C2 domains of FVIII. We divided these mutations into two groups: those that retained at least 50% of the coagulation activity of the WT, and those below this threshold, as measured by a chromogenic assay (Pellequer et al., 2011; Plantier et al., 2012) (Supplementary Table S3). (B–C) As with the clinical cases, targeted mutations at core hydrophobic residues and at highly conserved residues impair the co-factor activity of FVIII (Lopes et al., 2021b). (D–E) The GAT 3 HOP architecture showed the best predictive power, with an accuracy of 0.7 and an F1 value of 0.61, indicating that this GNN model can be used to simulate in silico the effect of targeted alanine mutations in FVIII.
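The two-group split in panel (A) amounts to thresholding each mutant's activity at 50% of the wild type. A minimal sketch, assuming activities are already expressed as a percentage of WT; the function name `label_by_activity`, the group names, and the mutant values are hypothetical.

```python
def label_by_activity(activity_pct_of_wt, threshold=50.0):
    """Map each mutant to "retained" (>= threshold) or "reduced" (< threshold)."""
    return {
        mutant: "retained" if pct >= threshold else "reduced"
        for mutant, pct in activity_pct_of_wt.items()
    }


# Hypothetical activities as a percentage of wild-type activity.
toy_activity = {"mutant_a": 95.0, "mutant_b": 50.0, "mutant_c": 12.5}
labels = label_by_activity(toy_activity)
print(labels)
```

A mutant at exactly 50% falls in the "retained" group, matching the caption's "at least 50%" wording.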
FIGURE 4
Predicting the reduction of coagulation activity in alanine mutants. (A) We considered 333 targeted alanine mutations in the A2 and C2 domains of FVIII. We divided these mutations into two groups: those that retained at least 50% of the coagulation activity of the WT, and those below this threshold, as measured by an ELISA (antigen) assay (Pellequer et al., 2011; Plantier et al., 2012) (Supplementary Table S4). (B–C) As expected, substitutions of residues located at the core of these domains, as well as of the most conserved residues, result in poor rescue of recombinant proteins by ELISA, suggesting that these mutations affected the correct folding and expression of FVIII to a greater extent. (D–E) The classification evaluation shows that GAT 5 PPR and GAT 3 PPR achieved the best accuracy and F1 results, respectively.
