ACS Omega. 2025 Feb 19;10(8):8349-8360.
doi: 10.1021/acsomega.4c10179. eCollection 2025 Mar 4.

XGBMUT: Predicting the Functional Impact of Missense Mutations Using an Extreme Gradient Boost Classifier


Gabriel Rodrigues Coutinho Pereira et al. ACS Omega.

Abstract

Advances in genome projects have led to the discovery of millions of new mutations, but characterizing their effects through traditional wet-lab experiments remains labor-intensive and time-consuming. Functional prediction algorithms address this bottleneck by enabling efficient computational screening of mutations, saving time and resources. The objective of this study was to develop a competitive algorithm for predicting the functional impact of missense mutations. A unified database and substitution matrices containing predictor variables specific to missense mutations were first constructed. Values for these predictor variables were then collected for the training and test sets, derived from the ClinVar and HumsaVar databases, respectively. A series of supervised machine learning classifiers was trained, and their performance was evaluated on the test set. The best-performing model was further compared against ten currently available functional prediction algorithms. The resulting algorithm, XGBMut, classifies missense mutations with high accuracy and performs competitively against these existing tools. A user-friendly graphical interface was also developed to make the tool accessible to professionals in various fields. Unlike most existing methods, XGBMut does not depend on a web server or require the installation of third-party software, making it a more versatile tool for users.
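The abstract names an extreme gradient boosting (XGBoost) classifier as the final model but does not spell out how boosting works. As a rough, toy-scale illustration only, the sketch below implements the core idea behind XGBoost-style boosting: each round fits a depth-1 tree (a stump) to the first and second derivatives of the logistic loss, using the L2-regularized leaf weight -G/(H+λ) and the corresponding split gain. It is not the authors' XGBMut model; the two features and the labels (1 = pathogenic, 0 = benign) are hypothetical, and a real pipeline would more likely use the `xgboost` package's `XGBClassifier`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, g, h, lam):
    """Pick the split maximizing the XGBoost-style gain
    G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)  (factor 1/2 and gamma omitted).
    Returns (feature, threshold, w_left, w_right) with leaf weights -G/(H+lam)."""
    G, H = sum(g), sum(h)
    w_root = -G / (H + lam)
    best_gain, best = 0.0, (0, math.inf, w_root, w_root)  # fallback: no useful split
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for k in range(len(order) - 1):
            i, nxt = order[k], order[k + 1]
            GL += g[i]
            HL += h[i]
            if X[i][j] == X[nxt][j]:
                continue  # cannot separate equal feature values
            GR, HR = G - GL, H - HL
            gain = GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam)
            if gain > best_gain:
                best_gain = gain
                threshold = (X[i][j] + X[nxt][j]) / 2
                best = (j, threshold, -GL / (HL + lam), -GR / (HR + lam))
    return best


def stump_output(stump, x):
    j, thr, w_left, w_right = stump
    return w_left if x[j] < thr else w_right


def fit_boosted_stumps(X, y, n_rounds=20, eta=0.3, lam=1.0):
    """Newton boosting of stumps on the logistic loss. Leaf weights are stored
    already shrunk by the learning rate eta; lam is the L2 penalty on leaf weights."""
    margins = [0.0] * len(X)  # log-odds, starting from p = 0.5
    model = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]   # first derivative of log loss
        h = [pi * (1.0 - pi) for pi in p]        # second derivative of log loss
        j, thr, w_left, w_right = fit_stump(X, g, h, lam)
        stump = (j, thr, eta * w_left, eta * w_right)
        model.append(stump)
        margins = [m + stump_output(stump, x) for m, x in zip(margins, X)]
    return model


def predict_proba(model, x):
    """Probability that mutation x is deleterious (class 1)."""
    return sigmoid(sum(stump_output(s, x) for s in model))


if __name__ == "__main__":
    # Hypothetical features per mutation: [delta_hydrophobicity, conservation_score]
    X = [[0.4, 0.1], [1.2, 0.2], [-0.5, 0.3], [4.1, 0.9], [-5.0, 0.8], [3.0, 0.95]]
    y = [0, 0, 0, 1, 1, 1]
    model = fit_boosted_stumps(X, y)
    print(predict_proba(model, [0.0, 0.15]), predict_proba(model, [2.0, 0.9]))
```

Using only stumps keeps the example short; XGBoost itself grows deeper trees and adds further regularization (for example the γ split penalty and column subsampling).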


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Workflow for data processing, model training, validation, and operation of the XGBMut classifier. (A) Workflow for XGBMut construction, validation, and benchmarking. The ClinVar and HumsaVar databases were first processed to remove duplicates and select valid missense mutations for constructing the training and test sets, respectively. These cleaned data sets were then submitted to a function that automatically extracts local and global predictor variables for each mutation, based on information from the previously prepared unified database and substitution matrices. The training set was used to develop the XGBMut model, while the test set, containing unseen mutations, was used to evaluate the model's performance independently. Finally, the performance of well-established methods for predicting the functional effects of missense mutations was compared with that of XGBMut on the test set to confirm the model's viability. (B) Workflow for XGBMut operation. For unlabeled mutations, an initial viability check ensures that only valid mutations proceed to predictor variable extraction. Invalid mutations are filtered out at this step and saved in a separate file for user reference. Valid mutations are then submitted to a function that automatically extracts local and global predictor variables for each mutation, based on information from the previously prepared unified database and substitution matrices. Once the predictor variables have been extracted, they are input into the XGBMut model, which predicts the probability that each mutation is deleterious. These predictions are then saved to an output file. Arrows represent the data flow, while icons illustrate the computational processing and model implementation steps.
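Panel B describes a viability check that routes invalid mutations to a separate list before predictor variables are extracted. The sketch below shows one way such a check and a toy feature extractor could look. The mutation notation (one-letter wild-type residue, 1-based position, mutant residue, e.g. `K2E`), the validity rules, and the three features are assumptions for illustration only; the actual XGBMut predictor variables come from the authors' unified database and substitution matrices, which are not described here. The hydrophobicity values are the standard Kyte-Doolittle scale, and the duplicate removal mirrors the cleaning step described for panel A.

```python
import re

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical notation: wild-type residue, 1-based position, mutant residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def check_viability(mutation, sequence):
    """Return (wt, pos, mt) for a valid missense mutation, otherwise None."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        return None  # malformed, or a nonsense change such as 'A4*'
    wt, pos, mt = match.group(1), int(match.group(2)), match.group(3)
    if wt not in KD or mt not in KD or wt == mt:
        return None  # non-standard residue or synonymous change
    if not 1 <= pos <= len(sequence) or sequence[pos - 1] != wt:
        return None  # position out of range or wild-type mismatch
    return wt, pos, mt


def extract_features(wt, pos, mt, sequence, window=3):
    """Toy local/global predictors (hypothetical, not the XGBMut feature set)."""
    start, end = max(0, pos - 1 - window), min(len(sequence), pos + window)
    local = [KD[aa] for aa in sequence[start:end] if aa in KD]
    return {
        "delta_hydrophobicity": KD[mt] - KD[wt],          # local: the site itself
        "window_hydrophobicity": sum(local) / len(local),  # local: +/- window residues
        "relative_position": pos / len(sequence),          # global: place in the protein
    }


def screen(mutations, sequence):
    """Split submitted mutations into (valid-with-features, invalid) lists."""
    valid, invalid, seen = [], [], set()
    for mutation in mutations:
        if mutation in seen:
            continue  # drop duplicate submissions
        seen.add(mutation)
        parsed = check_viability(mutation, sequence)
        if parsed is None:
            invalid.append(mutation)
        else:
            valid.append((mutation, extract_features(*parsed, sequence)))
    return valid, invalid
```

For example, `screen(["K2E", "K2E", "A4*", "T3T", "Y5F", "W9A"], "MKTAYIAKQR")` keeps `K2E` and `Y5F` and flags the nonsense, synonymous, and mismatched entries; in a full tool, the invalid list would be written to its own file, as the caption describes.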
Figure 2
Performance of XGBMut on the test set and its comparison with currently available functional prediction algorithms. (A) Confusion matrix calculated for the final model (XGBMut). TN: true negatives; TP: true positives; FN: false negatives; FP: false positives. (B) ROC curve calculated for XGBMut and ten functional prediction algorithms within the test set. The output of VariPred is not shown, as it is provided as a class label rather than a probability score, making the AUC-ROC calculation infeasible. (C) Blue bars indicate the accuracy of each algorithm, while the dashed black lines represent their corresponding coverage within the test set, i.e., the percentage of total observations (mutations) effectively analyzed by each algorithm.
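Figure 2 reports a confusion matrix, AUC-ROC, accuracy, and coverage. The sketch below shows how these quantities can be computed from one tool's outputs under stated assumptions: a mutation the tool could not analyze is recorded as `None`, accuracy is computed only over analyzed mutations at a 0.5 probability threshold, and AUC-ROC uses the rank (Mann-Whitney) formulation. The function names are hypothetical. The AUC is defined over a ranking of continuous scores, which is why a tool that returns only class labels, such as VariPred, is left out of the ROC panel.

```python
def confusion_matrix(y_true, y_pred):
    """Counts of TP, TN, FP, FN for binary labels (1 = deleterious)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["TP" if t == 1 else "FP"] += 1
        else:
            counts["FN" if t == 1 else "TN"] += 1
    return counts


def auc_roc(y_true, scores):
    """Probability that a random deleterious mutation outscores a random neutral
    one (ties count 1/2); this equals the area under the ROC curve."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Coverage, accuracy, confusion matrix and AUC for one tool.
    scores[i] is None when the tool could not analyze mutation i.
    Assumes at least one analyzed mutation of each class."""
    analyzed = [(t, s) for t, s in zip(y_true, scores) if s is not None]
    labels = [t for t, _ in analyzed]
    probs = [s for _, s in analyzed]
    preds = [1 if s >= threshold else 0 for s in probs]
    cm = confusion_matrix(labels, preds)
    return {
        "coverage": len(analyzed) / len(y_true),            # share of mutations analyzed
        "accuracy": (cm["TP"] + cm["TN"]) / len(analyzed),  # over analyzed mutations only
        "confusion": cm,
        "auc": auc_roc(labels, probs),
    }
```

With labels `[1, 1, 0, 0, 1, 0]` and scores `[0.9, 0.6, 0.4, 0.7, None, 0.1]`, the tool covers 5 of 6 mutations and classifies 4 of those 5 correctly. Computing accuracy only over analyzed mutations is why the figure reports coverage alongside it: a tool can look accurate while skipping hard cases.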
Figure 3
Graphical user interface and command-line interface of the XGBMut software. (A) Graphical user interface. The main menu of the program is on the left, and the analysis loading menu is on the right. (B) Command-line interface. Both panels show the Windows version of the interfaces.

