Functional coding haplotypes and machine-learning feature elimination identifies predictors of Methotrexate Response in Rheumatoid Arthritis patients

Affiliations

¹ Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
² Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore.
³ Dept of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
⁴ Division of Human Genetics, Genome Institute of Singapore, Singapore.
⁵ Centre for Computational Biology, and Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore.
⁶ Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore; Clinical Research & Innovation Office, Tan Tock Seng Hospital, Singapore. Electronic address: khai_pang_leong@ttsh.com.sg.
⁷ Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Div of Cellular & Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore; NUS Graduate School, National University of Singapore, Singapore. Electronic address: bchleec@nus.edu.sg.

PMID: 35022146
PMCID: PMC8808170
DOI: 10.1016/j.ebiom.2021.103800

Ashley J W Lim et al. EBioMedicine. 2022 Jan;75:103800. doi: 10.1016/j.ebiom.2021.103800. Epub 2022 Jan 10.

Abstract

Background: Major challenges in large-scale genetic association studies include not only identifying causative single nucleotide polymorphisms (SNPs) but also accounting for SNP-SNP interactions. This study therefore proposes a novel feature engineering approach that integrates potentially functional coding haplotypes (pfcHap) with machine-learning (ML) feature selection. The aim is to identify biologically meaningful, possibly causative genetic factors that account for potential SNP-SNP interactions within each pfcHap, and so to best predict methotrexate (MTX) response in rheumatoid arthritis (RA) patients.
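Using haplotypes rather than single SNPs lets a feature capture which alleles occur together on the same chromosome. That is one way to reflect SNP-SNP interactions within a gene. The snippet below is a minimal sketch of how phased coding haplotypes might be turned into per-patient features. It assumes phasing has already been done and that each haplotype is encoded as a copy count (0, 1 or 2). The input format, the function name `haplotype_dosages` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def haplotype_dosages(phased_genotypes):
    """Return, per patient, the number of copies (0, 1 or 2) of each haplotype.

    phased_genotypes maps a patient ID to a pair of haplotypes; each haplotype
    is a tuple of alleles at the coding SNPs of one gene (hypothetical format).
    """
    # Every distinct haplotype seen in the cohort becomes one feature column.
    all_haps = sorted({hap for pair in phased_genotypes.values() for hap in pair})
    features = {}
    for patient, pair in phased_genotypes.items():
        counts = Counter(pair)  # Counter returns 0 for haplotypes not carried
        features[patient] = {"".join(hap): counts[hap] for hap in all_haps}
    return features

# Toy gene with two coding SNPs (alleles A/C and G/T); data are invented.
cohort = {
    "P1": (("A", "G"), ("A", "G")),
    "P2": (("A", "G"), ("C", "T")),
    "P3": (("C", "G"), ("A", "T")),
}
features = haplotype_dosages(cohort)
# P2 and P3 have identical single-SNP genotypes (A/C and G/T) but different
# haplotypes, so only the haplotype features tell them apart.
print(features["P2"])  # {'AG': 1, 'AT': 0, 'CG': 0, 'CT': 1}
print(features["P3"])  # {'AG': 0, 'AT': 1, 'CG': 1, 'CT': 0}
```

The P2/P3 pair shows the point of the encoding. A per-SNP model sees the same genotypes for both patients, but the haplotype columns differ.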

Methods: Exome sequencing data from 349 RA patients were analysed after splitting the patients into a training set and an unseen test set. Inferred pfcHaps were combined with 30 non-genetic features and subjected to ML recursive feature elimination with cross-validation on the training set. The predictive capacity and robustness of the selected features were assessed with six popular machine learning models, using cross-validation within the training set and evaluation on the unseen test set.
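The split into training and test sets is stratified by response status, as described in the Figure 1 caption. Below is a minimal standard-library sketch of a stratified split. It assumes binary labels (1 = responder, 0 = non-responder) and a 20% test fraction, which gives the 279/70 counts reported in Figure 1. The responder/non-responder counts in the example are invented for illustration. In practice a library routine such as scikit-learn's `train_test_split(X, y, test_size=0.2, stratify=y)` would typically be used instead.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into (train, test) while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = list(by_class[label])
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # per-class test quota
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Invented labels: 1 = responder, 0 = non-responder (not the study's counts).
labels = [1] * 200 + [0] * 149
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 279 70
```

Sampling each class separately keeps the responder rate in both sets close to that of the full cohort. Here it is about 57% in each.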

Findings: In total, 100 features (95 pfcHaps and 5 non-genetic factors) were identified that showed good performance in predicting MTX response in RA patients. This held across all six ML models on the unseen test set (AUC: 0.776-0.828; sensitivity: 0.656-0.813; specificity: 0.684-0.868).
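The Findings report three metrics: AUC, sensitivity and specificity. The sketch below shows how they relate to a model's predicted scores. It computes AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. Sensitivity and specificity are computed at a fixed threshold. The 0.5 threshold and the choice of responders as the positive class are assumptions for illustration; neither is stated in this abstract. The labels and scores are invented.

```python
def auc_score(y_true, y_score):
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(y_score, y_true) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(y_score, y_true) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(y_score, y_true) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(y_score, y_true) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Invented example: 1 = responder (positive class), 0 = non-responder.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(auc_score(y_true, y_score))                # 0.875
print(sensitivity_specificity(y_true, y_score))  # (0.75, 0.75)
```

AUC does not depend on a threshold, while sensitivity and specificity change as the threshold moves. This is why a range for all three is reported per model.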

Interpretation: The majority of the predictive pfcHap SNPs were predicted to be potentially functional. Some of the genes in which these pfcHaps reside were identified as associated with previously reported MTX/RA pathways.

Funding: Singapore Ministry of Health's National Medical Research Council (NMRC) [NMRC/CBRG/0095/2015; CG12Aug17; CGAug16M012; NMRC/CG/017/2013]; National Cancer Center Research Fund and block funding, Duke-NUS Medical School; Singapore Ministry of Education Academic Research Fund Tier 2 grant MOE2019-T2-1-138.

Keywords: Feature selection; Genetic polymorphism; Haplotypes; Machine learning; Methotrexate; Rheumatoid Arthritis.

Conflict of interest statement

Declaration of Competing Interest: CGL, KPL, CCK, SSC, AJWL, and LJL declare that they have a pending Coversheet IP application.

Figures

Fig. 1 — **Figure 1**
Pipeline employed to identify the predictors of MTX response. The 349 patient samples were first divided into training (n=279, 80%) and test (n=70, 20%) sets using a stratified split, so that each set preserves the proportion of responders and non-responders in the original dataset. The training set was then further split into eight subsets of different sizes, ranging from 30% to 100% of the training samples. Within each subset, important features were selected using recursive feature elimination with cross-validation (RFECV), with a Random Forest classifier as the estimator. These features were either coding haplotypes alone or coding haplotypes integrated with non-genetic features. The features identified as important in all eight subsets were then shortlisted as the set of features predictive of MTX response. The predictive performance of these features was assessed with six different machine learning models, using cross-validation within the training set and evaluation on the unseen test set.
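The consensus step in Figure 1 runs feature selection on training subsets of increasing size and keeps only the features selected in every subset. Below is a minimal sketch of that logic. The caption gives eight subsets spanning 30% to 100%. The 10% spacing (30%, 40%, ..., 100%) and the nesting of smaller subsets inside larger ones are assumptions. `select_features` is a hypothetical stand-in for RFECV with a Random Forest estimator. With scikit-learn, that would typically be `RFECV(estimator=RandomForestClassifier(), cv=...)`, with the selected mask read from its `support_` attribute. `toy_selector` is invented purely so the example runs.

```python
import random

def training_subsets(train_idx, fractions, seed=0):
    """Return {fraction: sorted indices}; smaller subsets are nested in larger ones."""
    rng = random.Random(seed)
    shuffled = list(train_idx)
    rng.shuffle(shuffled)
    return {f: sorted(shuffled[:round(len(shuffled) * f)]) for f in fractions}

def consensus_features(subsets, select_features):
    """Keep only the features selected in every subset.

    select_features stands in for RFECV with a Random Forest estimator: it
    takes a list of sample indices and returns a set of selected feature names.
    """
    selected = [set(select_features(idx)) for idx in subsets.values()]
    return set.intersection(*selected)

def toy_selector(idx):
    # Hypothetical selector: hap_3 only looks important in larger subsets.
    feats = {"hap_1", "hap_2"}
    if len(idx) > 150:
        feats.add("hap_3")
    return feats

fractions = [f / 10 for f in range(3, 11)]  # 0.3, 0.4, ..., 1.0 (eight subsets)
subsets = training_subsets(range(279), fractions)
print(sorted(consensus_features(subsets, toy_selector)))  # ['hap_1', 'hap_2']
```

Taking the intersection favours features that remain important when fewer samples are available. In the toy run, `hap_3` is dropped because it is selected only in the larger subsets.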

Fig. 2 — **Figure 2**
Predictive performance of 120 haplotypes (from the haplotype-only analysis) in the training set using 5-fold cross-validation. ROC curves of 120 haplotypes using (a) Random Forest, (b) Logistic Regression, (c) Support Vector Machine, (d) Boosted Trees, (e) Elastic Net, and (f) Neural Network.
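The training-set curves come from 5-fold cross-validation. Each sample is held out for validation exactly once, and the model is fit on the other four folds. The sketch below generates such folds. It assumes stratification by response status, which the caption does not state. With scikit-learn, the usual route is `StratifiedKFold(n_splits=5, shuffle=True, random_state=0)`. The labels are invented to match the size of the 279-sample training set.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train, validation) index lists for k stratified folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        idxs = list(by_class[label])
        rng.shuffle(idxs)
        # Deal each class out round-robin, continuing where the previous
        # class stopped so that fold sizes differ by at most one.
        for idx in idxs:
            folds[position % k].append(idx)
            position += 1
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, validation

labels = [1] * 160 + [0] * 119  # invented labels for a 279-sample training set
splits = list(stratified_kfold(labels))
print([len(val) for _, val in splits])  # [56, 56, 56, 56, 55]
```

Each of the six models would be fit five times, once per `train` list, and scored on the matching `validation` list. This gives the per-fold ROC curves.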

Fig. 3 — **Figure 3**
Number of important features (coding haplotypes and non-genetic factors) identified in eight training subsets of variable sample sizes. Columns represent the different training subsets and each row represents a feature. The intensity of red represents the importance of the feature in each subset (greater intensity indicates greater importance); black indicates features that were not found to be important in the respective subset.

Fig. 4 — **Figure 4**
Predictive performance of 95 haplotypes and 5 non-genetic factors in the training set using 5-fold cross-validation. ROC curves of 95 haplotypes and 5 non-genetic factors using (a) Random Forest, (b) Logistic Regression, (c) Support Vector Machine, (d) Boosted Trees, (e) Elastic Net, and (f) Neural Network.

Fig. 5 — **Figure 5**
Predictive performance of 95 haplotypes and 5 non-genetic factors in the unseen test set. ROC curves of 95 haplotypes and 5 non-genetic factors using (a) Random Forest, (b) Logistic Regression, (c) Support Vector Machine, (d) Boosted Trees, (e) Elastic Net, and (f) Neural Network.
