PLoS One. 2024 Mar 22;19(3):e0300717.
doi: 10.1371/journal.pone.0300717. eCollection 2024.

Machine learning-based prediction of rheumatoid arthritis with development of ACPA autoantibodies in the presence of non-HLA genes polymorphisms

Grzegorz Dudek et al. PLoS One.

Abstract

Machine learning (ML) algorithms can handle complex genomic data and identify predictive patterns that may not be apparent through traditional statistical methods. They have become popular tools for medical applications, including the prediction, diagnosis, and treatment of complex diseases such as rheumatoid arthritis (RA). RA is an autoimmune disease in which genetic factors play a major role. Among the most important genetic factors that predispose to the disease and serve as genetic markers are single nucleotide polymorphisms (SNPs) in HLA-DRB and non-HLA genes. Another marker of RA is the presence of anti-citrullinated peptide antibodies (ACPA), which correlates with the severity of RA. We use genetic data on SNPs in five non-HLA genes (PTPN22, STAT4, TRAF1, CD40, and PADI4) to predict the occurrence of ACPA-positive RA in the Polish population. This work is a comprehensive comparative analysis of ML classifiers: logistic regression, k-nearest neighbors, naïve Bayes, decision tree, boosted trees, multilayer perceptron, and support vector machines. The top-performing models reached closely matched levels of accuracy, each with its own strengths. Among these, we recommend the decision tree as the first choice, given its exceptional performance and interpretability. The sensitivity and specificity of the ML models are about 70%, which is satisfactory. In addition, we introduce a novel feature importance estimation method characterized by transparent interpretability and global optimality. The method explores all possible combinations of polymorphisms, which lets us pinpoint those with the highest predictive power.
Taken together, these findings suggest that non-HLA SNPs make it possible to identify the group of individuals more prone to developing RA and to implement a more precise preventive approach for them.
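
The abstract describes scoring every combination of polymorphisms. With five SNPs there are 2^5 - 1 = 31 non-empty combinations, so an exhaustive search is cheap. The following is a minimal Python sketch of that idea only, not the authors' method: the feature labels `v1` to `v5`, the helper names, and the majority-vote lookup scorer are all assumptions made for illustration, and a real analysis would score each subset on held-out data with one of the classifiers listed above.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical labels for the five SNP features; the mapping to genes is assumed.
FEATURES = ["v1", "v2", "v3", "v4", "v5"]


def all_subsets(features):
    """Yield every non-empty combination of the given features."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def lookup_accuracy(rows, labels, subset):
    """Toy scorer: training accuracy of a majority-vote lookup table.

    Samples are grouped by their values on `subset`; each group predicts its
    most frequent label. This stands in for a real, cross-validated model.
    """
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[name] for name in subset)
        votes[key][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in votes.values())
    return correct / len(labels)


def rank_subsets(rows, labels, features):
    """Score all subsets; sort by accuracy (descending), then by subset size."""
    scored = [(lookup_accuracy(rows, labels, s), s) for s in all_subsets(features)]
    return sorted(scored, key=lambda item: (-item[0], len(item[1]), item[1]))
```

Because every subset is evaluated, the top-ranked entry is optimal with respect to the chosen scorer, which is one way to read the "global optimality" mentioned above. Ties are broken in favor of smaller subsets so that redundant features do not inflate the result.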


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Association of RA with the frequency of SNPs in PTPN22 (rs2476601), PADI4 (rs2240340), TRAF1 (rs3761847), STAT4 (rs7574865), and CD40 (rs4810485) genes.
Fig 2. DT model.
Fig 3. Accuracy of the models for different combinations of input features.
Fig 4. Confusion matrix for NB with features v1v3v4v5.
Fig 5. Feature importance.
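
Sensitivity and specificity, as reported in the abstract, are read directly off a binary confusion matrix such as the one in Fig 4. The sketch below shows the arithmetic in Python; the function names are hypothetical and the counts in the usage note are illustrative, not the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) for binary labels, treating `positive` as the case class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)
```

For example, with made-up counts of 70 true positives, 30 false negatives, 70 true negatives, and 30 false positives, `sensitivity_specificity(70, 30, 70, 30)` gives 0.7 for both measures.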
