Ann Rheum Dis. 2021 Jun;80(6):758-766.
doi: 10.1136/annrheumdis-2020-219069. Epub 2021 Feb 10.

Lupus or not? SLE Risk Probability Index (SLERPI): a simple, clinician-friendly machine learning-based model to assist the diagnosis of systemic lupus erythematosus

Christina Adamichou et al. Ann Rheum Dis. 2021 Jun.

Abstract

Objectives: Diagnostic reasoning in systemic lupus erythematosus (SLE) is a complex process reflecting the probability of disease at a given timepoint against competing diagnoses. We applied machine learning in well-characterised patient data sets to develop an algorithm that can aid SLE diagnosis.

Methods: From a discovery cohort of 802 randomly selected adults with SLE or control rheumatologic diseases, clinically selected panels of deconvoluted classification criteria and non-criteria features were analysed. Feature selection and model construction were done with Random Forests and Least Absolute Shrinkage and Selection Operator-logistic regression (LASSO-LR). The best model in 10-fold cross-validation was tested in a validation cohort (512 SLE, 143 disease controls).

Results: A novel LASSO-LR model had the best performance and included 14 variably weighted features, with thrombocytopenia/haemolytic anaemia, malar/maculopapular rash, proteinuria, low C3 and C4, antinuclear antibodies (ANA) and immunologic disorder being the strongest SLE predictors. Our model produced SLE risk probabilities (depending on the combination of features) that correlated positively with disease severity and organ damage, and allowed the unbiased classification of a validation cohort into diagnostic certainty levels (unlikely, possible, likely, definitive SLE) based on the likelihood of SLE against other diagnoses. Operating the model as binary (lupus/not-lupus), we noted excellent accuracy (94.8%) for identifying SLE, and high sensitivity for early disease (93.8%), nephritis (97.9%), neuropsychiatric (91.8%) and severe lupus requiring immunosuppressives/biologics (96.4%). The model was converted into a scoring system, whereby a score >7 has 94.2% accuracy.

Conclusions: We have developed and validated an accurate, clinician-friendly algorithm based on classical disease features for early SLE diagnosis and treatment to improve patient outcomes.

Keywords: autoantibodies; autoimmune diseases; lupus erythematosus; systemic.

Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1
Schematic overview of the methodology for developing a machine learning-based diagnostic model for SLE. We used a discovery cohort of 802 randomly selected adults with SLE or control rheumatologic diseases (1:1 ratio) to prepare 20 clinically selected panels of classification criteria items (both in their original version and, for composite items, deconvoluted into subitems) and non-criteria features. Two machine learning methods were applied for feature selection and model construction for each panel: (A) Random Forests (RF) and (B) Least Absolute Shrinkage and Selection Operator (LASSO) followed by logistic regression (LASSO-LR). The best model (highest accuracy in the 10-fold cross-validation process) was further tested in an independent dataset of 512 patients with systemic lupus erythematosus (SLE) and 143 disease controls (validation cohort). AUC, area under the curve; CV, cross-validation; ROC, receiver operating characteristic.
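The 10-fold cross-validation step named in this caption can be sketched in plain Python. This is a minimal illustration of the general technique, not the authors' pipeline: the function names, the `fit`/`predict` callables and the seed are hypothetical, and the actual RF and LASSO-LR fitting is left to whatever model the caller supplies.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    `fit(train_features, train_labels)` returns a model and
    `predict(model, feature_row)` returns a label; both are supplied by
    the caller (hypothetical stand-ins for RF or LASSO-LR).
    Assumes len(labels) >= k so that no fold is empty.
    """
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k
```

With the 802 patients of the discovery cohort, `k_fold_indices(802)` yields two folds of 81 and eight folds of 80; each candidate panel and method would be scored this way and the highest mean accuracy kept.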
Figure 2
A Least Absolute Shrinkage and Selection Operator-logistic regression (LASSO-LR) model shows high discriminating capacity for SLE against competing rheumatological diseases. (A) A LASSO-LR model comprising 14 clinical and serological features showed the highest accuracy for SLE in the 10-fold cross-validation runs from the discovery cohort. The plot illustrates the features associated with increased likelihood for SLE as compared with control rheumatological diseases along with the corresponding effect sizes (OR; 95% CI, p value). All model parameters are treated as dichotomous (ie, present=1, absent=0) in the LR equation as follows:

$$
\begin{aligned}
F(x) = \text{Intercept} &+ 1.80 \times \text{mucosal ulcers} + 2.96 \times \text{synovitis} + 1.83 \times \text{serositis} \\
&+ 3.66 \times \text{immunologic disorder} + 4.42 \times \text{antinuclear antibodies (ANA)} \\
&+ 2.13 \times \text{alopecia} + 2.17 \times \text{neurologic disorder} \\
&+ 4.25 \times \text{malar and/or maculopapular rash} \\
&+ 2.58 \times \text{subacute cutaneous lupus erythematosus (SCLE) and/or discoid lupus erythematosus (DLE)} \\
&+ 1.82 \times \text{leucopenia} \\
&+ 6.46 \times \text{thrombocytopenia and/or autoimmune haemolytic anaemia (AIHA)} \\
&+ 6.63 \times \text{low C3 and C4} - 1.45 \times \text{interstitial lung disease (ILD)}
\end{aligned}
$$

Feature definitions: mucosal ulcers, synovitis and serositis are defined according to the ACR 1997 classification criteria; immunologic disorder according to the ACR 1997 criteria modified to include also positive anti-β2 glycoprotein IgG or IgM antibodies; ANA, malar and/or maculopapular rash, SCLE and/or DLE, leucopenia, thrombocytopenia and/or AIHA, and low C3 and C4 according to the EULAR/ACR 2019 classification criteria; alopecia and neurologic disorder according to the SLICC 2012 classification criteria; for ILD, see online supplemental table S2 for definition. (B) The LASSO-LR model presented in (A) was further evaluated in an external (validation) cohort of 512 patients with SLE and 143 disease controls. The graph represents the receiver operating characteristic curve with a calculated area under the curve of 0.981, indicating an excellent capacity of the model to discriminate SLE versus disease controls.
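The equation in panel (A) can be evaluated with a few lines of Python. The sketch below is illustrative only and rests on stated assumptions: the coefficients are copied from the caption as reproduced here (which lists 13 terms and does not state the intercept, so `intercept` must be supplied by the caller), F(x) is assumed to be the log-odds of SLE, and the probability is obtained with the standard logistic function. The dictionary keys and function names are hypothetical.

```python
import math

# Coefficients as printed in the Figure 2 caption.
# Every feature is dichotomous: present = 1, absent = 0.
COEFFICIENTS = {
    "mucosal_ulcers": 1.80,
    "synovitis": 2.96,
    "serositis": 1.83,
    "immunologic_disorder": 3.66,
    "ana": 4.42,
    "alopecia": 2.13,
    "neurologic_disorder": 2.17,
    "malar_or_maculopapular_rash": 4.25,
    "scle_or_dle": 2.58,
    "leucopenia": 1.82,
    "thrombocytopenia_or_aiha": 6.46,
    "low_c3_and_c4": 6.63,
    "interstitial_lung_disease": -1.45,  # the only negative weight
}


def linear_predictor(present, intercept):
    """F(x) = intercept + sum of coefficients of the features that are present.

    `present` is an iterable of feature names; `intercept` is not given in
    the caption and must come from the full publication.
    """
    present = set(present)
    unknown = present - set(COEFFICIENTS)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return intercept + sum(COEFFICIENTS[name] for name in present)


def risk_probability(present, intercept):
    """Logistic transform of F(x), assuming F(x) is the log-odds of SLE."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(present, intercept)))
```

For example, with a placeholder intercept of 0, ANA plus low C3 and C4 gives F(x) = 4.42 + 6.63 = 11.05; the real intercept shifts every probability, so outputs computed with a placeholder are not clinically meaningful.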
Figure 3
The Least Absolute Shrinkage and Selection Operator-logistic regression (LASSO-LR) model can generate SLE risk probabilities, which correspond to distinct diagnostic certainty levels and correlate with disease outcomes. (A) Bar plot representation of the fraction of patients with SLE and disease controls (validation cohort) according to increasing bins of predicted SLE risk probabilities (0%–14%, 15%–43%, 44%–86%, 87%–100%) calculated by the LASSO-LR model shown in figure 2. Superimposed are the diagnostic accuracies (blue-coloured) corresponding to the rates of correct classification of disease controls against patients with SLE in the lower two probability bins (0%–14%, 15%–43%), and of patients with SLE against disease controls in the higher two probability bins (44%–86%, 87%–100%). Results are averages (±SD for patient fractions or 95% CI for the accuracy metric) calculated from randomly generated, non-overlapping subsets of patients with SLE (seven subsets each containing 73 or 74 patients) and disease controls (two subsets containing 71 and 72 patients) from the validation cohort. The majority of control (average 80%) and SLE (average 82%) patients belong to the lowest (0%–14%) and the highest (87%–100%) risk probability groups, respectively. Accordingly, accuracy was highest in these two extreme risk groups but dropped in the intermediate ones (15%–43%, 44%–86%). (B) Bar plot representation of the relative proportion of SLE and disease controls (validation cohort) within each SLE risk probability bin (0%–14%, 15%–43%, 44%–86%, 87%–100%). Calculations were made from the non-overlapping subsets of patients with SLE and disease controls as outlined in (A). (C) Positive- and negative-likelihood ratios (LRs) (mean, 95% CI) for the diagnosis of SLE against control diagnoses, according to different SLE risk probability thresholds (>14%, >43%, >86%) applied to the discovery cohort. Calculations were made from the non-overlapping subsets of patients with SLE and disease controls as outlined in (A). The >14% threshold had an average LR+ of 5.0 and LR− of 0.017, which correspond to a moderate increase in the likelihood for SLE when tested positive and a large decrease when tested negative, respectively. (D) Matrices of SLE risk probabilities based on different combinations of features included in the LASSO-LR diagnostic model. In each scenario, the calculated probability falls into one of the four SLE risk groups corresponding to varying diagnostic certainty levels (unlikely SLE: 0%–14%, possible/cannot rule out SLE: 15%–43%, likely SLE: 44%–86%, definite SLE: 87%–100%). (E) Dot plot analysis of the model-generated SLE risk probabilities according to the severity of disease manifestations (defined based on the BILAG system) and organ damage (SLICC/ACR Damage Index (SDI)). Data were generated from the validation cohort patients with SLE (n=512) and are presented as mean (95% CI). The Kruskal-Wallis (non-parametric) analysis of variance test was performed and two-tailed p values are shown. ANA, antinuclear antibodies; RMDs, rheumatic diseases; SCLE, subacute cutaneous lupus erythematosus; SDI, SLICC/ACR Damage Index; SLE, systemic lupus erythematosus.
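Two ideas in this caption lend themselves to a short Python sketch: mapping a risk probability to one of the four certainty levels, and reading a likelihood ratio as a change in the odds of disease. The bin boundaries follow the thresholds quoted in panel (C) (>14%, >43%, >86%); treating fractional percentages this way, the function names, and the 50% pre-test probability used in the usage note are assumptions made for illustration.

```python
def certainty_level(risk_percent):
    """Map an SLE risk probability (0-100) to the caption's four levels."""
    if not 0 <= risk_percent <= 100:
        raise ValueError("risk_percent must lie between 0 and 100")
    if risk_percent > 86:
        return "definite SLE"
    if risk_percent > 43:
        return "likely SLE"
    if risk_percent > 14:
        return "possible/cannot rule out SLE"
    return "unlikely SLE"


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds x LR."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)
```

Starting from a hypothetical 50% pre-test probability, the LR+ of 5.0 reported for the >14% threshold raises the probability to about 83%, while the LR− of 0.017 lowers it to about 1.7%, which matches the caption's description of a moderate increase and a large decrease.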
Figure 4
The new diagnostic model has high accuracy for systemic lupus erythematosus (SLE) including early and severe disease requiring immunosuppressive or biologic treatment. (A) Confusion matrix of the actual versus predicted cases of patients with SLE (n=512) and disease controls (n=143) in the validation cohort. The LASSO-LR diagnostic model was operated as binary (SLE or not-SLE) by setting the SLE risk probability threshold at ≥50%. Based on the number of true-positive, true-negative, false-positive and false-negative cases, sensitivity, specificity, accuracy, positive- and negative-likelihood ratios are estimated as metrics of the model diagnostic performance. (B) Sensitivity of the LASSO-LR model (operated as binary) for the detection of clinically relevant subsets of SLE including early disease, lupus nephritis, neuropsychiatric lupus, haematological lupus and severe lupus requiring potent immunosuppressive and/or biologic treatment.
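The metrics listed in panel (A) follow directly from the four cells of a confusion matrix. The sketch below shows the standard formulas; the function name is hypothetical, and the counts in the usage note are made-up round numbers, not the paper's confusion matrix, which is not reproduced in this caption.

```python
import math


def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: patients with SLE predicted as SLE / not-SLE.
    fp/tn: disease controls predicted as SLE / not-SLE.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    lr_positive = sensitivity / (1 - specificity) if fp > 0 else math.inf
    # LR- = (1 - sensitivity) / specificity; infinite when there are no true negatives.
    lr_negative = (1 - sensitivity) / specificity if tn > 0 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }
```

For illustrative counts of tp=90, fn=10, fp=5, tn=95, this gives a sensitivity of 0.90, a specificity of 0.95, an accuracy of 0.925, an LR+ of about 18 and an LR− of about 0.105. In the binary mode described above, a case counts as predicted SLE when its risk probability is ≥50%.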
