NPJ Digit Med. 2024 Feb 12;7(1):33.
doi: 10.1038/s41746-024-01013-y.

Digital health technologies and machine learning augment patient reported outcomes to remotely characterise rheumatoid arthritis

Andrew P Creagh et al.

Abstract

Digital measures of health status captured during daily life could greatly augment current in-clinic assessments for rheumatoid arthritis (RA), enabling better assessment of disease progression and impact. This work presents results from weaRAble-PRO, a 14-day observational study that investigated how digital health technologies (DHT), such as smartphones and wearables, could augment patient reported outcomes (PRO) to determine RA status and severity in 30 moderate-to-severe RA patients compared to 30 matched healthy controls (HC). Sensor-based measures of health status, mobility, dexterity, fatigue, and other RA-specific symptoms were extracted from daily iPhone guided tests (GT), as well as from actigraphy and heart rate sensor data, which were passively recorded from patients' Apple smartwatch continuously over the study duration. We subsequently developed a machine learning (ML) framework to distinguish RA status and to estimate RA severity. Daily wearable sensor-outcomes robustly distinguished RA from HC participants (F1, 0.807). Furthermore, by day 7 of the study (halfway), a sufficient volume of data had been collected to reliably capture the characteristics of RA participants. In addition, we observed that the detection of RA severity levels could be improved by augmenting standard patient reported outcomes with sensor-based features (F1, 0.833) compared to using PRO assessments alone (F1, 0.759), and that the combination of modalities could reliably measure continuous RA severity, as determined by the clinician-assessed RAPID-3 score at baseline (r2, 0.692; RMSE, 1.33). The ability to measure the impact of the disease during daily life, through objective and remote digital outcomes, paves the way for the development of more patient-centric and personalised measurements for use in RA clinical trials.
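As a rough illustration of the classification set-up described in the abstract (not the authors' code), the sketch below cross-validates a simple classifier on per-participant sensor features and reports the macro-F1 score; the feature matrix, labels, and model choice are placeholder assumptions.

# Minimal sketch: cross-validated RA vs. HC classification from
# per-participant sensor features, scored with macro-F1.
# `X` (participants x features) and `y` (1 = RA, 0 = HC) are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))          # placeholder: 60 participants, 40 sensor features
y = np.repeat([0, 1], 30)              # placeholder: 30 HC, 30 RA

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
f1 = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
print(f"macro-F1: {f1.mean():.3f} +/- {f1.std():.3f}")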


Conflict of interest statement

A.P.C., H.Y., G.M., A.D., and D.A.C. are employees of the University of Oxford. A.P.C. is a GSK postdoctoral fellow and acknowledges the support of GSK. D.A.C. received research funding from GSK to conduct this work. In addition, A.D., H.Y., and G.M. acknowledge the support of Novo Nordisk plc. A.D. is supported by the Wellcome Trust [223100/Z/21/Z]. V.H., W-H.C., R.T., R.W., and L.G-G. are employees of GSK and own stock and/or shares. C.L., C.Y., and M.S.D. are employees of Analysis Group, which received research funding from GSK to conduct the study.

Figures

Fig. 1
Fig. 1. Illustration detailing the objectives of this study.
The weaRAble-PRO 14-day trial aimed to investigate how digital health technologies (DHT), namely a wrist-worn Apple smartwatch and an iPhone device with bespoke mobile apps, could augment patient reported outcomes (PRO) to characterise the impact of rheumatoid arthritis (RA) during the daily life of 30 moderate-to-severe RA patients, compared to 30 matched healthy controls (HC). We explore the ability of machine learning (ML) models to (1) estimate categorical RA outcomes, such as identifying RA participants from healthy controls, and (2) estimate continuous RA outcomes, such as RA severity, using a combination of PRO and sensor-outcomes.
Fig. 2
Fig. 2. Ability of individual sensor-outcomes to distinguish between RA status and RA severity levels.
Comparison of the average feature distributions per participant between healthy control (HC), RA (moderate), and RA (severe) groups for: a–c a selection of passively collected smartwatch features; d–f a selection of guided-test smartphone features; and g–i a selection of patient self-reported outcomes recorded on the smartphone application. For all examples shown, medians were significantly different between HC and RA groups (one-way ANOVA on ranks, Kruskal–Wallis H-test, p < 0.001). deg degrees, HAQ-DI Health Assessment Questionnaire-Disability Index, min minutes, mg milli-gravity acceleration units, MVPA moderate-to-vigorous physical activity, RASIQ GSK RA symptom and impact questionnaire, sed sedentary, sec seconds.
Fig. 3
Fig. 3. Ability of combined sensor-outcomes to distinguish between RA status and RA severity levels.
Comparison of a RA identification (RA vs. HC) performance and b RA severity level estimation (RA (mod) vs. RA (sev)), using patient reported outcomes (PRO) and combined PRO (list icon), active (smartphone icon), and passive (smartwatch icon) sensor-based outcomes in the weaRAble-PRO study. AUROC area under the receiver operating characteristic curve, κ Cohen's kappa statistic, F1 macro-F1 score.
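For readers unfamiliar with the reported metrics, the following minimal snippet shows how AUROC, Cohen's kappa, and the macro-F1 score can be computed with scikit-learn; the labels and probabilities are invented for illustration only.

# Sketch of the three reported metrics (AUROC, Cohen's kappa, macro-F1),
# assuming arrays of true labels, predicted probabilities, and predicted labels.
from sklearn.metrics import roc_auc_score, cohen_kappa_score, f1_score

y_true = [0, 0, 1, 1, 1, 0]              # hypothetical labels (HC = 0, RA = 1)
y_prob = [0.2, 0.4, 0.9, 0.7, 0.6, 0.3]  # hypothetical predicted probabilities
y_pred = [int(p >= 0.5) for p in y_prob]

print("AUROC:   ", roc_auc_score(y_true, y_prob))
print("kappa:   ", cohen_kappa_score(y_true, y_pred))
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))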
Fig. 4
Fig. 4. The number of days of sensor-data required to remotely characterise RA impact.
Comparison of a the minimum number of days of data needed to distinguish RA status, as measured by the F1 score across 5-fold cross validation (CV), between active (smartphone icon), passive (smartwatch icon), and combined (smartphone & smartwatch icons) feature sources; b the feature (test-retest) reliability, as measured by the intraclass correlation coefficient (ICC), between RA participants and HC across the study duration (14 days). F1 scores and ICCs suggest that model performance and feature reliability stabilise once more than 7 days of data are used per participant.
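Panel a can be approximated, under assumptions, by re-fitting the classifier on features averaged over only the first d days of the study. The sketch below is illustrative rather than the study's pipeline; the daily feature tensor, labels, and model are hypothetical placeholders.

# Illustrative sketch of panel (a): re-fit the classifier using only the first
# d days of sensor data per participant and track 5-fold CV macro-F1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
daily_features = rng.normal(size=(60, 14, 40))   # placeholder: participants x days x features
y = np.repeat([0, 1], 30)                        # placeholder labels (HC = 0, RA = 1)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for d in range(1, 15):
    X_d = daily_features[:, :d, :].mean(axis=1)  # average features over the first d days
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    f1 = cross_val_score(clf, X_d, y, cv=cv, scoring="f1_macro").mean()
    print(f"{d:2d} days: macro-F1 = {f1:.3f}")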
Fig. 5
Fig. 5. The number of sensor-outcomes required to remotely distinguish RA status.
Comparison of features selected between regularised logistic regression (LR) models for: a elastic-net (F1, 0.79) and b SG-lasso (F1, 0.81). The SG-lasso promotes group-wise sparsity (i.e., regularising the number of feature domains) and within-group sparsity (i.e., regularising the number of features per domain), achieving performance similar to LR elastic-net while selecting fewer domains and features. Feature importance, denoted as the mean LR coefficient value (w) over cross-validation, is illustrated by colour intensity. Feature domains: AF activity fragmentation, DEM demographics, LTS lie-to-stand assessment, MORN morning stiffness, NTR night-time restlessness, PEG 9-hole peg test, STS sit-to-stand assessment, TVDA total volume of daytime activity, WLK walking assessment, WRT wrist assessment.
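A minimal sketch of the elastic-net variant in panel a, assuming placeholder data and feature names: an elastic-net-penalised logistic regression is fitted and the non-zero coefficients are read off as the selected features. The sparse-group lasso of panel b requires a group-sparse solver that scikit-learn does not provide, so it is not shown here.

# Hedged sketch: elastic-net-penalised logistic regression and inspection of
# which coefficients survive regularisation. Feature names and data are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(60, 10)))
y = np.repeat([0, 1], 30)
feature_names = [f"feat_{i}" for i in range(10)]   # placeholders for domain features (e.g., WLK, PEG)

enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)
for name, w in zip(feature_names, enet.coef_.ravel()):
    if abs(w) > 1e-6:                              # non-zero coefficients = selected features
        print(f"{name}: w = {w:+.3f}")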
Fig. 6
Fig. 6. The ability of remote PRO + sensor-outcomes to estimate in-clinic determined RA severity scores.
Scatter plot of baseline RAPID-3 scores (y) versus predicted scores (ŷ) per subject, using elastic net with PRO + sensor-outcomes, over cross-validation (CV). Participant model-estimated RAPID-3 scores can be further interpreted through detailed inspection of the daily smartphone-based patient-reported joint pain map (JMAP) total scores, which were not included as a predictor in the model. Higher JMAP scores indicate higher levels of pain experienced. Additional interpretability, through the JMAP, demonstrated that PRO + sensor-based estimation of the RAPID-3 could reliably reflect patients' perceived daily RA symptoms. Note: Baseline JMAP total scores, recorded on the same day as the baseline RAPID-3, are denoted in grey; the JMAP y-axis scale is the same among all subplots. HC subjects were assigned a RAPID-3 score of zero at baseline. A black line represents perfect predictions (r2, 0.692; MAE, 0.938; RMSE, 1.333).
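Under assumptions, the continuous-severity model could look like the sketch below: an elastic-net regression of baseline RAPID-3 on PRO + sensor features, evaluated over cross-validation with r2, MAE, and RMSE. The data, hyperparameters, and preprocessing are placeholders, not the authors' configuration.

# Sketch: elastic-net regression of baseline RAPID-3 from PRO + sensor features,
# evaluated with r2, MAE, and RMSE over cross-validation. `X` and `rapid3` are hypothetical.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))                      # placeholder PRO + sensor features
rapid3 = np.clip(X[:, :5].sum(axis=1) + 5, 0, 30)  # placeholder RAPID-3 scores (0-30 scale)

model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))
pred = cross_val_predict(model, X, rapid3, cv=KFold(5, shuffle=True, random_state=0))
print("r2:  ", r2_score(rapid3, pred))
print("MAE: ", mean_absolute_error(rapid3, pred))
print("RMSE:", np.sqrt(mean_squared_error(rapid3, pred)))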
Fig. 7
Fig. 7. Self-supervised learning pipeline.
Continuous (passive) actigraphy was recorded from patients' Apple smartwatch over the study duration. Deep convolutional neural networks (DCNN) were pre-trained on 700,000 person-days of data in the publicly available UK Biobank using self-supervised learning, and fine-tuned with the Capture-24 dataset, to estimate participants' daily activity patterns in the weaRAble-PRO study. Physical activity (PA) metrics of daily life, for example the time spent walking, the frequency of exercise, or the length and quality of sleep, were investigated as markers to characterise symptoms of disease in people with RA compared to HC.
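A highly simplified PyTorch sketch of the fine-tuning step described above: a small 1-D convolutional encoder stands in for the pre-trained self-supervised network, a linear head is attached, and one gradient step is taken on labelled accelerometer windows. The architecture, window shape, class count, and checkpoint path are all hypothetical.

# Simplified fine-tuning sketch: 1-D CNN encoder (stand-in for the pre-trained
# self-supervised model) + linear head, trained on labelled accelerometer windows.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, n_channels=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )

    def forward(self, x):            # x: (batch, 3 accelerometer axes, samples per window)
        return self.net(x)

encoder = Encoder()
# encoder.load_state_dict(torch.load("ssl_pretrained.pt"))  # hypothetical pre-trained checkpoint
model = nn.Sequential(encoder, nn.Linear(64, 5))            # 5 activity classes (assumed)

x = torch.randn(8, 3, 300)            # placeholder batch: 8 windows of 10 s at 30 Hz
labels = torch.randint(0, 5, (8,))    # placeholder activity labels
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.CrossEntropyLoss()(model(x), labels)
loss.backward()
optim.step()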
