Sci Rep. 2024 Jul 30;14(1):17545.
doi: 10.1038/s41598-024-67767-3.

Mitigating data quality challenges in ambulatory wrist-worn wearable monitoring through analytical and practical approaches

Jonas Van Der Donckt et al. Sci Rep.

Abstract

Chronic disease management and follow-up are vital for realizing sustained patient well-being and optimal health outcomes. Recent advancements in wearable technologies, particularly wrist-worn devices, offer promising solutions for longitudinal patient monitoring, replacing subjective, intermittent self-reporting with objective, continuous monitoring. However, collecting and analyzing data from wearables presents several challenges, such as data entry errors, non-wear periods, missing data, and wearable artifacts. In this work, we explore these data analysis challenges using two real-world datasets (mBrain21 and ETRI lifelog2020). We introduce practical countermeasures, including participant compliance visualizations, interaction-triggered questionnaires to assess personal bias, and an optimized pipeline for detecting non-wear periods. Additionally, we propose a visualization-oriented approach to validate processing pipelines using scalable tools such as tsflex and Plotly-Resampler. Lastly, we present a bootstrapping methodology to evaluate the variability of wearable-derived features in the presence of partially missing data segments. Prioritizing transparency and reproducibility, we provide open access to our detailed code examples, facilitating adaptation in future wearable research. In conclusion, our contributions provide actionable approaches for improving wearable data collection and analysis.
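The abstract's visualization-oriented pipeline validation can be prototyped with the two tools it names. Below is a minimal, hedged sketch (the sampling rate, window, and stride values are illustrative assumptions, not the study's settings): Plotly-Resampler is used to inspect a long high-frequency signal interactively, and tsflex to extract strided-window features from it.

```python
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly_resampler import FigureResampler
from tsflex.features import FeatureCollection, FeatureDescriptor

# Hypothetical 4 Hz EDA-like recording spanning one day (placeholder data).
idx = pd.date_range("2023-03-01", periods=4 * 60 * 60 * 24, freq="250ms")
eda = pd.Series(np.abs(np.random.randn(len(idx))).cumsum() / 1000, index=idx, name="EDA")

# Scalable visual inspection: Plotly-Resampler aggregates the high-frequency trace on the fly.
fig = FigureResampler(go.Figure())
fig.add_trace(go.Scattergl(name="EDA"), hf_x=eda.index, hf_y=eda.values)
fig.show()

# Strided-window feature extraction with tsflex (window/stride values are illustrative).
fc = FeatureCollection(
    FeatureDescriptor(function=np.mean, series_name="EDA", window="5min", stride="1min")
)
features = fc.calculate(eda, return_df=True)
print(features.head())
```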


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
mBrain study interaction visualization of a single participant for a period of 90 days. The figure consists of several subplots with a shared x-axis, each providing different layers of information about the participant's activity and interactions. Subplots (i) and (ii) display phone and wearable data sessions over time, with each bar on the x-axis representing a unique day. For the first four plots, the y-axis indicates the time of day, revealing patterns of data fragmentation and daily volumes over extended periods. Gray-shaded areas indicate weekends. The mBrain study requires a minimum of eight hours of wearable data daily. This compliance is color-coded in the first two subplots: green represents days with more than 8 h, while orange indicates less than 8 h. The daily events subplot (iii) provides an overview of food intakes and questionnaire interactions. Subplot (iv) provides a visual record of the participant's headaches and medication intake. The final subplot (v), shows the interaction rate (%) on the y-axis, illustrating the frequency of participant interactions with stress and activity timeline events derived from the wearable data stream.
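As a companion to the compliance color-coding described in this caption, a minimal sketch of how per-day wearable hours and the 8 h rule could be derived is shown below. The session-log format and column names are hypothetical; only the 8 h threshold and the green/orange coding come from the caption.

```python
import pandas as pd

# Hypothetical wearable session log: one row per recording session.
sessions = pd.DataFrame({
    "start": pd.to_datetime(["2023-03-01 08:12", "2023-03-01 13:40", "2023-03-02 09:05"]),
    "end":   pd.to_datetime(["2023-03-01 12:30", "2023-03-01 21:15", "2023-03-02 13:20"]),
})

# Duration per session, aggregated per calendar day (sessions are assumed
# not to span midnight; otherwise they would need to be split first).
sessions["hours"] = (sessions["end"] - sessions["start"]).dt.total_seconds() / 3600
daily_hours = sessions.groupby(sessions["start"].dt.date)["hours"].sum()

# mBrain compliance rule from the caption: at least 8 h of wearable data per day.
compliant = daily_hours >= 8
bar_colors = compliant.map({True: "green", False: "orange"})  # as in subplots (i)-(ii)
print(pd.DataFrame({"hours": daily_hours.round(2), "compliant": compliant}))
```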
Figure 2
Example of an mBrain alert message, shown to the study coordinators when no wearable data is received from a participant.
Figure 3
(a) Screenshot of questions in the mBrain study’s morning questionnaire evaluating implicitness for headache and medication events. (b) Notifications are activated based on responses to the implicitness questions.
Figure 4
Example mBrain application notifications when conflicting data entries were made by a participant.
Figure 5
mBrain stress event interaction and its corresponding misprediction questionnaire. Note: When a stress-system activation (e.g., a sudden, non-activity-induced increase in skin conductance responses) is detected in the streamed wearable data, a notification is sent to the user, as shown in (a). This notification helps reduce the participant's interaction latency. When clicking on this notification, the participant is guided toward the mBrain timeline in which the recent stress event is shown, as depicted in (b). The yellow circle indicates that the participant re-labeled the stress-event period as non-stressful. This, in turn, prompts the participant to indicate whether they have time to fill in a questionnaire that gathers more contextual information about this event. This questionnaire is portrayed in (c) and indicates that the user was performing a demanding mental activity that was not perceived as particularly pleasant, possibly explaining the stress response.
Figure 6
mBrain study wearable wear behavior overview of a single participant. The upper subplot illustrates the available wearable sessions, using bar intervals similar to those in Fig. 1, providing an overview of wearable usage. In this subplot, weekends are marked in gray and headache intervals in red. This participant has an average wearable data ratio of 44%, whereas the available data ratio during headaches is 39%. The lower left subplot depicts the average data ratio per time of day throughout the study period. This subplot reveals a notable decline in wearable use between 17 h 30 and 22 h 30. Conversely, the lower right subplot utilizes a heatmap to display the average data ratio against the time of day, distributed over each day of the week, highlighting discernible patterns in wear frequency. This heatmap shows that this specific participant tends toward reduced wearable use on Fridays and Saturdays, while Wednesdays exhibit the most wearable use. Note how the reduced wearable usage during the evening period, shown by the trough in the lower left subplot, is also discernible in this heatmap visualization.
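The time-of-day profile and the day-of-week heatmap described in this caption reduce to simple group-by aggregations over a wear indicator. A minimal sketch, assuming a hypothetical boolean minute-level "worn" series (the placeholder data and its sampling rate are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical boolean wear indicator sampled once per minute over the study period.
idx = pd.date_range("2023-03-01", "2023-05-30", freq="1min")
worn = pd.Series(np.random.rand(len(idx)) < 0.5, index=idx)  # placeholder data

# Lower-left subplot: average data ratio per time of day.
profile = worn.groupby(worn.index.hour).mean()

# Lower-right subplot: average data ratio per (day of week, hour of day).
ratio = worn.groupby([worn.index.dayofweek, worn.index.hour]).mean()
heatmap = ratio.unstack(level=1)  # rows: Mon=0..Sun=6, columns: hour 0..23
print(heatmap.round(2))
```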
Figure 7
Visual comparison of Böttcher's and our refined non-wear detection algorithm on the same excerpt. (a) Our refined non-wear algorithm. (b) Non-wear algorithm of Böttcher et al. The red-shaded area in each subplot of both (a) and (b) represents a labeled non-wear interval. Subplots (i) and (ii) in panels (a) and (b) depict the signal-specific SQIs for skin conductance and temperature, while subplot (iii) represents the standard deviation of the ACC and the corresponding ACC-SD SQI. Subplot (iv) shows the three-axis accelerometer data alongside the resulting “Wrist_SQI”. A low Wrist_SQI value between 08 h 55 and 09 h 00 in panel (a) denotes non-wear. Examining this time interval in subplots (i) and (ii) of (a), a notable decline in skin conductance and temperature is observed, leading to low SQI values. Minimal movement within this interval also results in a low SQI value in subplot (iii). Conversely, in panel (b), this non-wear bout remains undetected, primarily due to the valid temperature SQI range (i.e., between 25 and 40 °C). This lower bound may be set too low, as only the last part of the skin temperature segment during this non-wear period results in a low SQI.
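A hedged sketch of the general SQI-combination idea behind such non-wear detection is given below. The thresholds, the rolling window, and the two-out-of-three voting rule are illustrative assumptions, not the paper's refined algorithm or Böttcher et al.'s parameters.

```python
import pandas as pd

def wrist_sqi(eda: pd.Series, temp: pd.Series, acc_sd: pd.Series,
              window: str = "5min") -> pd.Series:
    """Combine per-signal SQIs into a single worn / non-worn indicator.

    All thresholds below are illustrative placeholders, not the values used in the paper.
    The three inputs are assumed to share the same DatetimeIndex.
    """
    eda_sqi = eda.rolling(window).median() > 0.05      # µS: near-zero EDA suggests non-wear
    temp_sqi = temp.rolling(window).median() > 30.0    # °C: a cooled sensor suggests non-wear
    acc_sqi = acc_sd.rolling(window).median() > 0.01   # g: some movement expected while worn
    # Device considered worn when at least two of the three sub-SQIs are valid.
    votes = eda_sqi.astype(int) + temp_sqi.astype(int) + acc_sqi.astype(int)
    return votes >= 2
```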
Figure 8
Flowchart for handling artifacts in raw ambulatory (daytime) wearable data.
Figure 9
Skin conductance signal processing to discern valid and invalid regions, and the resulting processed signal. The figure consists of two vertically stacked subplots that share the same x-axis. The upper subplot displays the raw EDA signal, depicted by the gray line, with valid and invalid SQI regions distinguished by green and red backgrounds, respectively. The processed EDA data is illustrated by an orange line. Note that there is no one-to-one relationship between the processed EDA data and the valid regions: the duration and frequency of the invalid regions affect the eventual retention of the raw EDA signal. Specifically, brief and infrequent invalid segments, like those until 12 h 05, can be effectively imputed using interpolation, resulting in no data exclusion in the processed EDA signal. Conversely, as the frequency and/or duration of invalid segments increases, as evidenced between 12 h 05 and 12 h 06, successful interpolation is compromised, resulting in these invalid regions being disregarded. Moreover, processed EDA segments shorter than 60 s (e.g., the valid segments between 12 h 06 and 12 h 08) are excluded given their limited analytical utility. The lower subplot elucidates the components of the skin conductance SQI. In alignment with the non-wear detection pipeline, multiple sub-SQIs are utilized. The noise amplitude of the EDA, averaged over a two-second window, is delineated by a purple line. This signal is thresholded to determine the noise sub-SQI, marked by the green line.
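The caption's retention logic (interpolate brief invalid regions, drop longer ones, and discard remaining valid segments shorter than 60 s) could be sketched as follows. The 5 s interpolation limit and the regular-sampling assumption are illustrative; only the 60 s minimum segment length comes from the caption.

```python
import pandas as pd

def process_eda(raw: pd.Series, valid: pd.Series,
                max_gap: str = "5s", min_segment: str = "60s") -> pd.Series:
    """Sketch: mask invalid samples, bridge only short gaps, drop short segments.

    `raw` is a regularly sampled, time-indexed EDA series; `valid` is a boolean
    SQI series aligned with it. Parameter defaults are illustrative assumptions.
    """
    eda = raw.where(valid)                              # mask invalid regions
    period = raw.index[1] - raw.index[0]                # assumes a constant sampling rate
    eda = eda.interpolate(method="time", limit_area="inside",
                          limit=int(pd.Timedelta(max_gap) / period))
    # Label contiguous non-NaN segments and keep only those lasting >= min_segment.
    segment_id = eda.isna().cumsum()[eda.notna()]
    keep = segment_id.groupby(segment_id).transform(
        lambda s: (s.index[-1] - s.index[0]) >= pd.Timedelta(min_segment)
    ).astype(bool)
    return eda[eda.notna()][keep]
```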
Figure 10
Flowchart illustrating the methodology for performing data analysis using incomplete data.
Figure 11
Complementary cumulative distribution plot of the window-of-interest data ratios for two participants. The y-axis represents the number of available window-of-interest samples, while the x-axis indicates the corresponding data ratio. Each curve in the plot represents the complementary cumulative distribution of one participant, providing a visual assessment of overall data availability per participant. Furthermore, when utilizing a data-ratio threshold, exemplified by the dashed vertical gray line at a data ratio of 0.85, this visualization allows one to determine how many samples still adhere to this threshold.
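Counting how many windows of interest survive a data-ratio threshold, as read off from the dashed 0.85 line, is a short computation once the per-window ratios are available. A small illustrative sketch (the ratio values are made up):

```python
import numpy as np

# Hypothetical per-window data ratios (fraction of non-missing samples) for one participant.
data_ratios = np.array([0.95, 0.62, 0.88, 1.0, 0.41, 0.9, 0.79, 0.99])

# Complementary cumulative curve: number of windows with a ratio >= x.
xs = np.sort(data_ratios)
counts = np.arange(len(xs), 0, -1)

# Applying the dashed 0.85 threshold from the caption.
threshold = 0.85
remaining = int((data_ratios >= threshold).sum())
print(f"{remaining} windows retain a data ratio of at least {threshold}")
```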
Figure 12
Overview of a single block-based bootstrapping iteration using the median as the desired metric. The figure comprises three vertically stacked subplots on the left that share an x-axis, with the window of interest highlighted by a gray-shaded area. The two vertical subplots on the right side also share an x-axis. Subplot (i) depicts an excerpt of processed wearable movement data from which non-wear periods have been removed. Note that no non-wear periods were detected and no data is missing, resulting in a completely valid segment for our window of interest. Subplot (ii) visualizes the ACC data of (i) transformed into a second-per-second activity intensity index, AIABS, in accordance with Bai et al. This AIABS signal is then utilized to compute the desired metric value, specifically the median of all data within our window of interest. This reference metric value is represented by a bold dashed black line in subplots (iv) and (v). Subsequently, gap-based bootstrapping is employed, utilizing the complete movement intensity data from subplot (ii) as input. In particular, one or multiple block-based gaps are generated to create a gap-induced signal, shown in subplot (iii), maintaining a specific retention data ratio, which in this illustration is 0.6. The modified signal is then used to compute the desired metric, which is depicted by the vertical green dotted line in subplot (iv). Each bootstrap iteration adds another data point to subplot (v), which can then be utilized to assess the spread for a given data retention ratio. Further specifics can be found on GitHub (https://github.com/predict-idlab/data-quality-challenges-wearables/blob/main/notebooks/mBrain/C7_missing_data.ipynb).
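A compact sketch of one such block-based gap-bootstrapping loop is given below. The number of blocks, their random placement (which allows gaps to overlap), and the placeholder AIABS signal are assumptions made for illustration; the 0.6 retention ratio and the median metric follow the caption, and the authors' exact implementation is in the linked notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

def gap_bootstrap_median(signal: np.ndarray, retention_ratio: float = 0.6,
                         n_iterations: int = 1000, n_blocks: int = 3) -> np.ndarray:
    """Repeatedly remove contiguous blocks so that roughly `retention_ratio` of the
    samples remain, then recompute the metric (here: the median). Illustrative sketch;
    overlapping blocks mean the realized retention can be slightly above the target."""
    n = len(signal)
    n_remove = int(round((1 - retention_ratio) * n))
    medians = np.empty(n_iterations)
    for i in range(n_iterations):
        keep = np.ones(n, dtype=bool)
        # Spread the samples to remove over `n_blocks` contiguous gaps at random positions.
        for chunk in np.array_split(np.arange(n_remove), n_blocks):
            length = len(chunk)
            start = rng.integers(0, n - length + 1)
            keep[start:start + length] = False
        medians[i] = np.median(signal[keep])
    return medians

# Usage: compare the bootstrapped spread against the gap-free reference median.
ai_abs = rng.gamma(2.0, 1.0, size=3600)        # placeholder second-per-second AI_abs signal
reference = np.median(ai_abs)
boot = gap_bootstrap_median(ai_abs, retention_ratio=0.6)
print(reference, np.percentile(boot, [2.5, 97.5]))
```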
Figure 13
Spread analysis of block-based gap bootstrapping for various data ratios and metrics. Each row in the figure represents a distinct reference series, corresponding to a window of interest from a different moment. Different columns correspond to different metrics, with the vertical dashed black line indicating the metric value of the gap-free reference series. In creating this visualization, the accelerometer data from the Empatica E4 was transformed into a second-by-second activity index, AIABS, as per the methodology detailed by Bai et al. and illustrated in Fig. 12. The considered metrics are the 50th percentile, 75th percentile, and mean values calculated from the AIABS data of the selected time window.
