Clocks Sleep. 2022 Sep 27;4(4):497-507.
doi: 10.3390/clockssleep4040039.

The Impact of Missing Data and Imputation Methods on the Analysis of 24-Hour Activity Patterns

Lara Weed et al. Clocks Sleep.

Abstract

The purpose of this study is to characterize the impact of the timing and duration of missing actigraphy data on the calculation of interdaily stability (IS) and intradaily variability (IV). The performance of three missing data imputation methods (linear interpolation, mean time of day (ToD) imputation, and median ToD imputation) for estimating IV and IS was also tested. Week-long actigraphy records with no non-wear or missing timeseries data were masked with zeros or 'Not a Number' (NaN) across a range of timings and durations for single and multiple missing data bouts. IV and IS were calculated for the true, masked, and imputed (i.e., linear interpolation, mean ToD, and median ToD imputation) timeseries data and used to generate Bland–Altman plots for each condition. Heatmaps were used to analyze the impact of the timing and duration of missing data bouts and of the interval between bouts. Simulated missing data produced deviations in IV and IS for longer durations, for gaps crossing midday, and for gaps at similar times on consecutive days. Median ToD imputation produced the least deviation among the imputation methods. Median ToD imputation is recommended to recapitulate IV and IS when less than 24 h of data are missing.

Keywords: actigraphy; circadian rhythms; imputation; interdaily stability; intradaily variability.
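
As an illustration of the two metrics named in the abstract, the sketch below computes IS and IV using the standard nonparametric definitions: IS is the variance of the average 24-hour profile relative to the overall variance, and IV is the mean squared difference between successive samples relative to the overall variance. The abstract does not state the exact formulas or bin width the authors used, so the hourly binning (period=24) and the function names here are assumptions for illustration only.

```python
from statistics import fmean


def interdaily_stability(x, period=24):
    """IS for a series with `period` samples per day (assumed hourly bins)."""
    n = len(x)
    if n == 0 or n % period != 0:
        raise ValueError("series must cover a whole number of days")
    grand_mean = fmean(x)
    total_ss = sum((v - grand_mean) ** 2 for v in x)
    # Average daily profile: mean of all samples sharing the same time of day.
    profile = [fmean(x[h::period]) for h in range(period)]
    profile_ss = sum((m - grand_mean) ** 2 for m in profile)
    return (n * profile_ss) / (period * total_ss)


def intradaily_variability(x):
    """IV: squared successive differences relative to the overall variance."""
    n = len(x)
    grand_mean = fmean(x)
    total_ss = sum((v - grand_mean) ** 2 for v in x)
    diff_ss = sum((b - a) ** 2 for a, b in zip(x, x[1:]))
    return (n * diff_ss) / ((n - 1) * total_ss)


# Hypothetical week of hourly counts: 12 h of rest, then 12 h of activity.
day = [0.0] * 12 + [10.0] * 12
week = day * 7
is_value = interdaily_stability(week)
iv_value = intradaily_variability(week)
```

A series that repeats identically every day, like the hypothetical week above, yields an IS of 1. Both functions raise ZeroDivisionError on a constant series, and a NaN anywhere in the input propagates to the result, which is one reason masked records need imputation or explicit NaN handling before these metrics are computed.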

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Rhythm regularity (IS) for a single missing data gap starting on a representative day (Tuesday). Data are the mean difference between the masked and true IS values (D,E) or imputed and true IS values (A–C), as extracted from Bland–Altman plots. Three different imputation methods [linear interpolation (A), mean time of day (ToD) (B), median ToD (C)] and two masking methods [NaNs (D), zeros (E)] are presented for varied durations (y-axis) and timing (x-axis) of masked data gaps. Values are color-coded as indicated with best performance being closer to 0 (green). For heat maps of each individual day of rhythm regularity, see Supplemental Data Figure S2.
Figure 2
Rhythm fragmentation (IV) for a single missing data gap starting on a representative day (Tuesday). Data are the mean difference between the masked and true IV values (D,E) or imputed and true IV values (A–C), as extracted from Bland–Altman plots. Three different imputation methods [linear interpolation (A), mean ToD (B), median ToD (C)] and two masking methods [NaNs (D), zeros (E)] are presented for varied durations (y-axis) and timing (x-axis) of masked data gaps. Values are color-coded as indicated with best performance being closer to 0. For heat maps of each individual day of rhythm fragmentation, see Supplemental Data Figure S5.
Figure 3
Rhythm regularity (IS) for two gaps (gap 1: 115 min, gap 2: 140 min) of missing data starting on a representative day (Tuesday). Data are the mean difference between the masked and true IS values (D,E) or imputed and true IS values (A–C), as extracted from Bland–Altman plots. Three different imputation methods [linear interpolation (A), mean ToD (B), median ToD (C)] and two masking methods [NaNs (D), zeros (E)] are presented for varied durations between bouts (y-axis) and timing (x-axis) of masked data gaps. Values are color-coded as indicated with best performance being closer to 0; NaN values indicate where values could not be calculated due to dataset constraints. For heat maps of each individual day of rhythm regularity, see Supplemental Data Figure S8.
Figure 4
Rhythm fragmentation (IV) for two gaps (gap 1: 115 min, gap 2: 140 min) of missing data starting on a representative day (Tuesday). Data are the mean difference between the masked and true IV values (D,E) or imputed and true IV values (A–C), as extracted from Bland–Altman plots. Three different imputation methods [linear interpolation (A), mean ToD (B), median ToD (C)] and two masking methods [NaNs (D), zeros (E)] are presented for varied durations between bouts (y-axis) and timing (x-axis) of masked data gaps. Values are color-coded as indicated with best performance being closer to 0; NaN values indicate where values could not be calculated due to dataset constraints. For heat maps of each individual day of rhythm fragmentation, see Supplemental Data Figure S11.
Figure 5
CONSORT diagram. In total, 103,685 files were assessed for eligibility, of which 19,747 were excluded, resulting in 83,938 accelerometer files (A). A random subset (N = 84 files, 0.1% of the extracted sample) from individuals with at least 7 days of data and no missing data was subjected to masking, imputation, and IV and IS calculation (B).
Figure 6
Mask overview. Data were systematically removed in single gaps of various durations (A) and in single gaps starting at various times (B). For multiple gaps of missing data, the duration between gaps (C) and the gap start time (D) were varied.
Figure 7
Example of a segment with complete data (A), linearly interpolated data (B), mean ToD imputed data (C), and median ToD imputed data (D) for 5 h of missing data starting at 10 am. Linear interpolation (B) is highly dependent on the values surrounding the gap, and mean ToD imputation (C) smooths more than median ToD imputation (D). Each of the imputation methods is statistical and does not perfectly represent the true data (A).
Figure 8
Sample Bland–Altman plots for IS from data masked with a single 5 h gap starting at 10 am and then imputed. The solid black line depicts the mean, the dotted lines indicate ±1.96 × standard deviation, and the gray line represents the linear fitted slope. Results are presented for linear interpolation (A), mean ToD imputation (B), median ToD imputation (C), data masked with NaNs (D), and data masked with zeros (E).
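
The captions for Figures 7 and 8 describe time-of-day imputation and Bland–Altman summaries. The following is a minimal sketch of both under stated assumptions: missing samples are marked with NaN, each gap sample is filled with the median (or mean) of the observed samples at the same time of day on other days, and the limits of agreement are taken as the mean difference ±1.96 × the sample standard deviation. The function names, the hourly period, and the use of the sample (n − 1) standard deviation are illustrative choices, not details confirmed by the source.

```python
import math
from statistics import fmean, median, stdev


def impute_time_of_day(x, period=24, reducer=median):
    """Fill NaN samples from observed values at the same time of day.

    Pass reducer=fmean for mean ToD imputation instead of median ToD.
    """
    filled = list(x)
    for h in range(period):
        observed = [v for v in x[h::period] if not math.isnan(v)]
        if not observed:
            continue  # this time of day is missing on every day: leave the gap
        fill = reducer(observed)
        for i in range(h, len(x), period):
            if math.isnan(filled[i]):
                filled[i] = fill
    return filled


def bland_altman(estimates, truths):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [e - t for e, t in zip(estimates, truths)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation (assumption)
    return bias, bias - spread, bias + spread


# Hypothetical toy record: 4 "days" of 2 samples each, one sample missing.
nan = float("nan")
record = [1.0, 2.0, 1.0, nan, 1.0, 8.0, 1.0, 3.0]
median_filled = impute_time_of_day(record, period=2)
mean_filled = impute_time_of_day(record, period=2, reducer=fmean)
bias, lower, upper = bland_altman([1.0, 2.0, 3.0], [0.0, 2.0, 2.0])
```

In the toy record the outlying value 8.0 pulls the mean fill upward while the median fill stays at 3.0, which is consistent with the caption's remark that the mean and median ToD methods behave differently on the same gap.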

