Observational Study
J Med Internet Res. 2025 May 12:27:e67156.
doi: 10.2196/67156.

Population-Wide Depression Incidence Forecasting Comparing Autoregressive Integrated Moving Average and Vector Autoregressive Integrated Moving Average to Temporal Fusion Transformers: Longitudinal Observational Study

Deliang Yang et al. J Med Internet Res.

Abstract

Background: Accurate prediction of population-wide depression incidence is vital for effective public mental health management. However, incidence is often influenced by abrupt socioeconomic events or changes, such as pandemics, economic crises, and social unrest, which create complex structural breaks in the time-series data. These structural breaks can affect the performance of forecasting methods in various ways. Therefore, understanding and comparing different models across these scenarios is essential.

Objective: This study aimed to develop depression incidence forecasting models and to compare the performance of autoregressive integrated moving average (ARIMA) and vector-ARIMA (VARIMA) models with that of temporal fusion transformers (TFT) under different structural break scenarios.

Methods: We developed population-wide depression incidence forecasting models and compared the performance of ARIMA- and VARIMA-based methods with that of TFT-based methods. Using monthly depression incidence in Hong Kong from 2002 to 2022, we applied sliding windows to segment the full time series into 72 ten-year subsamples. The forecasting models were trained, validated, and tested on each subsample: the first 7 years were used for training, the eighth year for hold-out validation, and the ninth and tenth years for testing. Accuracy on the testing set of each subsample was measured by symmetric mean absolute percentage error (SMAPE).
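The windowing and scoring described in the Methods can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `sliding_windows` and `smape` are hypothetical, the windows are assumed to slide one year at a time (as the Figure 2 caption states), and the SMAPE variant shown, the mean of 2|F-A| / (|A|+|F|) expressed as a percentage, is one common definition that the abstract does not confirm.

```python
from typing import List, NamedTuple, Sequence


class Window(NamedTuple):
    """Calendar years assigned to each split of one 10-year subsample."""
    train: range
    val: range
    test: range


def sliding_windows(first_year: int = 2002, last_year: int = 2022,
                    width: int = 10, n_train: int = 7,
                    n_val: int = 1) -> List[Window]:
    """Slide a `width`-year window one year at a time (assumed step).

    Each window is split as 7 training years, 1 validation year,
    and the remaining 2 years for testing.
    """
    windows = []
    for start in range(first_year, last_year - width + 2):
        train_end = start + n_train      # first year after the training span
        val_end = train_end + n_val      # first year after the validation span
        windows.append(Window(range(start, train_end),
                              range(train_end, val_end),
                              range(val_end, start + width)))
    return windows


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """SMAPE in percent, using the 0-200% variant (an assumption)."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Treat a 0/0 term as zero error to avoid division by zero.
        terms.append(0.0 if denom == 0 else 2.0 * abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)


windows = sliding_windows()
last = windows[-1]
print(len(windows))                           # number of windows per group
print(list(last.train), list(last.val), list(last.test))
```

With the default arguments this yields 12 windows per series, which matches the 12 windows × 6 population groups = 72 subsamples given in the Figure 2 caption. The last window splits into 2013-2019, 2020, and 2021-2022, the same split shown in Figure 4.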

Results: In subsamples without a significant slope or trend change (structural break), multivariate TFT significantly outperformed univariate TFT, VARIMA, and ARIMA, with an average SMAPE of 11.6% compared to 13.2% (P=.01) for univariate TFT, 16.4% (P=.002) for VARIMA, and 14.8% (P=.003) for ARIMA. Adjusting for the unemployment rate improved TFT performance more than it improved VARIMA performance. When fluctuating outbreaks occurred, TFT was more robust to sharp interruptions, whereas VARIMA and ARIMA performed better when incidence surged and remained high.

Conclusions: This study provides a comparative evaluation of TFT, ARIMA, and VARIMA models for forecasting depression incidence under various structural break scenarios, offering insights into predicting disease burden during both stable and unstable periods. The findings support a decision-making framework for model selection based on the nature of disruptions and data characteristics. For public health policymaking, the results suggest that TFT may be a more suitable tool for disease burden forecasting during periods of stable burden level or when a sudden temporary interruption, such as a pandemic or socioeconomic variation, affects disease occurrence.

Keywords: ARIMA; deep learning; depression incidence forecasting; electronic health records; machine learning; medical informatics; population-wide depression incidence; structural break scenarios; temporal fusion transformers; vector-ARIMA.

Conflict of interest statement

Conflicts of Interest: XL received research grants from the Research Fund Secretariat of the Health Bureau, Health and Medical Research Fund (HMRF, HKSAR), Health and Medical Research Fund Fellowship Scheme (HMRF Fellowship, HKSAR), and Research Grants Council Early Career Scheme (RGC/ECS, HKSAR); commission grants from the Hospital Authority of Hong Kong; educational and investigator-initiated research funds from Janssen, Pfizer, and Amgen; internal funding from the University of Hong Kong; and consultancy fees from Pfizer, Merck Sharp & Dohme, Open Health, and the Office of Health Economics. She is also the former non-executive director of ADAMS Limited Hong Kong. All of these are outside the submitted work. ICKW reports research funding from the Hong Kong Research Grants Council, the Hong Kong Health and Medical Research Fund, the European Commission, IQVIA, and Amgen outside the submitted work. He is a director of Jacobson Medical and Advanced Data Analytics for Medical Science (ADAMS) in Hong Kong and a former director of Therakind Ltd in London and Asia Medicine Regulatory Affairs (AMERA) Services Limited; he was a consultant to IQVIA and the World Health Organization and serves as a member of the Pharmacy and Poisons Board, Hong Kong SAR.

Figures

Figure 1
Diagram of the streamlined analytical plan for the comparative study of forecasting models. ARIMA: Auto-Regressive Integrated Moving Average; EMR: electronic medical records; Multivariate TFT: Multivariate Temporal Fusion Transformers; SMAPE: symmetric mean absolute percentage error; Univariate TFT: Univariate Temporal Fusion Transformers; VARIMA: Vector Auto-Regressive Integrated Moving Average.
Figure 2
Depression incidence and unemployment rate in Hong Kong between 2002 and 2023. (A) Depression incidence time series for the overall population (age-standardized) and for specific age subgroups; vertical pink dotted lines represent the breakpoints in the time series as indicated by Chow’s test. (B) Unemployment rate time series for the overall 20 years population and for specific age subgroups. (C) Ten-year sub-time-series sample set construction: the series is segmented into 10-year sub-time series (one window each) by sliding year by year. In the example shown for 2002-2022, there are 12 sliding windows in total. The first 7 years of each sub-time series are the training set, the eighth year is the validation set, and the ninth and tenth years are the testing set. In the database used in this study, we analyzed 72 sub-time-series datasets (12 samples×6 groups) from the overall population and age subgroups. (D) An example of a stable period sample. (E) An example of an unstable period sample with sharp interruptions. (F) An example of an unstable period sample with a level shift.
Figure 3
Testing accuracy comparison between models. (A) Stable periods with no breakpoint between training, validation, and testing periods. (B) Unstable periods with one or more breakpoints between training, validation, and testing periods. ARIMA: autoregressive integrated moving average; multiTFT: multivariate temporal fusion transformers; SMAPE: symmetric mean absolute percentage error; uniTFT: univariate temporal fusion transformers; VARIMA: vector autoregressive integrated moving average.
Figure 4
Model performance comparison during unstable periods with a sharp interruption or level shift. Training set: 2013-2019; validation set: 2020; testing set: 2021-2022. (A) Model performance comparison during unstable periods with a sharp interruption in 2019: the red circles highlight the last points in the training set, which heavily influenced the prediction output of the ARIMA/VARIMA models because of their autoregression mechanism. These points fall within the sharp interruption period of 2019. (B) Model performance comparison during unstable periods with a level shift at or after 2020: the red lines indicate the changes in level occurring at or after 2020. ARIMA: autoregressive integrated moving average; multivariate TFT: multivariate temporal fusion transformers; univariate TFT: univariate temporal fusion transformers; VARIMA: vector autoregressive integrated moving average.
