BMC Infect Dis. 2024 Jan 22;24(1):113.
doi: 10.1186/s12879-023-08969-4.

Trend analysis and prediction of gonorrhea in mainland China based on a hybrid time series model

Zhende Wang et al. BMC Infect Dis.

Abstract

Background: Gonorrhea has long been a serious public health problem in mainland China. Modeling that describes and predicts its prevalence patterns can help the government develop more scientific interventions.

Methods: Monthly time series (TS) data on gonorrhea incidence in China from January 2004 to August 2022 were collected, with the data from September 2021 to August 2022 held out as the validation set. The seasonal autoregressive integrated moving average (SARIMA) model, the long short-term memory network (LSTM) model, and a hybrid SARIMA-LSTM model were each fitted to the data. Model performance was evaluated by calculating the mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE) on the training and validation sets.
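
As an illustration of the three error measures named in the Methods, the following minimal sketch computes MAPE, RMSE, and MAE with their usual textbook definitions. The function names and the sample numbers are made up for illustration and are not taken from the study.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error, in the same units as the data."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# Illustrative values only, not incidence data from the paper.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

All three measures are lower for a better fit. MAPE is scale-free, while RMSE penalizes large individual errors more heavily than MAE does.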

Results: The seasonal component after data decomposition showed an approximately bimodal distribution with a period of 12 months. The three models identified were SARIMA(1,1,1)(2,1,2)12, LSTM with 150 hidden units, and SARIMA-LSTM with 150 hidden units. The SARIMA-LSTM model fitted best in both the training and validation sets, with the smallest MAPE, RMSE, and MAE.
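
In standard SARIMA(p,d,q)(P,D,Q)s notation, the order reported in the Results has d = 1, D = 1, and a seasonal period of 12, meaning the series is differenced once at lag 1 and once at lag 12 before the autoregressive and moving-average terms are fitted. The sketch below shows only that differencing step on a synthetic series. The `difference` helper and the data are hypothetical, and this is not the authors' code.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Synthetic series: linear trend plus a fixed 12-month pattern (not real incidence data).
pattern = [0, 2, 5, 3, 1, 4, 8, 6, 2, 1, 3, 7]
series = [10 + 0.5 * t + pattern[t % 12] for t in range(48)]

# Seasonal differencing (D = 1, period 12) removes the repeating pattern.
seasonal_diff = difference(series, lag=12)

# Non-seasonal differencing (d = 1) then removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1)

print(len(stationary), max(abs(v) for v in stationary))
```

On this noise-free toy series the doubly differenced values are all zero. On real data the remainder would carry the autocorrelation structure that the AR and MA terms are meant to capture.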

Conclusions: The overall incidence trend of gonorrhea in mainland China has been on the decline since 2004, with some periods exhibiting an upward trend. The incidence of gonorrhea displays a seasonal distribution, typically peaking in July and December each year. The SARIMA model, LSTM model, and SARIMA-LSTM model can all fit the monthly incidence time series data of gonorrhea in mainland China. However, in terms of predictive performance, the SARIMA-LSTM model outperforms the SARIMA and LSTM models, with the LSTM model surpassing the SARIMA model. This suggests that the SARIMA-LSTM model can serve as a preferred tool for time series analysis, providing evidence for the government to predict trends in gonorrhea incidence. The model's predictions indicate that the incidence of gonorrhea in mainland China will remain at a high level in 2024, necessitating that policymakers implement public health measures in advance to prevent the spread of the disease.

Keywords: Gonorrhea; LSTM; Modeling; SARIMA.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Flowchart of SARIMA model construction and simulation
Fig. 2
The cell structure of LSTM. The arrows indicate the data flow, where x, s, c, f, i, g, and o denote the input, output, cell state, forget gate, input gate, cell candidate, and output gate at time step t, respectively. σ and tanh denote the sigmoid activation function and the hyperbolic tangent function, which map the data to (0,1) and (-1,1), respectively. The two vector operators shown represent element-wise multiplication and element-wise addition, respectively
Fig. 3
Monthly TS data of gonorrhea infections from January 2004 to August 2022 and the TS decomposition. In A, the blue curve represents the incidence time series and the red curve represents the long-term trend. In B, the red curve represents the time series without the seasonal component (both the long-term trend and the stochastic component remain), and the blue curve represents the stable seasonal component with a period of 12
Fig. 4
ACF and PACF of the differenced TS. A and B denote the ACF and PACF of the non-seasonal differential series. C and D denote the ACF and PACF of the seasonal differential series, respectively. The red stem plots represent the sample ACF and PACF values at different lags, and the blue dashed lines indicate the ± 2 times standard deviation interval
Fig. 5
Comparison of fitted values from various underfitted LSTM models with actual data. A-D correspond to scenarios of 10 hidden units with 50 iterations, 10 hidden units with 500 iterations, 50 hidden units with 50 iterations, and 50 hidden units with 500 iterations, respectively. Here, the blue curve signifies actual incidence data, while the red curve denotes LSTM model-fitted data
Fig. 6
SARIMA model residual normality and autocorrelation diagnostics. A shows the frequency distribution of the standardized residuals as a histogram. B is the QQ plot of the residuals of the SARIMA model, and the red dashed line represents the standard normal distribution. C and D are the ACF and PACF of the residuals, respectively. The stem plots represent the sample ACF and PACF values at different lags, and the blue dashed lines indicate the ± 2 times standard deviation interval
Fig. 7
TS data fitting and validation using the SARIMA, LSTM, and SARIMA-LSTM models. The blue curves in panels A, B, and C represent the actual number of cases in China from January 2004 to August 2021. The red curves in panels A, B, and C correspond to the cases fitted by the SARIMA, LSTM, and SARIMA-LSTM models, respectively. The yellow curves in panels A, B, and C correspond to the cases predicted by the SARIMA, LSTM, and SARIMA-LSTM models, respectively. Panels D, E, and F display the simulation and prediction residuals for the SARIMA, LSTM, and SARIMA-LSTM models, represented by the blue and yellow curves, respectively
Fig. 8
Prediction results from September 2022 to August 2024 of SARIMA, LSTM, and SARIMA-LSTM models. The light blue area represents the forecast period, and the red, yellow, and purple dashed curves indicate the prediction results of SARIMA, LSTM, and SARIMA-LSTM models, respectively. The red, yellow, and purple curves indicate the simulating results of the updated SARIMA, LSTM, and SARIMA-LSTM models using all observed data
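
The gate structure described in the Fig. 2 caption can be sketched as a single time step of an LSTM cell with one hidden unit, using the caption's symbols (x, s, c, f, i, g, o). The equations are the standard LSTM formulation. The weights, the `lstm_step` helper, and the input values are hypothetical and unrelated to the 150-unit network fitted in the paper.

```python
import math

def sigmoid(z):
    """Sigmoid activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, s_prev, c_prev, w):
    """One time step of a single-unit LSTM cell.

    w maps each gate name ("f", "i", "g", "o") to a tuple of
    (input weight, recurrent weight, bias). All values are scalars.
    """
    def pre(gate):
        wx, ws, b = w[gate]
        return wx * x + ws * s_prev + b

    f = sigmoid(pre("f"))      # forget gate
    i = sigmoid(pre("i"))      # input gate
    g = math.tanh(pre("g"))    # cell candidate
    o = sigmoid(pre("o"))      # output gate
    c = f * c_prev + i * g     # new cell state
    s = o * math.tanh(c)       # new output
    return s, c

# Hypothetical weights, identical for every gate, chosen only for illustration.
weights = {gate: (0.5, 0.1, 0.0) for gate in "figo"}

s, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1]:      # a short made-up input sequence
    s, c = lstm_step(x, s, c, weights)

print(s, c)
```

Because the output is a sigmoid value times a tanh value, s always stays inside (-1, 1), which is one reason inputs to such networks are commonly rescaled before training.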
