Epidemiol Infect. 2019 Jan;147:e194.
doi: 10.1017/S095026881900075X.

Application of a long short-term memory neural network: a burgeoning method of deep learning in forecasting HIV incidence in Guangxi, China

G Wang et al. Epidemiol Infect. 2019 Jan.

Abstract

Guangxi, a province in southwestern China, has the second highest reported number of HIV/AIDS cases in China. This study aimed to develop an accurate and effective model to describe the trend of HIV and to predict its incidence in Guangxi. HIV incidence data for Guangxi from 2005 to 2016 were obtained from the database of the Chinese Center for Disease Control and Prevention. Long short-term memory (LSTM) neural network models, autoregressive integrated moving average (ARIMA) models, generalised regression neural network (GRNN) models and exponential smoothing (ES) were used to fit the incidence data. Data from 2015 and 2016 were used to validate the most suitable models. Model performance was evaluated using mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The LSTM model had the lowest MSE when the N value (time step) was 12. The most appropriate ARIMA models for incidence in 2015 and 2016 were ARIMA(1, 1, 2)(0, 1, 2)₁₂ and ARIMA(2, 1, 0)(1, 1, 2)₁₂, respectively. The accuracy of the GRNN and ES models in forecasting HIV incidence in Guangxi was relatively poor. All four performance metrics of the LSTM model were lower than those of the ARIMA, GRNN and ES models. The LSTM model was more effective than the other time-series models and is important for the monitoring and control of local HIV epidemics.
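The four error metrics named in the abstract can be illustrated with a minimal sketch. The function name `forecast_errors` and the sample values are hypothetical and are not taken from the study; the formulas are the standard definitions, and MAPE here assumes that no actual value is zero.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual values, so it assumes none of them is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"mse": mse, "rmse": rmse, "mae": mae, "mape": mape}


# Hypothetical values for illustration only, not data from the study.
actual = [2.0, 4.0, 5.0, 8.0]
predicted = [1.0, 4.0, 6.0, 10.0]
metrics = forecast_errors(actual, predicted)
```

Lower values indicate a closer fit on all four metrics, which is the sense in which the abstract compares the models.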

Keywords: ARIMA model; HIV; LSTM model; incidence; prediction.

Conflict of interest statement

None.

Figures

Fig. 1.
Diagram of the LSTM neural network cell. The input gate (i_t) determines which information needs to be updated in the cell state; the forget gate (f_t) controls which information is discarded from the cell state; the input gate and a candidate vector (c̃_t) produced by tanh then determine which new information is stored, turning the old cell state into the new cell state (c_t). Finally, the cell state is filtered by the output gate (o_t) to update the hidden state (h_t), which is the output of the LSTM cell.
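The gate sequence described in this caption can be sketched as one forward step of a standard LSTM cell. This is an illustrative NumPy version under assumed conventions (one weight matrix per gate acting on the concatenated previous hidden state and input, with random weights); it is not the authors' implementation, and the names `lstm_step`, `W` and `b` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    W[g] has shape (hidden, hidden + inputs) and b[g] has shape (hidden,)
    for each gate g in "f", "i", "c", "o" (an assumed layout).
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate: what to discard
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate: what to update
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate values from tanh
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate: filter on cell state
    h_t = o_t * np.tanh(c_t)                 # new hidden state (cell output)
    return h_t, c_t


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
W = {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for g in "fico"}
b = {g: np.zeros(n_hidden) for g in "fico"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [0.2, 0.5, 0.3]:  # a short made-up input sequence
    h, c = lstm_step(np.array([x]), h, c, W, b)
```

Because h_t is the product of a sigmoid output and a tanh output, every element of the hidden state stays strictly between -1 and 1.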
Fig. 2.
Monthly incidence of HIV in Guangxi, China (from January 2005 to December 2015). The trend component indicates that HIV incidence follows a seasonal pattern (s = 12). From 2005 to 2011, HIV incidence in Guangxi increased slowly; from 2011 to 2016, it declined slowly with seasonal fluctuation.
Fig. 3.
The MSE of LSTM models with different N values using HIV incidence in 2015 and 2016. MSE, mean square error; N, the number of inputs (time steps) to the LSTM model. The yellow line shows the MSE for each N value in 2015, and the purple line shows the MSE for each N value in 2016. The model had its minimum MSE in both 2015 and 2016 when N was 12.
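One common way to feed a monthly series to an LSTM with N time steps is to slide a window of N consecutive values over the series and use the following value as the target. The sketch below assumes that reading of N; the source does not spell out how the inputs were constructed, and the helper `make_windows` and the sample series are hypothetical.

```python
def make_windows(series, n_steps):
    """Split a series into (inputs, target) pairs.

    Each input is n_steps consecutive values; the target is the value
    that immediately follows them.
    """
    if n_steps < 1 or n_steps >= len(series):
        raise ValueError("n_steps must be at least 1 and shorter than the series")
    inputs, targets = [], []
    for start in range(len(series) - n_steps):
        inputs.append(series[start:start + n_steps])
        targets.append(series[start + n_steps])
    return inputs, targets


# 24 made-up monthly values, not data from the study.
series = list(range(1, 25))
X, y = make_windows(series, n_steps=12)
```

With N = 12, each prediction draws on the preceding twelve months, which matches the seasonal period (s = 12) noted for the series.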
Fig. 4.
Forecasting curves of the optimal LSTM model and the other models, together with the actual HIV incidence series. LSTM, long short-term memory neural network model; ARIMA, autoregressive integrated moving average model. The black line shows the actual data; the blue, red, yellow and green dashed lines show the values predicted by the LSTM, ARIMA, SES and GRNN models, respectively. Compared with ARIMA, SES and GRNN, the values predicted by LSTM were closer to the actual values.
