J Med Internet Res. 2017 Nov 6;19(11):e370.
doi: 10.2196/jmir.7486.

Subregional Nowcasts of Seasonal Influenza Using Search Trends

Sasikiran Kandula et al. J Med Internet Res. 2017.

Abstract

Background: Limiting the adverse effects of seasonal influenza outbreaks at the state or city level requires close monitoring of localized outbreaks and reliable forecasts of their progression. Although forecasting models for influenza or influenza-like illness (ILI) are becoming increasingly available, their applicability to localized outbreaks is limited by the lack of real-time observations of the current outbreak state at local scales. Surveillance data collected by health departments are widely accepted as the reference standard for estimating the state of outbreaks; in the absence of such data, nowcast proxies built from Web-based activity such as search engine queries, tweets, and visits to health-related webpages can be useful. Google Flu Trends (GFT) previously published nowcast estimates of state and municipal ILI; however, validations of these estimates were seldom reported.

Objective: The aim of this study was to develop and validate models to nowcast ILI at subregional geographic scales.

Methods: We built nowcast models at the US state level based on autoregressive (autoregressive integrated moving average; ARIMA) and supervised regression (random forest) methods, using regional weighted ILI and Web-based search activity derived from Google's Extended Trends application programming interface (API). We validated the performance of these methods against actual surveillance data for the 50 states across six seasons. We also built state-level nowcast models using state-level estimates of ILI and compared the accuracy of their estimates with that of the regional models extrapolated to the state level and with the nowcast estimates published by GFT.
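As a rough illustration of how a nowcast can combine recent ILI with concurrent search activity, the sketch below fits a deliberately simplified linear model, ili[t] = b0 + b1*ili[t-1] + b2*search[t], by ordinary least squares in plain Python. It is an assumed stand-in, not the study's ARIMA or random forest models; the names fit_arx and nowcast, the single search term, and the one-week lag are hypothetical choices made for illustration.

```python
# Simplified autoregressive nowcast with one exogenous search-volume regressor.
# Assumed model (illustration only, not the study's ARIMA or random forest):
#     ili[t] = b0 + b1 * ili[t-1] + b2 * search[t]


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_arx(ili, search):
    """Fit (b0, b1, b2) by ordinary least squares via the normal equations."""
    rows = [[1.0, ili[t - 1], search[t]] for t in range(1, len(ili))]
    target = ili[1:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return _solve(xtx, xty)


def nowcast(coef, last_ili, current_search):
    """Estimate this week's ILI from last week's ILI and this week's search volume."""
    b0, b1, b2 = coef
    return b0 + b1 * last_ili + b2 * current_search
```

Given last week's ILI and this week's search volume, nowcast returns an estimate for the current week before surveillance data for that week are available. A random forest, as used in the study, would replace the linear fit with an ensemble of regression trees over many search-derived features.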

Results: Models built using regional ILI and extrapolated to the state level had a median correlation of 0.84 (interquartile range [IQR]: 0.74-0.91) and a median root mean square error (RMSE) of 1.01 (IQR: 0.74-1.50), with noticeable variability across seasons and by state population size. Model forms that assume the availability of timely state-level surveillance data show significantly lower errors, with a median RMSE of 0.83 (IQR: 0.55-0.23). Compared with GFT, these state-level model forms have lower errors but also lower correlation.
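The results are reported as a median with an interquartile range across state-seasons. A minimal sketch of that summary, assuming one metric value per state-season and linear-interpolation quartiles (the study's quantile convention is not stated), could look like this; summarize and format_summary are hypothetical helper names.

```python
import statistics


def summarize(values):
    """Return (median, q1, q3) of one metric across state-seasons.

    Quartiles use linear interpolation (method="inclusive"), an assumed
    convention; other quantile definitions give slightly different IQRs.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, q1, q3


def format_summary(name, values):
    """Render a summary in the abstract's 'median (IQR: q1-q3)' style."""
    median, q1, q3 = summarize(values)
    return f"median {name} {median:.2f} (IQR: {q1:.2f}-{q3:.2f})"
```

By construction, the first quartile never exceeds the median and the median never exceeds the third quartile.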

Conclusions: These results suggest that the proposed methods may be an alternative to the discontinued GFT and that further improvements in the quality of subregional nowcasts may require increased access to more finely resolved surveillance data.

Keywords: classification and regression trees; human influenza; infodemiology; infoveillance; nowcasts; surveillance.

Conflict of interest statement

Conflicts of Interest: JS declares partial ownership of SK Analytics. SK was a contractor for SK Analytics.

Figures

Figure 1
Autoregressive integrated moving average (ARIMA) formulation.
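For reference, a standard textbook form of an ARIMA(p, d, q) model for a weekly series y_t (here, ILI) is shown below, with B the backshift operator, φ_i the autoregressive coefficients, θ_j the moving-average coefficients, c a constant, and ε_t white noise. This is a generic statement of the model class; the exact formulation used in the study may differ, for example in how exogenous terms are included.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{d} \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t,
\qquad B\, y_t = y_{t-1}
```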
Figure 2
Formulation of the two error measures: root mean square error (RMSE) and mean absolute percentage error (MAPE).
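Using the standard definitions, RMSE is the square root of the mean squared difference between observed and nowcast values, and MAPE is the mean of the absolute differences divided by the observed values. The sketch assumes paired weekly observed and nowcast values and reports MAPE in percent; the ×100 scaling and the choice to reject zero observations are assumptions, and rmse and mape are hypothetical helper names.

```python
import math


def _check(observed, predicted):
    """Require two non-empty sequences of equal length."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")


def rmse(observed, predicted):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    _check(observed, predicted)
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent: 100 * mean(|y - y_hat| / |y|)."""
    _check(observed, predicted)
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

Because MAPE divides by the observed value, it weights errors in low-ILI weeks more heavily than RMSE does.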
Figure 3
Top 20 features by importance, as determined by random forest models built at the regional level. The red dot and whiskers show the median and interquartile range (IQR), respectively, and the blue point shows the mean. Each label gives the percentage of models in which the feature was used (n=3130). "ar" refers to the autoregressive integrated moving average (ARIMA) component. Features prefixed by ENT are entities identified using Freebase.
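The statistics in this figure (median, IQR, mean, and the percentage of models using each feature) can be aggregated from per-model importances roughly as follows. The sketch assumes a hypothetical input format, one dict mapping feature name to importance per fitted model, and ranks features by median importance; the caption does not say which statistic orders the features, so that choice is an assumption.

```python
import statistics


def summarize_importance(models):
    """Aggregate per-model feature importances.

    `models` is a list of dicts (hypothetical format) mapping feature name to
    importance for the features each model used. Returns, per feature,
    (median, q1, q3, mean, percent_of_models_using_feature).
    """
    n_models = len(models)
    by_feature = {}
    for importances in models:
        for name, value in importances.items():
            by_feature.setdefault(name, []).append(value)
    summary = {}
    for name, values in by_feature.items():
        if len(values) >= 2:
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:  # a single value: all quartiles collapse to it
            q1 = median = q3 = values[0]
        pct_used = 100.0 * len(values) / n_models
        summary[name] = (median, q1, q3, statistics.fmean(values), pct_used)
    return summary


def top_features(summary, k=20):
    """Return the k feature names with the highest median importance."""
    return sorted(summary, key=lambda name: summary[name][0], reverse=True)[:k]
```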
Figure 4
Measures observed with the different model forms. A: Pearson correlation coefficient (COR); B: root mean square error (RMSE); and C: mean absolute percentage error (MAPE). Left: box-and-whisker plots show the median, interquartile range (IQR), and whiskers extending to 1.5×IQR for each model form; the colored regions are violin plots showing the probability density. Right: heat map of the distribution of the models' relative ranks; more frequent ranks are darker.
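The relative-rank heat map can be reproduced in outline by ranking the model forms within each state-season and counting how often each form lands at each rank. The sketch below assumes one dict of model-form scores per state-season and gives tied models the same, better rank; the input format, the tie rule, and the names rank_models and rank_frequencies are assumptions.

```python
from collections import Counter


def rank_models(scores, higher_is_better=False):
    """Rank model forms within one state-season (rank 1 = best).

    Errors (RMSE, MAPE) rank ascending; correlation ranks descending when
    higher_is_better=True. Ties share the better rank (assumed tie rule).
    """
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def rank_frequencies(state_seasons, higher_is_better=False):
    """Count how often each model form attains each rank across state-seasons."""
    counts = {}
    for scores in state_seasons:
        for name, rank in rank_models(scores, higher_is_better).items():
            counts.setdefault(name, Counter())[rank] += 1
    return counts
```

Each model form's Counter corresponds to one row of the heat map, with darker cells for ranks that occur more often.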
Figure 5
Pairwise plots of the model forms on the three measures. A: Pearson correlation coefficient (COR); B: root mean square error (RMSE); and C: mean absolute percentage error (MAPE). Subpanels along the diagonal show the density of the measure for each model form. Subpanels in the lower triangle are scatter plots in which each point (n=300) denotes a state-season. Points on or close to the black line (y=x) are state-seasons in which the pair of model forms have similar measures (correlation or error). Subpanels in the upper triangle are heat maps of the point counts in each cell of a two-dimensional (2D) grid over the plot area (low counts in yellow, high in red). For example, to compare the correlations of RRS and SS0, see the scatter plot at (5,4) or the heat map at (4,5) of A.
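The upper-triangle heat maps amount to a two-dimensional histogram of the scatter plot points. A plain-Python sketch, assuming a square grid of equal-width cells over a shared axis range, is shown below together with a helper that measures the share of state-seasons near y=x; the tolerance that defines "near" is a hypothetical parameter, since the caption does not quantify it.

```python
def grid_counts(xs, ys, lo, hi, bins):
    """Count points in each cell of a bins x bins grid over [lo, hi] on both axes.

    counts[i][j] is the number of points whose y falls in row i and x in
    column j. Points outside [lo, hi] are skipped; a value equal to hi is
    placed in the last cell.
    """
    width = (hi - lo) / bins
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        if not (lo <= x <= hi and lo <= y <= hi):
            continue
        j = min(int((x - lo) / width), bins - 1)
        i = min(int((y - lo) / width), bins - 1)
        counts[i][j] += 1
    return counts


def share_near_identity(xs, ys, tol):
    """Fraction of points with |y - x| <= tol, i.e. close to the y = x line."""
    pairs = list(zip(xs, ys))
    return sum(abs(y - x) <= tol for x, y in pairs) / len(pairs)
```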
