J Am Med Inform Assoc. 2007 Jan-Feb;14(1):76-85.
doi: 10.1197/jamia.M2178. Epub 2006 Oct 26.

Finding leading indicators for disease outbreaks: filtering, cross-correlation, and caveats

Ronald M Bloom et al. J Am Med Inform Assoc. 2007 Jan-Feb.

Abstract

Bioterrorism and emerging infectious diseases such as influenza have spurred research into rapid outbreak detection. One primary thrust of this research has been to identify data sources that give early indication of a disease outbreak by acting as leading indicators relative to other established data sources. Researchers tend to rely on the sample cross-correlation function (CCF) to quantify the association between two data sources. Medical informatics researchers have, however, given little consideration to how methodological choices affect the ability of the CCF to identify a lead-lag relationship between time series. We draw on experience from the econometric and environmental health communities, and we use simulation to demonstrate that the sample CCF is highly prone to bias. Specifically, long-scale phenomena tend to overwhelm the CCF, obscuring phenomena at shorter wavelengths. Researchers seeking lead-lag relationships in surveillance data must therefore stipulate the scale length of the features of interest (e.g., short-scale spikes versus long-scale seasonal fluctuations) and then filter the data appropriately to diminish the influence of other features, which may mask the features of interest. Otherwise, conclusions drawn from the sample CCF of bivariate time-series data will inevitably be ambiguous and often altogether misleading.
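
The masking effect described above can be illustrated with a minimal Python sketch, assuming NumPy is available. The estimator `sample_ccf`, the helper `peak_lag`, the sinusoidal long-scale component, the white-noise short-scale component, and all amplitudes are illustrative assumptions, not the authors' simulation; only the lags of +4 and -6 days are borrowed from the figure captions. The sign convention is also an assumption: a peak at lag k > 0 means the first series leads the second by k days.

```python
import numpy as np


def sample_ccf(x, y, max_lag):
    """Sample CCF r(k) = corr(x[t], y[t + k]) for k = -max_lag..max_lag.

    Uses full-series means and standard deviations with a 1/n divisor.
    Assumed sign convention: a peak at k > 0 means x leads y by k steps.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    denom = n * x.std() * y.std()
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.array([
        np.dot(xc[max(0, -k): n - max(0, k)],
               yc[max(0, k): n - max(0, -k)]) / denom
        for k in lags
    ])
    return lags, r


def peak_lag(x, y, max_lag=30):
    """Lag at which the sample CCF is largest (hypothetical helper)."""
    lags, r = sample_ccf(x, y, max_lag)
    return int(lags[np.argmax(r)])


rng = np.random.default_rng(0)
n = 730
t = np.arange(n)

# Long-scale component: annual sinusoid; y's copy lags x's by 4 days.
season_x = 20.0 * np.sin(2 * np.pi * t / 365)
season_y = 20.0 * np.sin(2 * np.pi * (t - 4) / 365)

# Short-scale component: shared day-to-day noise; y's copy leads x's by 6 days.
e = rng.normal(size=n + 6)
x = season_x + e[:n]
y = season_y + e[6:]

raw_peak = peak_lag(x, y)                                 # unfiltered series
diff1_peak = peak_lag(np.diff(x), np.diff(y))             # first differences
diff2_peak = peak_lag(np.diff(x, n=2), np.diff(y, n=2))   # second differences
print(raw_peak, diff1_peak, diff2_peak)
```

In this toy setup the raw CCF peak does not fall at -6, because the annual component dominates the covariance, while the differenced series recover the short-scale lag. Differencing is only one possible high-pass filter; the appropriate choice depends on the scale length of the features of interest.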

Figures

Figure 1. Typical aggregate diagnostic and OTC sales data: summer minimum through winter maximum.
Figure 2. Composite data and raw cross-correlation, dominated by relations on the long scale.
Figure 3. Cross-correlation of composite signals after first differencing; the peak at −6 is now manifest, but the maximum at +4, due to long-term trends, is still present.
Figure 4. Lag correlation after second differencing, which completely eliminates the contribution of long-scale fluctuations and fully reveals the relationship at −6 days due to short-scale components.
Figure 5. (A to F from top to bottom) (A) Two simulated aggregated clinical respiratory data series: multiplicative model with seasonal and 7-day cycles and independent log-normal random components, with a 5-day "outbreak" signature in both series, lagged by 10 days. (B) 7-day cyclic component eliminated by a 7-day moving average. (C) 7-day cyclic component eliminated by a 7-day lagged difference. (D) Cross-correlation function (CCF) of the untreated pair of data series (A); no evidence of the outbreak at lag 10. (E) CCF (negative-log complement scale) of the 7-day moving average data series (B); the CCF is dominated by the common long-scale seasonal component, with no evidence of the outbreak at lag 10. (F) CCF of the 7-day lagged difference data series (C), showing evidence of the alignment at lag 10 due to the short-scale outbreak signatures.
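
The comparison in Figure 5 can be sketched in Python, assuming NumPy is available. This is a rough illustration, not the authors' simulation: the baseline of 100, the seasonal and weekly amplitudes, the noise level, the outbreak size (a doubling for 5 days), the outbreak start days, and the names `sample_ccf`, `peak_lag`, and `simulate` are all assumptions. The sketch also filters the log-transformed series, which turns the multiplicative model into an additive one; the caption does not say whether the authors did this. With the assumed sign convention, a peak at lag k > 0 means the first series leads the second by k days.

```python
import numpy as np


def sample_ccf(x, y, max_lag):
    """Sample CCF r(k) = corr(x[t], y[t + k]) for k = -max_lag..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    denom = n * x.std() * y.std()
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.array([
        np.dot(xc[max(0, -k): n - max(0, k)],
               yc[max(0, k): n - max(0, -k)]) / denom
        for k in lags
    ])
    return lags, r


def peak_lag(x, y, max_lag=30):
    """Lag at which the sample CCF is largest (hypothetical helper)."""
    lags, r = sample_ccf(x, y, max_lag)
    return int(lags[np.argmax(r)])


rng = np.random.default_rng(1)
n = 730
t = np.arange(n)

# Multiplicative components shared by both series (assumed shapes and sizes).
season = np.exp(0.5 * np.cos(2 * np.pi * t / 365))   # winter peak at t = 0
weekly = np.exp(0.2 * np.cos(2 * np.pi * t / 7))     # 7-day cycle


def simulate(outbreak_start):
    """One series: baseline * season * weekly * outbreak * log-normal noise."""
    outbreak = np.ones(n)
    outbreak[outbreak_start:outbreak_start + 5] = 2.0   # 5-day doubling
    noise = rng.lognormal(mean=0.0, sigma=0.03, size=n)  # independent per call
    return 100.0 * season * weekly * outbreak * noise


x = simulate(300)   # outbreak starts on day 300
y = simulate(310)   # same signature, 10 days later

log_x, log_y = np.log(x), np.log(y)

# Filter 1: 7-day moving average (removes the weekly cycle, keeps the season).
kernel = np.ones(7) / 7
ma_x = np.convolve(log_x, kernel, mode="valid")
ma_y = np.convolve(log_y, kernel, mode="valid")

# Filter 2: 7-day lagged difference (removes the weekly cycle and most of
# the slowly varying seasonal component).
d7_x = log_x[7:] - log_x[:-7]
d7_y = log_y[7:] - log_y[:-7]

ma_peak = peak_lag(ma_x, ma_y)
d7_peak = peak_lag(d7_x, d7_y)
print(ma_peak, d7_peak)
```

Under these assumptions the moving-average CCF peaks near lag 0, where the shared seasonal component aligns, while the lagged-difference CCF peaks at the outbreak lag of 10. Both filters remove the 7-day cycle; they differ in whether the long-scale seasonal component survives to dominate the CCF.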
