BMC Bioinformatics. 2015;16 Suppl 17(Suppl 17):S4.
doi: 10.1186/1471-2105-16-S17-S4. Epub 2015 Dec 7.

ORBiT: Oak Ridge biosurveillance toolkit for public health dynamics

Arvind Ramanathan et al. BMC Bioinformatics. 2015.

Abstract

Background: The digitization of health-related information through electronic health records (EHR) and electronic healthcare reimbursement claims, together with the continued growth of self-reported health information through social media, provides both tremendous opportunities and challenges in developing effective biosurveillance tools. With novel emerging infectious diseases being reported across different parts of the world, there is a need to build systems that can track, monitor and report such events in a timely manner. Further, it is also important to identify susceptible geographic regions and populations where emerging diseases may have a significant impact.

Methods: In this paper, we present an overview of the Oak Ridge Biosurveillance Toolkit (ORBiT), which we have developed specifically to address data analytic challenges in the realm of public health surveillance. In particular, ORBiT provides an extensible environment to pull together diverse, large-scale datasets and analyze them to identify spatial and temporal patterns for various biosurveillance-related tasks.

Results: We demonstrate the utility of ORBiT in automatically extracting a small number of spatial and temporal patterns during the 2009-2010 pandemic H1N1 flu season using claims data. These patterns provide quantitative insights into the dynamics of how the pandemic flu spread across different parts of the country. We discovered that the claims data exhibits multi-scale patterns from which we could identify a small number of states in the United States (US) that act as "bridge regions" contributing to one or more specific influenza spread patterns. Similar to previous studies, the patterns show that the south-eastern regions of the US were widely affected by the H1N1 flu pandemic. Several of these south-eastern states act as bridge regions, which connect the north-east and central US in terms of flu occurrences.

Conclusions: These quantitative insights show how claims data combined with novel analytical techniques can provide important information to decision makers when an epidemic spreads throughout the country. Taken together, ORBiT provides a scalable and extensible platform for public health surveillance.

Figures

Figure 1
Temporal trends of ILI incidence from IMS Health claims, CDC ILINet and Google Flu Trends (GFT) during the 2009-2010 pandemic flu show significant similarities. The total incidence of the H1N1 pandemic as provided by GFT (blue line) and CDC (red dots) is plotted together with IMS claims data (black line). Note that we used the strict definition of the flu (ICD9 codes: 486XX and 488XX). The temporal trends for the entire US are plotted in the center, surrounded by those for the 10 Health and Human Services (HHS) Regions (HHS-I to HHS-X). In all cases, the agreement between the IMS claims data, GFT and CDC ILINet data is quantified by correlation coefficients shown at the right-hand side of each panel: IMS claims versus GFT (top) and IMS claims versus ILINet (bottom). These numbers are computed over all 52 weeks of collected data rather than only over the time segments for which CDC ILINet data was available. CDC ILINet data has some missing values; removing these segments from our analysis actually improves the correlations (see Main Text for discussion). For HHS-IX and HHS-X, the CDC ILINet data was not fully available at the time of download and hence we have not shown the correlation values.
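The effect of missing ILINet weeks on the correlation can be illustrated with a minimal sketch. It assumes, purely for illustration, that a missing week is encoded as None and that the "all 52 weeks" figure treats such weeks as zero; the source does not say how missing values were handled. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pearson_complete(x, y):
    """Correlation over only the weeks where both series report a value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

def pearson_zero_filled(x, y):
    """Correlation over all weeks, with missing values replaced by 0.0
    (an assumption made here for illustration only)."""
    def fill(series):
        return [0.0 if v is None else v for v in series]
    return pearson(fill(x), fill(y))

# Toy weekly values, made up for illustration (not data from the paper).
claims = [10.0, 14.0, 22.0, 35.0, 30.0, 18.0, 12.0, 9.0]
ilinet = [1.0, 1.5, None, 3.4, None, 1.9, 1.1, 0.8]

r_all = pearson_zero_filled(claims, ilinet)
r_complete = pearson_complete(claims, ilinet)
print("all weeks:", round(r_all, 3), "complete weeks only:", round(r_complete, 3))
```

In this toy case the correlation restricted to complete weeks is higher than the zero-filled one, which mirrors the direction of the effect described in the caption.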
Figure 2
Summary of non-negative matrix factorization (NMF) applied to ILI diagnostic claims data. (A) Reconstruction error, or the fraction of unexplained variance, for PCA (red) and NMF (black) versus the selected subspace dimension s. (B) Change in reconstruction error for PCA and NMF compared with the change in reconstruction error for PCA performed on a scrambled version of the input matrix A. PCAscram (gray line) is used to estimate the cut-off number of dimensions beyond which the dimensionality reduction method explains only noise within the dataset. For our analysis, s beyond 12 explains only noise in the data, as is evident from the intersection between the gray and black/red lines.
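The idea of comparing reconstruction error on the real matrix against a scrambled control can be sketched as follows. This is not the authors' implementation. It uses a plain-Python NMF with Lee-Seung multiplicative updates, and it runs NMF (not PCA) on the scrambled matrix to stay within the standard library. The scrambling scheme (shuffling all entries), the normalization by the squared norm of A, and the toy matrix are all assumptions made for illustration.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(c) for c in zip(*A)]

def nmf(A, s, iters=300, seed=0, eps=1e-9):
    """Rank-s factorization A ~ W H with non-negative W and H."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(s)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(s)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(s)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(s)]
             for i in range(m)]
    return W, H

def unexplained(A, s):
    """Squared reconstruction error as a fraction of the squared norm of A."""
    W, H = nmf(A, s)
    R = matmul(W, H)
    err = sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr))
    return err / sum(a * a for row in A for a in row)

def scramble(A, seed=1):
    """Shuffle all entries of A (one possible scrambling scheme)."""
    rng = random.Random(seed)
    flat = [v for row in A for v in row]
    rng.shuffle(flat)
    n = len(A[0])
    return [flat[i:i + n] for i in range(0, len(flat), n)]

# Toy matrix built from two hypothetical patterns (6 regions x 5 time points).
W0 = [[1, 0], [2, 0], [1, 0.2], [0, 1], [0, 2], [0.2, 1]]
H0 = [[1, 3, 1, 0, 0], [0, 0, 1, 3, 1]]
A = matmul(W0, H0)

real_err = {s: unexplained(A, s) for s in (1, 2, 3)}
scr_err = {s: unexplained(scramble(A), s) for s in (1, 2, 3)}
for s in (2, 3):
    # Drop in error gained by adding the s-th dimension, real vs scrambled.
    print(s, real_err[s - 1] - real_err[s], scr_err[s - 1] - scr_err[s])
```

Under these assumptions, a dimension is worth keeping while the drop in error on the real matrix is larger than the drop on the scrambled one; the point where the two become comparable plays the role of the cut-off described in the caption.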
Figure 3
Five distinct temporal patterns govern how the pandemic flu spread throughout the US. The normalized temporal amplitude is plotted against the total number of days (Apr 1, 2009-Mar 31, 2010). Observe the distinct lag in each of the five patterns, with the peak of each successive Hi shifting towards the left (indicated by a gray arrow). These patterns summarize the different peaks during the H1N1 pandemic. Notably, H1, H4 and H5 capture the late, middle and early H1N1 pandemic peaks occurring within the entire country.
Figure 4
Multi-scale spatial patterns of H1N1 influenza occurrence in the US. Each of the spatial patterns W discovered by NMF can be used to examine how the flu spread throughout the US (left-hand panels). The nationwide panels depict how the W1 pattern is widespread throughout the US, followed by patterns that move progressively south (W4). The spatial pattern W5 depicts flu prevalence only within large metropolitan areas and southern Florida. One can focus further on state-wide patterns (middle panel) to examine how ILI patterns affect the state of Tennessee, and on specific metropolitan areas (e.g., Memphis in Tennessee, rightmost panels) to capture minor variations in the ILI patterns across zip codes. These differences also allow one to identify bridge regions (highlighted by red and magenta circles) that show more than two ILI patterns in the same zip code. These analyses can be further extended to state and nationwide areas.
Figure 5
A small number of regions within the US act as bridge regions for the 2009-2010 H1N1 flu season. Within every state, we quantify the extent to which the individual spatial patterns are dominant using a pie-chart representation. The colors represent the respective spatial patterns (W1...5), as highlighted in the legend. In each pie chart, a line in the middle marks the 50% cut-off for a particular flu pattern and is used as a guide to identify dominant patterns. For the individual HHS regions shown below, we can see a dominant pattern; within some individual states (e.g., MA, CT, MT, CO, MS), however, more than one pattern dominates, indicating the complexity of how the H1N1 flu spread within these regions. Note that the patterns also correspond to the time when the flu peaked in these individual regions, and hence such patterns are instructive in visually interpreting how the different spread patterns affected an individual state.
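The pie-chart reading described above can be sketched in a few lines: convert a region's per-pattern weights into shares, report the pattern that exceeds the 50% cut-off if there is one, and list the patterns that are substantially present otherwise. The 50% cut-off comes from the caption. The `min_share` threshold, the function names and the example weights are hypothetical and chosen only for illustration.

```python
def pattern_shares(weights):
    """Convert non-negative per-pattern weights into fractions summing to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def dominant_pattern(weights, cutoff=0.5):
    """Return the 1-based index of the pattern whose share exceeds the
    cut-off, or None when no single pattern does."""
    shares = pattern_shares(weights)
    best = max(range(len(shares)), key=shares.__getitem__)
    return best + 1 if shares[best] > cutoff else None

def present_patterns(weights, min_share=0.2):
    """1-based indices of patterns holding at least min_share of the total.
    The min_share value is an illustrative assumption."""
    shares = pattern_shares(weights)
    return [i + 1 for i, s in enumerate(shares) if s >= min_share]

# Hypothetical weights for patterns W1..W5 in two made-up regions.
regions = {
    "region_a": [8.0, 1.0, 0.5, 0.3, 0.2],
    "region_b": [3.0, 2.5, 0.5, 3.0, 1.0],
}

for name, weights in regions.items():
    dom = dominant_pattern(weights)
    mixed = present_patterns(weights)
    label = "single dominant pattern" if dom else "candidate bridge region"
    print(name, dom, mixed, label)
```

Here region_a has one pattern above the cut-off, while region_b has none and several patterns with comparable shares, which is the situation the caption associates with bridge regions.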
