J Appl Stat. 2022 Aug 25;50(14):2999-3029.
doi: 10.1080/02664763.2022.2112557. eCollection 2023.

Hot-spots detection in count data by Poisson assisted smooth sparse tensor decomposition

Yujie Zhao et al. J Appl Stat.

Abstract

Count data occur widely in many bio-surveillance and healthcare applications, e.g. the numbers of new patients with different types of infectious diseases in different cities/counties/states, recorded repeatedly over time (say, daily/weekly/monthly). For this type of count data, one important task is to quickly detect and localize hot-spots in terms of unusual infection rates so that we can respond appropriately. In this paper, we develop a method called Poisson assisted Smooth Sparse Tensor Decomposition (PoSSTenD), which not only detects when hot-spots occur but also localizes where they occur. The main idea of our proposed PoSSTenD method is as follows. First, we represent the observed count data as a three-dimensional tensor with (1) a spatial dimension for location patterns, e.g. different cities/counties/states; (2) a temporal dimension for time patterns, e.g. daily/weekly/monthly; and (3) a categorical dimension for different types of data sources, e.g. different types of diseases. Second, we fit a Poisson regression model to this tensor and further decompose the infection rate into two components: a smooth global trend and local hot-spots. Third, we detect when hot-spots occur by building a cumulative sum (CUSUM) control chart and localize where they occur through a LASSO-type sparse estimate of the hot-spot component. The usefulness of our proposed methodology is validated through numerical simulation studies and a real-world dataset that records the annual number of cases of 10 different infectious diseases from 1993 to 2018 for 49 mainland states in the United States.

Keywords: CUSUM; Hot-spots detection; Poisson regression; spatio-temporal model; tensor decomposition.
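The first two steps of the abstract (arranging counts as a location × time × category tensor, then separating a baseline trend from sparse hot-spots under a Poisson model) can be illustrated with a minimal sketch. It is not the PoSSTenD algorithm: it assumes the smooth global trend is already known and supplied as a fixed baseline log-rate tensor `log_base`, whereas PoSSTenD estimates the trend and hot-spots jointly by minimizing its objective $F(\theta_m,\theta_h)$ (Figure 2). With the baseline held fixed, an L1 (LASSO-type) penalized Poisson fit of the hot-spot log-rate increment separates per cell and has a closed-form solution. The helper names `to_tensor` and `sparse_hotspots`, the penalty `lam`, and the toy records are all hypothetical.

```python
import numpy as np


def to_tensor(records, locations, times, categories):
    """Arrange (location, time, category, count) records into a count tensor
    of shape (len(locations), len(times), len(categories)).

    Cells with no record stay 0; repeated records for a cell are summed.
    """
    idx = [{v: i for i, v in enumerate(axis)} for axis in (locations, times, categories)]
    y = np.zeros((len(locations), len(times), len(categories)))
    for loc, t, cat, count in records:
        y[idx[0][loc], idx[1][t], idx[2][cat]] += count
    return y


def sparse_hotspots(y, log_base, lam):
    """Hypothetical helper: L1-penalized Poisson hot-spot estimate.

    For each cell, with the baseline log-rate m = log_base[cell] held fixed,
    solve the convex problem
        min_h  exp(m + h) - y * (m + h) + lam * |h|.
    Setting the subgradient to zero gives h = 0 unless |y - exp(m)| > lam,
    in which case exp(m + h) = y - lam (high counts) or y + lam (low counts).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    y = np.asarray(y, dtype=float)
    log_base = np.asarray(log_base, dtype=float)
    base = np.exp(log_base)        # expected counts under the fixed baseline
    h = np.zeros_like(y)
    up = y - lam > base            # unusually high counts -> h > 0
    down = y + lam < base          # unusually low counts -> h < 0
    h[up] = np.log(y[up] - lam) - log_base[up]
    h[down] = np.log(y[down] + lam) - log_base[down]
    return h


# Toy data (not from the paper): every cell has 50 cases, plus 150 extra
# cases recorded for one (state, year, disease) cell.
states, years, diseases = ["state_A", "state_B"], [2016, 2017], ["mumps", "pertussis"]
records = [(s, yr, d, 50) for s in states for yr in years for d in diseases]
records.append(("state_A", 2017, "mumps", 150))
y = to_tensor(records, states, years, diseases)

log_base = np.full(y.shape, np.log(50.0))  # assumed flat baseline of 50 cases
h = sparse_hotspots(y, log_base, lam=30.0)
print(np.argwhere(h > 0))  # (location, time, category) indices of hot-spots
```

A larger `lam` flags fewer cells, the usual sparsity trade-off of an L1 penalty. If population sizes are available, adding log(population) to `log_base` is the standard Poisson offset for modelling rates rather than raw counts; the abstract does not say whether the paper uses exactly this form.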

Conflict of interest statement

No potential conflict of interest was reported by the author(s).

Figures

Figure 1.
Data visualization of our motivating dataset. (a) bar plot of 10 diseases. (b) map of 49 states. (c) time series of 26 years.
Figure 2.
The pipeline of our algorithm to minimize $F(\theta_m,\theta_h)$ with respect to $\theta_m$ and $\theta_h$.
Figure 3.
Population and estimate in Alabama and Georgia during 1993–2018. (a) Alabama. (b) Oregon.
Figure 4.
ARL1 (out-of-control average run length) plots of our proposed PoSSTenD method and the three benchmark methods, i.e. the YPS-SSD, ZQ-Lasso and DBS-PCA methods. (a) Population with an increasing trend. (b) Population with a decreasing trend.
Figure 5.
Comparison of true hot-spots and hot-spots detected by our PoSSTenD method, NMC-scan-stat and the YPS-SSD method under a decreasing population size and large hot-spots (δ=0.2). (a.1) True; (a.2) PoSSTenD; (a.3) NMC-scan-stat; (a.4) YPS-SSD. (b.1) True; (b.2) PoSSTenD; (b.3) NMC-scan-stat; (b.4) YPS-SSD. (c.1) True; (c.2) PoSSTenD; (c.3) NMC-scan-stat; (c.4) YPS-SSD. (d.1) True; (d.2) PoSSTenD; (d.3) NMC-scan-stat; (d.4) YPS-SSD. (e.1) True; (e.2) PoSSTenD; (e.3) NMC-scan-stat; (e.4) YPS-SSD.
Figure 6.
CUSUM control chart of our proposed method.
Figure 7.
Hot-spot detection results for mumps, syphilis and pertussis in 2017 by our proposed PoSSTenD method (second column), NMC-scan-stat (third column) and the YPS-SSD method (last column). The first column shows the raw 2017 counts of people infected with mumps, syphilis and pertussis. (a.1) 2017 raw data; (a.2) PoSSTenD; (a.3) NMC-scan-stat; (a.4) YPS-SSD. (b.1) 2017 raw data; (b.2) PoSSTenD; (b.3) NMC-scan-stat; (b.4) YPS-SSD. (c.1) 2017 raw data; (c.2) PoSSTenD; (c.3) NMC-scan-stat; (c.4) YPS-SSD.
