BMC Med Inform Decis Mak. 2010 Jan 12;10:1.
doi: 10.1186/1472-6947-10-1.

A framework for enhancing spatial and temporal granularity in report-based health surveillance systems

Hutchatai Chanlekha et al. BMC Med Inform Decis Mak.

Abstract

Background: Current public concern over the spread of infectious diseases has underscored the importance of health surveillance systems for the speedy detection of disease outbreaks. Several international report-based monitoring systems have been developed, including GPHIN, Argus, HealthMap, and BioCaster. A vital feature of these report-based systems is the geo-temporal encoding of outbreak-related textual data. Until now, automated systems have tended to use an ad-hoc strategy for processing geo-temporal information, normally involving the detection of locations that match pre-determined criteria, and the use of document publication dates as a proxy for disease event dates. Although these strategies appear to be effective enough for reporting events at the country and province levels, they may be less effective at discovering geo-temporal information at more detailed levels of granularity. In order to improve the capabilities of current Web-based health surveillance systems, we introduce the design for a novel scheme called spatiotemporal zoning.

Method: The proposed scheme classifies news articles into zones according to the spatiotemporal characteristics of their content. To study the reliability of the annotation scheme, we analyzed inter-annotator agreement among a group of human annotators on over 1000 reported events. The results were evaluated both qualitatively and quantitatively, using kappa and percentage agreement.
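
As an illustration of the two agreement measures named above, the following is a minimal sketch that computes percentage agreement and Cohen's kappa for a single pair of annotators. The abstract does not state which kappa variant was used or how the annotators were paired, so the choice of Cohen's kappa, the function names, and the toy labels are all assumptions made for the example, not the authors' procedure.

```python
from collections import Counter


def percentage_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percentage_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical event-type labels from two annotators (not data from the study).
annotator_1 = ["outbreak", "outbreak", "report", "report",
               "other", "other", "outbreak", "report"]
annotator_2 = ["outbreak", "outbreak", "report", "report",
               "other", "other", "outbreak", "other"]

agreement = percentage_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"percentage agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than raw percentage agreement here because it discounts the matches that two annotators would produce by chance given their label frequencies.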

Results: The reliability evaluation of our scheme yielded very promising inter-annotator agreement: more than 0.9 kappa and 0.9 percentage agreement for event type annotation and temporal attribute annotation, respectively, with a slight degradation for the spatial attribute. However, for events indicating an outbreak situation, the annotators usually agreed on the lowest-granularity location.

Conclusions: We developed and evaluated a novel spatiotemporal zoning annotation scheme. The results of the scheme evaluation indicate that our annotated corpus and the proposed annotation scheme are reliable and could be effectively used for developing an automatic system. Given the current advances in natural language processing techniques, including the availability of language resources and tools, we believe that a reliable automatic spatiotemporal zoning system can be achieved. In the next stage of this work, we plan to develop an automatic zoning system and evaluate its usability within an operational health surveillance system.

Figures

Figure 1
Various locations with different roles in outbreak news reports. The example was captured from a news article published on CBCnews [51]. The location names that occur in news reports are not always the location of the outbreak. In the text captures illustrated in the figure, Japan, Caribbean countries, and Africa are referred to as locations where HTLV-1 usually occurs, while South Africa and the U.S. are the countries that provide medical assistance to the affected country.
Figure 2
Text capture of spatiotemporal zoning in a news report. The example was captured from news published on the WHO website. The text is marked up with spatiotemporal zones according to the annotation guideline. The first zone is a report zone consisting of one event-predicate, "reported". The event denoted by this event-predicate occurred in Yei County, Central Equatorial, in Sudan from 1 September to 8 November 2006. This spatial and temporal information is represented in the zone's Location_ID, STime, and ETime attributes, respectively. The second zone also consists of one event-predicate, "crossed". This event-predicate is annotated as having occurred in Yei County in the last week of October 2006, according to information available in the news report.
Figure 3
Zone generation process. This figure illustrates the algorithm for zone boundary generation. The boundary marked with square brackets in the text capture is an example of the output from the zone boundary generation process.
Figure 4
Distribution of the number of sentences, including partial sentences. This chart represents the distribution of the number of sentences in our corpus. In corpus set 1, most of the news articles contain 6 to 20 sentences, while in corpus set 2, the largest proportion of articles contain 6 to 10 sentences.
Figure 5
Distribution of the number of event-predicates to be annotated. This chart shows the distribution of the number of event-predicates in each document in our corpus. Overall, the majority of the documents contain 6 to 25 event-predicates.
Figure 6
Distribution of news articles in the corpus by date of publication. This chart shows the distribution of news articles in our corpus in terms of the publication date. The corpus consists of news articles whose publication dates range from 1996 to 2007. However, the majority of the news articles were published from the middle of 2005 to the end of 2006.
Figure 7
Distribution of outbreak events reported in our corpus, classified by outbreak-affected country. This figure shows the outbreak-affected countries reported in the news articles in our corpus. The map illustration was created using the Google Maps API [52] to visualize the distribution of locations. The chart in the top-left corner of the figure shows the number of documents that report the situation in each country. Note that, although most of the articles in our corpus reported an outbreak within one country, some documents reported outbreak situations in many countries.
Figure 8
Example of co-referring event-predicates. This example was captured from a news article published on Nation Channel [53]. The captured text shown in the figure exemplifies a situation where multiple event-predicates refer to the same real-world event. In the text example, the phrases "Medical Service director-general Dr. Chatri Banchuen said", "Chatri added", "hospital director Dr. Jessa Chokedumrongsuk said", "hospital director Dr. Vinit Pua-pradit said", and "the doctor claimed" are parts of the event previously mentioned in the clause "doctors at several hospitals said yesterday".
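
The zone attributes described in the Figure 2 caption (Location_ID, STime, ETime, and the event-predicates a zone contains) can be pictured as a small record type. The sketch below is only one plausible in-memory representation, not the authors' annotation format: the class name, field names, location identifier string, the label given to the second zone, and the exact dates chosen for "the last week of October 2006" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Zone:
    """Hypothetical record for one spatiotemporal zone in a news report."""
    zone_type: str        # e.g. "report"; the full set of types is not given here
    location_id: str      # stands in for the Location_ID attribute
    stime: date           # stands in for STime (start of the event period)
    etime: date           # stands in for ETime (end of the event period)
    event_predicates: List[str] = field(default_factory=list)

    def covers_period_of(self, other: "Zone") -> bool:
        """True if the other zone's time span lies inside this zone's span."""
        return self.stime <= other.stime and other.etime <= self.etime


# First zone from the Figure 2 caption: a report zone with predicate "reported".
report_zone = Zone(
    zone_type="report",
    location_id="Sudan/Central Equatorial/Yei County",  # placeholder identifier
    stime=date(2006, 9, 1),
    etime=date(2006, 11, 8),
    event_predicates=["reported"],
)

# Second zone: predicate "crossed". Its zone type is not stated in the caption,
# and 25-31 October is an assumed reading of "the last week of October 2006".
second_zone = Zone(
    zone_type="unspecified",
    location_id="Sudan/Central Equatorial/Yei County",
    stime=date(2006, 10, 25),
    etime=date(2006, 10, 31),
    event_predicates=["crossed"],
)

print(report_zone.covers_period_of(second_zone))
```

Storing explicit start and end dates per zone, instead of relying on the publication date, is what allows a check like `covers_period_of` to relate events within one article.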

References

    1. World Health Organization. International Health Regulations (2005). 2nd ed. World Health Organization; 2008.
    2. Lewis MD, Pavlin JA, Mansfield JL, O'Brien S, Boomsma LG, Elbert Y, Kelley PW. Disease outbreak detection system using syndromic data in the greater Washington DC area. American Journal of Preventive Medicine. 2002;23(3):108–186. doi: 10.1016/S0749-3797(02)00490-7.
    3. Tsui F-C, Espino JU, Dato VM, Gesteland PH, Hutman J, Wagner MM. Technical Description of RODS: A Real-time Public Health Surveillance System. Journal of American Medical Informatics Association. 2003;10(5):399–408. doi: 10.1197/jamia.M1345.
    4. Heymann DL, Rodier GR. Hot spots in a wired world: WHO surveillance of emerging and re-emerging infectious diseases. The Lancet Infectious Diseases. 2001;1(5):345–353. doi: 10.1016/S1473-3099(01)00148-7.
    5. Brownstein JS, Freifeld CC. HealthMap: the development of automated real-time internet surveillance for epidemic intelligence. Eurosurveillance. 2007;12(48).