BMC Med Res Methodol. 2005 Jul 18;5:22.
doi: 10.1186/1471-2288-5-22.

Oblique decision trees for spatial pattern detection: optimal algorithm and application to malaria risk

Jean Gaudart et al. BMC Med Res Methodol.

Abstract

Background: To detect potential disease clusters when a putative source cannot be specified, classical procedures scan the geographical area with circular windows over a specified grid imposed on the map. However, the choice of the windows' shapes, sizes and centers is critical, and different choices may not give exactly the same results. The aim of our work was to use an Oblique Decision Tree (ODT) model, which provides potential clusters without pre-specifying shapes, sizes or centers. For this purpose, we developed an ODT algorithm that finds an oblique partition of the space defined by the geographic coordinates.

Methods: ODT is based on the classification and regression tree (CART). Whereas CART finds rectangular partitions of the covariate space, ODT provides oblique partitions that maximize the interclass variance of the dependent variable. Since finding the optimal oblique partition is an NP-hard problem in R^N, classical ODT algorithms use evolutionary procedures or heuristics. We developed an optimal ODT algorithm in R^2, based on the directions defined by each pair of point locations. The resulting partition provides potential clusters, which can be tested with Monte Carlo inference. We applied the ODT model to a dataset in order to identify potential high-risk clusters of malaria in a village in Western Africa during the dry season. The ODT results were compared with those of Kulldorff's SaTScan.

Results: The ODT procedure provided four classes of risk of infection. In the first high-risk class, 60% of the children were infected (95% confidence interval (CI95%) [52.22-67.55]). Monte Carlo inference showed that the spatial pattern produced by the ODT model was significant (p < 0.0001). SaTScan yielded one significant high-risk cluster, with an infection rate of 54.21% (CI95% [47.51-60.75]). Its center was located within the first high-risk ODT class. Both procedures gave similar results, identifying a high-risk cluster in the western part of the village, where a mosquito breeding site was located.

Conclusion: ODT models improve on classical scanning procedures by detecting potential disease clusters without any specification of the shapes, sizes or centers of the clusters.
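
The Methods paragraph describes an exact search in the plane driven by the directions that each pair of point locations defines. The following Python sketch illustrates one way such a search for a single split could work; it is not the authors' implementation. It assumes distinct locations, one unweighted response value per location, and a brute-force evaluation of every cut; the function names (`critical_angles`, `interclass_variance`, `best_oblique_split`) are hypothetical. The idea is that the ordering of the projected points only changes at a critical angle, so testing one direction inside each sector between consecutive critical angles covers every possible oblique split.

```python
import math
from itertools import combinations


def critical_angles(points):
    """Sorted angles in [0, pi) of the directions perpendicular to each pair of points."""
    angles = set()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if (x1, y1) != (x2, y2):
            theta = (math.atan2(y2 - y1, x2 - x1) + math.pi / 2) % math.pi
            angles.add(round(theta, 12))  # merge numerically identical directions
    return sorted(angles)


def interclass_variance(left, right):
    """Between-group variance of the response for a two-group split."""
    n = len(left) + len(right)
    overall = (sum(left) + sum(right)) / n
    mean_l = sum(left) / len(left)
    mean_r = sum(right) / len(right)
    return (len(left) * (mean_l - overall) ** 2
            + len(right) * (mean_r - overall) ** 2) / n


def best_oblique_split(points, values):
    """Return (variance, theta, threshold) of the best single oblique split, or None."""
    angles = critical_angles(points)
    best = None
    for i, start in enumerate(angles):
        # The last sector wraps around, since directions are defined modulo pi.
        end = angles[i + 1] if i + 1 < len(angles) else angles[0] + math.pi
        theta = 0.5 * (start + end)  # any direction inside the sector gives the same ordering
        ux, uy = math.cos(theta), math.sin(theta)
        proj = [x * ux + y * uy for x, y in points]
        order = sorted(range(len(points)), key=proj.__getitem__)
        for cut in range(1, len(order)):
            left = [values[k] for k in order[:cut]]
            right = [values[k] for k in order[cut:]]
            variance = interclass_variance(left, right)
            if best is None or variance > best[0]:
                threshold = 0.5 * (proj[order[cut - 1]] + proj[order[cut]])
                best = (variance, theta, threshold)
    return best


# Toy data (invented for illustration): two parallel diagonal rows of locations.
points = [(0, 1), (1, 2), (2, 3), (1, 0), (2, 1), (3, 2)]
values = [1, 1, 1, 0, 0, 0]
variance, theta, threshold = best_oblique_split(points, values)
```

In this toy layout no horizontal or vertical cut separates the two rows, while an oblique cut does. The sketch recomputes group means for every cut, which is simple but slow; running sums over the sorted projections would avoid that cost.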

Figures

Figure 1
Construction of the critical angle θij of the direction u. - The geographical space is represented by the plane with an orthogonal basis {x, y} and a fixed origin O; - u is a direction perpendicular to the splitting direction; - Mi and Mj are two point locations in the geographical space.
Figure 2
Passage through the critical direction u, from sector 1 to sector 2. - u is a direction perpendicular to the splitting direction; - Mi and Mj are two point locations in the geographical space; - the order of the projected coordinates changes between the u' and u" directions; - u' and u" are directions with intermediate angles, belonging respectively to sector 1 and sector 2; - u'i, u'j, u"i, and u"j are the projected coordinates of points Mi and Mj: u'i > u'j and u"i < u"j.
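
The construction in Figures 1 and 2 can be checked numerically. The short Python sketch below is an illustration under assumed conventions, not the authors' code: it measures the angle of u from the x axis, takes it modulo π, and uses two invented point locations. The helper names `critical_angle` and `projection` are hypothetical. It shows that the two projected coordinates coincide at the critical angle and that their order is reversed on either side of it.

```python
import math


def critical_angle(mi, mj):
    """Angle in [0, pi) of the direction u perpendicular to the line through mi and mj."""
    return (math.atan2(mj[1] - mi[1], mj[0] - mi[0]) + math.pi / 2) % math.pi


def projection(point, theta):
    """Coordinate of a point projected on the direction with angle theta."""
    return point[0] * math.cos(theta) + point[1] * math.sin(theta)


# Two invented point locations.
mi, mj = (0.0, 0.0), (2.0, 1.0)
theta_ij = critical_angle(mi, mj)
eps = 1e-3  # small rotation on each side of the critical direction

gap_at = projection(mi, theta_ij) - projection(mj, theta_ij)
gap_before = projection(mi, theta_ij - eps) - projection(mj, theta_ij - eps)
gap_after = projection(mi, theta_ij + eps) - projection(mj, theta_ij + eps)
```

Here `gap_at` is zero up to rounding, while `gap_before` and `gap_after` have opposite signs, which is the change of order described in the Figure 2 caption. Which side corresponds to sector 1 depends on the orientation convention, which this sketch does not take from the paper.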
Figure 3
Oblique Decision Tree for spatial partitioning. The geographical area is split into 6 partitions. Nloc: number of locations belonging to each partition; n: total number of children in each partition; R: infection rate; θ: critical angle for each split; Vic: interclass variance for each split.
Figure 5
The village of Bancoumana. - The circle S1 refers to the significant cluster provided by Kulldorff's SaTScan. - The straight lines are the 3 splits resulting from the ODT model, providing 4 partitions P1, P2, P3 and P4. - The bold grey line represents the Niger river. - Each location is represented by its own risk value. The scale of risks is discretized into 6 equal-sized intervals.
Figure 4
Empirical distribution of the explained variability rate Rv. The distribution was provided by a Monte Carlo procedure (999 simulated sets and one observed set).
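
The Figure 4 caption describes a reference distribution built from 999 simulated sets plus the observed one. A common way to turn such a distribution into a p-value is the rank-based Monte Carlo estimate sketched below in Python. Whether the authors computed their p-value exactly this way is not stated here, so this is an assumption; the function name `monte_carlo_p_value` is hypothetical, and the statistic values in the usage lines are invented. The formula for Rv is not given in this record, so the function takes precomputed statistics.

```python
def monte_carlo_p_value(observed, simulated):
    """Rank-based Monte Carlo p-value.

    Counts the simulated statistics at least as large as the observed one,
    then includes the observed set itself in both numerator and denominator.
    """
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


# Invented statistics: 999 simulated values and one observed value.
simulated_stats = [0.10] * 999
p_extreme = monte_carlo_p_value(0.50, simulated_stats)   # observed exceeds every simulation
p_ordinary = monte_carlo_p_value(0.05, simulated_stats)  # observed below every simulation
```

Counting the observed set alongside the simulated ones keeps the estimate strictly positive, so it never reports a p-value of zero.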
