Ann Epidemiol. 2011 Apr;21(4):290-6.
doi: 10.1016/j.annepidem.2010.11.016.

A simple method to generate equal-sized homogenous strata or clusters for population-based sampling

Michael R Elliott. Ann Epidemiol. 2011 Apr.

Abstract

Purpose: Statistical efficiency and cost efficiency can be achieved in population-based samples through stratification and/or clustering. Strata typically combine subgroups of the population that are similar with respect to an outcome. Clusters are often taken from preexisting units, but may be formed to minimize between-cluster variance, or to equalize exposure to a treatment or risk factor. Area probability sample design procedures for the National Children's Study required contiguous strata and clusters that maximized within-stratum and within-cluster homogeneity while maintaining approximately equal size of the strata or clusters. However, there were few methods that allowed such strata or clusters to be constructed under these contiguity and equal size constraints.

Methods: A search algorithm generates equal-size cluster sets that approximately span the space of all possible clusters of equal size. An optimal cluster set is chosen based on analysis of variance and convexity criteria.
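
The abstract does not give the details of the search, so the following is only a minimal Python sketch of the general idea and should not be read as the paper's algorithm. It assumes each unit (for example, a census tract) has a planar centroid and a single numeric score, that the number of units is divisible by the number of clusters, and that candidates are grown greedily around random seed units. The function names, the greedy growth rule, the `weight` parameter, and the weighted-sum selection rule are all hypothetical. Compactness (mean squared within-cluster distance) stands in for the convexity criterion, and contiguity is not enforced.

```python
import random
from itertools import combinations


def equal_size_partition(coords, k, rng):
    """Greedily grow k clusters of equal size around k random seed units."""
    n = len(coords)
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    capacity = n // k
    seeds = rng.sample(range(n), k)
    # All (squared distance, unit, cluster) triples, nearest first.
    pairs = sorted(
        ((coords[i][0] - coords[s][0]) ** 2 + (coords[i][1] - coords[s][1]) ** 2, i, c)
        for i in range(n)
        for c, s in enumerate(seeds)
    )
    labels = [None] * n
    sizes = [0] * k
    for _, i, c in pairs:
        # Assign a unit to the nearest seed whose cluster still has room.
        if labels[i] is None and sizes[c] < capacity:
            labels[i] = c
            sizes[c] += 1
    return labels


def residual_variance(values, labels, k):
    """Within-cluster mean square from a one-way ANOVA of values on clusters."""
    ss = 0.0
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        mean = sum(members) / len(members)
        ss += sum((v - mean) ** 2 for v in members)
    return ss / (len(values) - k)


def mean_sq_distance(coords, labels, k):
    """Mean squared distance between pairs of units in the same cluster."""
    total, count = 0.0, 0
    for c in range(k):
        members = [p for p, lab in zip(coords, labels) if lab == c]
        for (x1, y1), (x2, y2) in combinations(members, 2):
            total += (x1 - x2) ** 2 + (y1 - y2) ** 2
            count += 1
    return total / count


def choose_partition(coords, values, k, n_candidates=200, weight=1.0, seed=0):
    """Return (score, labels) for the best of n_candidates random partitions.

    The weighted sum of the two criteria is an assumption of this sketch.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        labels = equal_size_partition(coords, k, rng)
        score = residual_variance(values, labels, k) + weight * mean_sq_distance(
            coords, labels, k
        )
        if best is None or score < best[0]:
            best = (score, labels)
    return best


# Toy usage: a 4 x 4 grid whose score depends on the quadrant.
coords = [(x, y) for x in range(4) for y in range(4)]
values = [2.0 * (x >= 2) + 1.0 * (y >= 2) for x, y in coords]
score, labels = choose_partition(coords, values, k=4, n_candidates=50)
```

Plotting the two criteria against each other for all candidates, rather than collapsing them into one score, would be closer in spirit to the comparison described in the Figure 1 caption.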

Results: The proposed algorithm is used to construct 10 strata based on demographics and air pollution measures in Kent County, MI, following census tract boundaries. A brief simulation study is also conducted.

Conclusions: The proposed algorithm is effective at uncovering underlying clusters from noisy data. It can be used in multi-stage sampling where equal-size strata or clusters are desired.

Figures

Figure 1
Plot of total residual variance from ANOVA of three factors on proposed clusters against mean squared distance in miles between census tracts within proposed clusters.
Figure 2
Log-percent of Kent County, MI population (a) African-American, (b) less than high school education (25 and older), (c) in poverty; (d) log-level of respiratory pollutants: by Census tract. Low levels are blue; high levels are pink.
Figure 3
Proposed 10 clusters for Kent County based on 3-level factor score, together with street map (from http://michigan.hometownlocator.com/mi/kent/).
Figure 4
Simulation study using means given in Table 3 with normally distributed errors with mean 0 and variance (a) 1, (b) 10, and (c) 100. First row gives density map of observed data and associated cluster results for the 10th percentile of residual variance; second row gives equivalent results for the 90th percentile of residual variance. Results from 50 simulations.
