Ann Appl Stat. 2023 Jun;17(2):1722-1739.
doi: 10.1214/22-aoas1692. Epub 2023 May 1.

INTEGRATING MULTIPLE BUILT ENVIRONMENT DATA SOURCES

Jung Yeon Won et al. Ann Appl Stat. 2023 Jun.

Abstract

Studies examining the contribution of the built environment to health often rely on commercial data sources to derive exposure measures such as the number of specific food outlets in study participants' neighborhoods. Data on the location of community amenities (e.g., food outlets) can be collected from multiple sources. However, these commercial listings are known to have ascertainment errors and thus provide conflicting claims about the number and location of amenities. We propose a method that integrates exposure measures from different databases while accounting for ascertainment errors and obtains unbiased estimates of the health effects of the latent exposure. We frame the problem of conflicting exposure measures as a problem of two contingency tables with partially known margins, with the entries of the tables modeled using a multinomial distribution. Available estimates of source quality are embedded in a joint model for observed exposure counts, latent exposures, and health outcomes. Simulations show that our modeling framework yields substantially improved inferences regarding the health effects. We used the proposed method to estimate the association between children's body mass index (BMI) and the concentration of food outlets near their schools when both the NETS and Reference USA databases are available.
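
To make the motivation concrete, the following is a minimal simulation sketch of why regressing a health outcome on an error-prone count from a single listing (or on the average of two listings) gives a distorted slope. It is not the authors' joint model. The Poisson latent count, the linear outcome, the constant-rate false positives, the true slope of 0.5, and the helper names `simulate_source` and `ols_slope` are all assumptions made for illustration; the sensitivity and PPV values are the "realistic" setting (a) quoted in the Fig 2 caption.

```python
import numpy as np


def simulate_source(x, sen, ppv, rng):
    """Observed count from one listing: true positives plus false positives.

    Assumption: each true outlet is listed independently with probability `sen`,
    and false positives arrive at a constant Poisson rate chosen so that the
    aggregate share of listed outlets that are real is roughly `ppv`.
    """
    tp = rng.binomial(x, sen)
    fp_rate = x.mean() * sen * (1.0 - ppv) / ppv
    fp = rng.poisson(fp_rate, size=x.shape[0])
    return tp + fp


def ols_slope(w, y):
    """Slope from a simple linear regression of y on w."""
    w_c = w - w.mean()
    return float(np.dot(w_c, y - y.mean()) / np.dot(w_c, w_c))


rng = np.random.default_rng(0)
n, beta_x = 5000, 0.5                                # hypothetical sample size and true slope

x = rng.poisson(4.0, size=n)                         # latent true outlet count
y = 1.0 + beta_x * x + rng.normal(0.0, 1.0, size=n)  # outcome driven by the latent count

w1 = simulate_source(x, sen=0.37, ppv=0.48, rng=rng)  # source 1
w2 = simulate_source(x, sen=0.50, ppv=0.62, rng=rng)  # source 2

slopes = {
    "latent x (infeasible)": ols_slope(x, y),
    "OLS, source 1": ols_slope(w1, y),
    "OLS, source 2": ols_slope(w2, y),
    "OLS, average": ols_slope((w1 + w2) / 2.0, y),
}

for name, slope in slopes.items():
    pct_bias = 100.0 * (slope - beta_x) / beta_x
    print(f"{name:22s} slope={slope:.3f}  percent bias={pct_bias:+.1f}%")
```

Under these assumptions the single-source slopes are attenuated well below the true value, and averaging the two sources reduces but does not remove the bias. The direction and size of the bias depend on the assumed error mechanism, so the numbers here should not be read as reproducing the paper's simulations.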

Keywords: Built-environment; Commercial business lists; Count exposure; Data integration; Dirichlet process mixture model; Measurement error.

Figures

Fig 1:
Distributions of exposure to convenience stores and grocery stores from different sources. Exposures are computed by counting the number of stores within a 1-mile buffer around California public schools. Means and standard deviations are presented in the legend.
Fig 2:
Percent bias of slope (βx) estimates from naïve regression models (“OLS, source 1”, “OLS, source 2”, “OLS, average”) under different distributions of X, ratios of sensitivity to PPV, and magnitudes of the agreement statistics. The realistic values of sensitivity and PPV (labeled R in the legend) are (a) sen1 = 0.37, sen2 = 0.5, ppv1 = 0.48, ppv2 = 0.62 and (b) sen1 = 0.48, sen2 = 0.62, ppv1 = 0.37, ppv2 = 0.5.
Fig 3:
Comparison of naïve OLS estimators for βx and the posterior mean of βx from the proposed model under different misspecifications. For the naïve approaches, “OLS, InfoUSA” and “OLS, NETS” denote OLS estimators for simple linear regressions that use w_InfoUSA and w_NETS, respectively. “OLS, average” denotes the OLS estimator for a regression using the average of w1 and w2. For the proposed model, the working credibility was set to the reference values (denoted “Proposed”) or to 1.25 × the reference values (denoted “Proposed×1.25”). Plotted are the naïve estimates with 95% confidence intervals (black unfilled dots) and the posterior means with 95% credible intervals from the proposed model with the reference values (red filled dot) and with 1.25 × the reference values (black filled dot).
