JAMA Netw Open. 2019 Apr 5;2(4):e192884.
doi: 10.1001/jamanetworkopen.2019.2884.

Identification of Factors Associated With Variation in US County-Level Obesity Prevalence Rates Using Epidemiologic vs Machine Learning Models

David Scheinker et al. JAMA Netw Open. 2019.

Abstract

Importance: Obesity is a leading cause of high health care expenditures, disability, and premature mortality. Previous studies have documented geographic disparities in obesity prevalence.

Objective: To identify county-level factors associated with obesity using traditional epidemiologic and machine learning methods.

Design, setting, and participants: Cross-sectional study using linear regression models and machine learning models to evaluate the associations between county-level obesity and county-level demographic, socioeconomic, health care, and environmental factors from summarized statistical data extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data from each of 3138 US counties. The explanatory power of the linear multivariate regression and the top-performing machine learning model were compared using mean R² measured in 30-fold cross validation.
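
The 30-fold cross-validation comparison described above can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic, the variable names (`income`, `obesity`) and helper functions (`fit_line`, `r_squared`, `kfold_r2`) are hypothetical, and a single-predictor least-squares line stands in for the multivariate linear and gradient boosting machine models. It shows only how a mean held-out R² over 30 folds is obtained.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot for a set of observations and predictions."""
    my = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - my) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_r2(xs, ys, k=30, seed=0):
    """Return (mean held-out R^2, list of per-fold R^2) over k shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test sets
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return mean(scores), scores


# Synthetic stand-in data (assumption): 600 "counties", one predictor.
rng = random.Random(42)
income = [rng.uniform(30, 90) for _ in range(600)]
obesity = [40 - 0.15 * x + rng.gauss(0, 3) for x in income]

mean_r2, fold_r2 = kfold_r2(income, obesity, k=30)
print(f"mean held-out R^2 over {len(fold_r2)} folds: {mean_r2:.3f}")
```

Running the same `kfold_r2` loop once per model on identical folds yields two sets of 30 R² values, which is the kind of paired distribution the study compares between the two model types.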

Exposures: County-level demographic factors (population; rural status; census region; and race/ethnicity, sex, and age composition), socioeconomic factors (median income, unemployment rate, and percentage of population with some college education), health care factors (rate of uninsured adults and primary care physicians), and environmental factors (access to healthy foods and access to exercise opportunities).

Main outcomes and measures: County-level obesity prevalence in 2018, its association with each county-level factor, and the percentage of variation in county-level obesity prevalence explained by linear multivariate and gradient boosting machine regression measured with R².

Results: Among the 3138 counties studied, the mean (range) obesity prevalence was 31.5% (12.8%-47.8%). In multivariate regressions, demographic factors explained 44.9% of variation in obesity prevalence; socioeconomic factors, 33.0%; environmental factors, 15.5%; and health care factors, 9.1%. The county-level factors with the strongest association with obesity were census region, median household income, and percentage of population with some college education. R² values of univariate regressions of obesity prevalence were 0.238 for census region, 0.218 for median household income, and 0.160 for percentage of population with some college education. Multivariate linear regression and gradient boosting machine regression (the best-performing machine learning model) of obesity prevalence using all county-level demographic, socioeconomic, health care, and environmental factors had R² values of 0.58 and 0.66, respectively (P < .001).

Conclusions and relevance: Obesity prevalence varies significantly between counties. County-level demographic, socioeconomic, health care, and environmental factors explain the majority of variation in county-level obesity prevalence. Using machine learning models may explain significantly more of the variation in obesity prevalence.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Scheinker reported being an advisor to Carta Healthcare with equity. Dr Rodriguez reported receiving compensation from Novo Nordisk for event adjudication and stock from HealthPals outside the submitted work. No other disclosures were reported.

Figures

Figure 1. Distribution of Obesity Prevalence by County and Census Region
A, Map of US counties by obesity prevalence. B, Density plot of county-level obesity prevalence in each US Census region.

Figure 2. Comparison of Performance of Gradient Boosting Machine Regression and Linear Multivariate Regression Using 30-Fold Cross Validation
Violin plots of the distribution of the R² values of the gradient boosting machine and linear models. The box plots inside the violin plots show the following values of the distribution of R² for each model: the middle lines indicate the medians; the bottom and top of each box show the 25th and 75th percentiles, respectively; the bottom whiskers show the 25th percentile minus 1.5 × the interquartile range; the top whiskers show the 75th percentile plus 1.5 × the interquartile range; and the points beyond the whiskers are outliers, defined as data points that lie below the bottom whisker or above the top whisker.
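
The box plot quantities in the Figure 2 caption can be worked through with a short sketch. The R² values below are invented for illustration, and `box_plot_summary` is a hypothetical helper. The quartile convention used (Python's "inclusive" method) is an assumption, since plotting packages differ on how percentiles are interpolated.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Median, quartiles, 1.5 x IQR whisker limits, and outliers for a sample."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                 # interquartile range
    lower = q1 - 1.5 * iqr        # bottom whisker limit
    upper = q3 + 1.5 * iqr        # top whisker limit
    outliers = [v for v in values if v < lower or v > upper]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower,
        "upper_whisker": upper,
        "outliers": outliers,
    }


# Hypothetical per-fold R^2 values (not taken from the study).
fold_scores = [0.50, 0.55, 0.60, 0.62, 0.64, 0.66, 0.68, 0.70, 0.95]
summary = box_plot_summary(fold_scores)
for name, value in summary.items():
    print(name, value)
```

With these made-up scores, only the largest value falls outside the whisker limits and would be drawn as an individual point.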
