Front Artif Intell. 2024 Dec 6;7:1476950.
doi: 10.3389/frai.2024.1476950. eCollection 2024.

What is in a food store name? Leveraging large language models to enhance food environment data



Analee J Etheredge et al. Front Artif Intell.

Abstract

Introduction: It is not uncommon to repurpose administrative food data to create food environment datasets in health department and research settings; however, the available administrative data are rarely categorized in a way that supports meaningful insight or action, and ground-truthing or manually reviewing an entire city or neighborhood is rate-limiting to essential operations and analysis. We show that such categorization should be treated as a classification problem, one well addressed by recent advances in natural language processing and deep learning, particularly the advent of large language models (LLMs).

Methods: To demonstrate how to automate the process of categorizing food stores, we use the foundation model BERT to give a first approximation to such categorizations: a best guess by store name. First, 10 food retail classes were developed to comprehensively categorize food store types from a public health perspective.
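The abstract does not describe the classification head, but BERT-style sequence classifiers typically emit one score (logit) per class, and a softmax over those scores yields a "best guess by store name" together with a confidence. The minimal Python sketch below illustrates only that final step and assumes the logits are already available; the `best_guess` helper, the four-label subset, and the logit values are hypothetical (the paper's rubric has 10 classes, not all of which are named in the abstract).

```python
import math

# Hypothetical subset of the rubric: the paper uses 10 classes, not all named in the abstract.
LABELS = ["Grocery and Specialty Stores", "Convenience Stores", "Fast Food", "Restaurant"]


def best_guess(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return the most probable label and its softmax probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    top = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits standing in for a fine-tuned model's output on a single store name.
label, p = best_guess([0.2, 2.5, 1.1, -0.4], LABELS)
print(f"{label} (p = {p:.3f})")
```

Returning the probability alongside the label makes it possible to route low-confidence guesses to manual review instead of accepting every prediction.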

Results: Based on this rubric, the model was tuned and evaluated (F1micro = 0.710, F1macro = 0.709) on an extensive storefront directory of New York City. Second, the model was applied to infer insights from a large, unlabeled dataset using store names alone, aiming to replicate known temporospatial patterns. Finally, a complementary application of the model as a data quality enhancement tool was demonstrated on a secondary, pre-labeled restaurant dataset.
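For readers comparing the two reported scores: micro-F1 pools true positives, false positives, and false negatives across all classes before computing F1, whereas macro-F1 computes F1 for each class and averages the results, so every class counts equally regardless of its size. A minimal pure-Python sketch of both, using hypothetical labels and predictions rather than the paper's data (the `f1_scores` helper is illustrative):

```python
from collections import Counter


def f1_scores(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Return (micro-F1, macro-F1) for single-label multiclass predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gained a wrong member
            fn[truth] += 1  # the true class lost a member

    def f1(t: int, f_pos: int, f_neg: int) -> float:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    classes = sorted(set(y_true) | set(y_pred))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


# Hypothetical gold labels and predictions for six store names.
y_true = ["grocery", "grocery", "convenience", "restaurant", "restaurant", "restaurant"]
y_pred = ["grocery", "convenience", "convenience", "restaurant", "fast_food", "restaurant"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

For single-label multiclass data like this, micro-F1 equals plain accuracy (here 4 of 6 correct), while macro-F1 is pulled down by the `fast_food` class, which was predicted once and never correctly.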

Discussion: This novel application of an LLM to the enumeration of the food environment allowed for marked gains in efficiency compared to manual, in-person methods, addressing a known challenge to research and operations in a local health department.

Keywords: administrative food data; deep learning; food environment classification; food store name; health department; large language models; machine learning; natural language processing.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Sensitivity and recall for the hold-out test set. Dashed vertical lines mark the sensitivity cutoffs for food environment classification that were adopted in our sensitivity analyses: <20% very poor, 21–30% poor, 31–50% fair, 51–70% moderate, 71–90% good, and >90% excellent (Paquet et al., 2008; Bishop et al., 2021).
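Sensitivity (recall) for a class is TP / (TP + FN), the share of stores truly in that class that the model assigned to it. The sketch below computes it and maps the percentage onto the qualitative bands listed in the caption. The helper names are hypothetical, and the boundary handling is an assumption: the published ranges leave a gap between 20% and 21%, so this version uses inclusive lower bounds of 21, 31, 51, and 71 and a strict > 90 for "excellent".

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall) in percent: share of true class members the model found."""
    return 100.0 * tp / (tp + fn) if tp + fn else 0.0


def agreement_band(sens_pct: float) -> str:
    """Map a sensitivity percentage to the caption's qualitative bands.

    Assumption: inclusive lower bounds of 21/31/51/71 and a strict > 90 for
    "excellent"; values between listed ranges fall into the lower band.
    """
    if sens_pct > 90:
        return "excellent"
    if sens_pct >= 71:
        return "good"
    if sens_pct >= 51:
        return "moderate"
    if sens_pct >= 31:
        return "fair"
    if sens_pct >= 21:
        return "poor"
    return "very poor"


# Hypothetical (TP, FN) counts for two classes on a hold-out set.
for name, (tp, fn) in {"Class A": (45, 15), "Class B": (9, 21)}.items():
    s = sensitivity(tp, fn)
    print(f"{name}: {s:.1f}% sensitivity ({agreement_band(s)})")
```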
Figure 2
NYC maps of mean annual change in Grocery and Specialty Stores and Convenience Stores, 2019–2021. Hexbin size: 2,640 ft, overlaid with 2010 Neighborhood Tabulation Area (NTA) boundaries. (A) NYC, Grocery and Specialty Stores: mean businesses per hexbin, 3.7 (SD = 0.55). (B) NYC, Convenience Stores: mean businesses per hexbin, 7.4 (SD = 0.82). (C) Chinatown and Sunset Park feature, Grocery and Specialty Stores: mean businesses per hexbin, 3.7 (SD = 0.55). (D) Chinatown and Sunset Park feature, Convenience Stores: mean businesses per hexbin, 7.4 (SD = 0.82).
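The caption does not state how stores were binned or how the annual change was computed. One plausible reading, sketched below under stated assumptions, is to assign each geocoded store (already projected to planar coordinates in feet) to a pointy-top hexagonal cell, count stores per cell per year, and average the year-over-year differences. Treating 2,640 ft as the hexagon's center-to-corner radius, the axial-coordinate scheme, and the helper names `hex_cell` and `mean_annual_change` are all assumptions for illustration.

```python
import math
from collections import Counter

HEX_SIZE_FT = 2640.0  # assumption: 2,640 ft read as the hexagon's center-to-corner radius


def hex_cell(x_ft: float, y_ft: float, size: float = HEX_SIZE_FT) -> tuple[int, int]:
    """Map a projected point (feet) to axial (q, r) coordinates of a pointy-top hex grid."""
    q = (math.sqrt(3) / 3 * x_ft - y_ft / 3) / size
    r = (2 / 3 * y_ft) / size
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so that x + y + z == 0 still holds.
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def mean_annual_change(points_by_year: dict[int, list[tuple[float, float]]]) -> dict[tuple[int, int], float]:
    """Assumed definition: mean of year-over-year count differences per hex cell."""
    years = sorted(points_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years to compute a change")
    counts = {yr: Counter(hex_cell(x, y) for x, y in pts) for yr, pts in points_by_year.items()}
    cells = set().union(*counts.values())
    result = {}
    for cell in cells:
        # Counter returns 0 for cells with no stores in a given year
        diffs = [counts[b][cell] - counts[a][cell] for a, b in zip(years, years[1:])]
        result[cell] = sum(diffs) / len(diffs)
    return result


# Hypothetical store locations (feet) for three years.
demo = {
    2019: [(0, 0), (100, 100)],
    2020: [(0, 0), (100, 100), (-200, 50)],
    2021: [(0, 0), (100, 100), (-200, 50), (50, -50), (HEX_SIZE_FT * math.sqrt(3), 0)],
}
print(mean_annual_change(demo))
```

Because the year-over-year differences telescope, the mean change over 2019 to 2021 equals (count in 2021 minus count in 2019) / 2 for each cell; in the demo the central cell gains one store per year (1.0) and the neighboring cell (1, 0) averages 0.5.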
Figure 3
Heatmap of service description tag frequency in the restaurant dataset compared with the fast food and restaurant classifier labels. Counts are overlaid for clarity.
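A heatmap like this one is a cross-tabulation of two categorical variables: the dataset's existing service description tag and the classifier's predicted label for the same record. The minimal sketch below builds that count matrix with `collections.Counter`; the `crosstab` helper, the tag names, the labels, and the records are hypothetical stand-ins, not values from the restaurant dataset.

```python
from collections import Counter


def crosstab(tags: list[str], labels: list[str]) -> tuple[list[str], list[str], list[list[int]]]:
    """Return row keys (tags), column keys (labels), and a count matrix."""
    if len(tags) != len(labels):
        raise ValueError("tags and labels must align one-to-one")
    counts = Counter(zip(tags, labels))  # (tag, label) pair -> number of records
    rows, cols = sorted(set(tags)), sorted(set(labels))
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


# Hypothetical service tags and classifier outputs for five restaurant records.
tags = ["Counter Service", "Table Service", "Counter Service", "Table Service", "Counter Service"]
preds = ["Fast Food", "Restaurant", "Fast Food", "Restaurant", "Restaurant"]
rows, cols, matrix = crosstab(tags, preds)
print(" " * 16, cols)
for r, row in zip(rows, matrix):
    print(f"{r:<16}", row)
```

Cells where a tag and a predicted label seem to disagree (here, one counter-service record labeled Restaurant) are natural candidates for manual review, which is one way a classifier could serve as a data quality check.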
Figure 4
Frequencies of venue tags in the restaurant dataset: the 11 most frequent of the 41 tags belonging to this variable.


References

    1. Agurs-Collins T., Alvidrez J., ElShourbagy Ferreira S., Evans M., Gibbs K., Kowtha B., et al. (2024). Perspective: nutrition health disparities framework: a model to advance health equity. Adv. Nutr. 15:100194. doi: 10.1016/j.advnut.2024.100194
    2. Bishop T. R. P., von Hinke S., Hollingsworth B., Lake A. A., Brown H., Burgoine T. (2021). Automatic classification of takeaway food outlet cuisine type using machine (deep) learning. Mach. Learn. Appl. 6:100106. doi: 10.1016/j.mlwa.2021.100106
    3. Block J. P., Subramanian S. (2015). Moving beyond “food deserts”: reorienting United States policies to reduce disparities in diet quality. PLoS Med. 12:e1001914. doi: 10.1371/journal.pmed.1001914
    4. Boise S., Crossa A., Etheredge A. J., McCulley E. M., Lovasi G. S. (2023). Concepts, characterizations, and cautions: a public health guide and glossary for planning food environment measurement. Open Public Health J. 16, 1–17. doi: 10.2174/18749445-v16-230821-2023-51
    5. Braid L., Oliva R., Nichols K., Reyes A., Guzman J., Goldman R. E., et al. (2022). Community perceptions in New York City: sugar-sweetened beverage policies and programs in the first 1000 days. Matern. Child Health J. 26, 193–204. doi: 10.1007/s10995-021-03255-8
