Comparative Study
Nutrients. 2024 Oct 1;16(19):3339.
doi: 10.3390/nu16193339.

Tailoring the Nutritional Composition of Italian Foods to the US Nutrition5k Dataset for Food Image Recognition: Challenges and a Comparative Analysis

Rachele Bianco et al. Nutrients.

Abstract

Background: Training machine learning algorithms on dish images collected in other countries requires addressing possible sources of systematic discrepancy, including country-specific food composition databases (FCDBs). The US Nutrition5k project provides ~5000 dish images with related dish- and ingredient-level information on mass, energy, and macronutrients from the US FCDB. This study aims to (1) identify challenges and solutions in linking the nutritional composition of Italian foods with food images from Nutrition5k and (2) assess potential differences in nutrient content estimated from the Italian and US FCDBs, along with their determinants.

Methods: After food matching, expert data curation, and handling of missing values, dish-level ingredients from Nutrition5k were integrated with the Italian-FCDB-specific nutritional composition (86 components); dish-specific nutrient content was calculated by summing the corresponding ingredient-specific nutritional values. Measures of agreement/difference were calculated between Italian- and US-FCDB-specific content of energy and macronutrients. Potential determinants of identified differences were investigated with multiple robust regression models.
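The dish-level calculation described in the Methods can be sketched as follows. This is a minimal illustration, assuming composition values are stored per 100 g of ingredient; the ingredient names, nutrient keys, numeric values, and the function `dish_nutrients` are hypothetical and are not taken from the Italian or US FCDB or from the authors' code.

```python
from collections import defaultdict

# Hypothetical per-100 g composition values (illustrative only,
# not taken from the Italian or US FCDB).
COMPOSITION_PER_100G = {
    "rice": {"energy_kcal": 130.0, "protein_g": 2.7},
    "chicken": {"energy_kcal": 165.0, "protein_g": 31.0},
}


def dish_nutrients(ingredients, composition):
    """Sum ingredient-specific nutrient amounts into dish-level totals.

    ingredients: list of (ingredient_name, mass_in_grams) pairs
    composition: mapping ingredient_name -> {nutrient: amount per 100 g}
    """
    totals = defaultdict(float)
    for name, mass_g in ingredients:
        for nutrient, per_100g in composition[name].items():
            # Scale the per-100 g value to the actual ingredient mass.
            totals[nutrient] += per_100g * mass_g / 100.0
    return dict(totals)


# A hypothetical two-ingredient dish: 100 g of rice and 50 g of chicken.
dish = [("rice", 100.0), ("chicken", 50.0)]
totals = dish_nutrients(dish, COMPOSITION_PER_100G)
```

Running the same function once with each FCDB's composition table would give the paired Italian- and US-specific dish values that the agreement measures compare.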

Results: Dishes had a median mass of 145 g and a median of three ingredients. Energy, proteins, fats, and carbohydrates showed moderate-to-strong agreement between Italian- and US-FCDB-specific content; carbohydrates showed the weakest agreement, with the Italian FCDB providing smaller median values (median raw difference between the Italian and US FCDBs: -2.10 g). Regression models on dishes suggested a role for mass, number of ingredients, and presence of recreated recipes, alone or jointly with differential use of raw/cooked ingredients across the two FCDBs.
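For readers who want to see how a median raw difference and Bland–Altman 95% limits of agreement (as in the Figure 5 caption) are typically obtained, here is a minimal sketch. It assumes the conventional limits of mean difference ± 1.96 standard deviations of the differences; the function name `agreement_summary` and the sample values are hypothetical and do not reproduce the study's data or code.

```python
import statistics


def agreement_summary(italian, us):
    """Summarize paired differences (Italian minus US) for one nutrient."""
    diffs = [i - u for i, u in zip(italian, us)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return {
        "median_diff": statistics.median(diffs),
        "mean_diff": mean_diff,
        # Conventional Bland-Altman 95% limits of agreement.
        "loa_lower": mean_diff - 1.96 * sd_diff,
        "loa_upper": mean_diff + 1.96 * sd_diff,
    }


# Hypothetical dish-level carbohydrate content (g) from each FCDB.
italian_values = [10.0, 20.0, 30.0, 40.0]
us_values = [12.0, 21.0, 33.0, 42.0]
summary = agreement_summary(italian_values, us_values)
```

A negative median or mean difference here means the Italian-specific values tend to be lower than the US-specific ones, which is the direction reported for carbohydrates.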

Conclusions: In the era of machine learning approaches for food image recognition, manual data curation in the alignment of FCDBs is worth the effort.

Keywords: database harmonization; dish images; food composition database; food matching; manual data curation; missing imputation; nutrition; nutritional composition of foods; “Nutrition5k” dataset.

Conflict of interest statement

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1. Comprehensive work plan related to research data acquisition and processing.
Figure 2. Indirect matching: imputation strategies and related frequencies.
Figure 3. Top 30 ingredients by frequency of use in Nutrition5k after data curation.
Figure 4. Top 30 ingredients by total mass (kg) across dishes in Nutrition5k after data curation.
Figure 5. Bland–Altman plots representing the raw absolute difference between Italian- and US-specific content (x-axis) versus the mean of the Italian- and US-specific content for each nutrient (y-axis), with corresponding 95% limits of agreement (green line for the mean difference and corresponding red lines for the limits of agreement). The dotted red line indicates the reference value of 0.
