Comparative Study

Nutrients. 2024 Oct 1;16(19):3339.

doi: 10.3390/nu16193339.

Tailoring the Nutritional Composition of Italian Foods to the US Nutrition5k Dataset for Food Image Recognition: Challenges and a Comparative Analysis

Rachele Bianco¹, Michela Marinoni², Sergio Coluccia², Giulia Carioni¹,³, Federica Fiori¹, Patrizia Gnagnarella³, Valeria Edefonti²,⁴, Maria Parpinel¹

Affiliations

¹ Department of Medicine-DMED, Università degli Studi di Udine, 33100 Udine, Italy.
² Branch of Medical Statistics, Biometry and Epidemiology "G. A. Maccacaro", Department of Clinical Sciences and Community Health, Dipartimento di Eccellenza 2023-2027, Università degli Studi di Milano, 20133 Milan, Italy.
³ Division of Epidemiology and Biostatistics, European Institute of Oncology, IRCCS, 20141 Milan, Italy.
⁴ Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy.

PMID: 39408306
PMCID: PMC11479105
DOI: 10.3390/nu16193339

Rachele Bianco et al. Nutrients. 2024.

Abstract

Background: Training machine learning algorithms on dish images collected in other countries requires addressing possible sources of systematic discrepancies, including country-specific food composition databases (FCDBs). The US Nutrition5k project provides ~5000 dish images with related dish- and ingredient-level information on mass, energy, and macronutrients from the US FCDB. This study aims to (1) identify challenges and solutions in linking the nutritional composition of Italian foods with food images from Nutrition5k and (2) assess potential differences in nutrient content estimated across the Italian and US FCDBs, together with the determinants of those differences.

Methods: After food matching, expert data curation, and handling of missing values, dish-level ingredients from Nutrition5k were integrated with the Italian-FCDB-specific nutritional composition (86 components); dish-specific nutrient content was calculated by summing the corresponding ingredient-specific nutritional values. Measures of agreement/difference were calculated between Italian- and US-FCDB-specific content of energy and macronutrients. Potential determinants of identified differences were investigated with multiple robust regression models.
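The dish-level calculation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that composition values are expressed per 100 g of ingredient (a common FCDB convention, not stated in the abstract), and the ingredient names, nutrient keys, and numbers are hypothetical placeholders rather than values from either FCDB.

```python
# Hypothetical per-100 g composition table (placeholder values, not FCDB data).
COMPOSITION_PER_100G = {
    "ingredient_a": {"energy_kcal": 100.0, "protein_g": 10.0,
                     "fat_g": 2.0, "carbohydrate_g": 5.0},
    "ingredient_b": {"energy_kcal": 50.0, "protein_g": 1.0,
                     "fat_g": 0.0, "carbohydrate_g": 12.0},
}


def dish_nutrients(ingredients, composition):
    """Sum ingredient-specific nutrient content over one dish.

    ingredients: list of (ingredient_name, mass_in_grams) pairs.
    composition: mapping ingredient_name -> {nutrient: value per 100 g}.
    """
    totals = {}
    for name, mass_g in ingredients:
        for nutrient, per_100g in composition[name].items():
            # Scale the per-100 g value to the ingredient mass, then accumulate.
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * mass_g / 100.0
    return totals


# A hypothetical two-ingredient dish: 150 g of ingredient_a, 50 g of ingredient_b.
dish = [("ingredient_a", 150.0), ("ingredient_b", 50.0)]
totals = dish_nutrients(dish, COMPOSITION_PER_100G)
print(totals)
```

Running the same function once with an Italian-FCDB table and once with a US-FCDB table would give the paired dish-level values that the agreement measures compare.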

Results: Dishes had a median mass of 145 g and a median of three ingredients. Energy, proteins, fats, and carbohydrates showed moderate-to-strong agreement between Italian- and US-FCDB-specific content; carbohydrates showed the weakest agreement, with the Italian FCDB providing smaller median values (median raw difference between the Italian and US FCDBs: -2.10 g). Regression models on dishes suggested a role for mass, number of ingredients, and presence of recreated recipes, alone or jointly with differential use of raw/cooked ingredients across the two FCDBs.

Conclusions: In the era of machine learning approaches for food image recognition, manual data curation in the alignment of FCDBs is worth the effort.

Keywords: database harmonization; dish images; food composition database; food matching; manual data curation; missing imputation; nutrition; nutritional composition of foods; “Nutrition5k” dataset.

Conflict of interest statement

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

**Figure 1**
Comprehensive work plan related to research data acquisition and processing.

**Figure 2**
Indirect matching: imputation strategies and related frequencies.

**Figure 3**
Top 30 ingredients by frequency of use in Nutrition5k after data curation.

**Figure 4**
Top 30 ingredients by total mass (kg) across dishes in Nutrition5k after data curation.

**Figure 5**
Bland–Altman plots representing the raw absolute difference between Italian- and US-specific content (x-axis) versus the mean of the Italian- and US-specific content for each nutrient (y-axis), with corresponding 95% limits of agreement (green line for the mean difference and corresponding red lines for the limits of agreement). The dotted red line indicates the reference value of 0.
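The quantities drawn in a Bland–Altman plot like the one in Figure 5 can be computed as in the sketch below. It uses the conventional definition of the 95% limits of agreement (mean difference ± 1.96 sample standard deviations of the differences); whether the authors used exactly this formula is an assumption, and the input numbers are hypothetical, not study data.

```python
from statistics import mean, stdev


def bland_altman(italian, us):
    """Return per-dish means and differences plus bias and 95% limits of agreement."""
    diffs = [i - u for i, u in zip(italian, us)]        # raw difference (Italian - US)
    means = [(i + u) / 2 for i, u in zip(italian, us)]  # mean of the two values
    bias = mean(diffs)                                  # mean difference
    sd = stdev(diffs)                                   # sample SD of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Hypothetical dish-level carbohydrate content (g) under each FCDB.
italian_values = [10.0, 20.0, 30.0, 40.0]
us_values = [12.0, 19.0, 33.0, 38.0]
result = bland_altman(italian_values, us_values)
print(result["bias"], result["lower"], result["upper"])
```

Plotting `diffs` against `means`, with horizontal lines at `bias`, `lower`, and `upper`, gives the usual Bland–Altman layout.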

Cited by

2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project.
Bianco R, Coluccia S, Marinoni M, Falcon A, Fiori F, Serra G, Ferraroni M, Edefonti V, Parpinel M. Nutrients. 2025 Jun 30;17(13):2196. doi: 10.3390/nu17132196. PMID: 40647299.

Grants and funding

PRIN 20227YCB5P/Ministero dell'Istruzione e del Merito
