Tailoring the Nutritional Composition of Italian Foods to the US Nutrition5k Dataset for Food Image Recognition: Challenges and a Comparative Analysis
- PMID: 39408306
- PMCID: PMC11479105
- DOI: 10.3390/nu16193339
Abstract
Background: Training machine learning algorithms on dish images collected in other countries requires tackling possible sources of systematic discrepancy, including country-specific food composition databases (FCDBs). The US Nutrition5k project provides ~5000 dish images and related dish- and ingredient-level information on mass, energy, and macronutrients from the US FCDB. This study aims to (1) identify challenges and solutions in linking the nutritional composition of Italian foods with food images from Nutrition5k and (2) assess potential differences in nutrient content estimated across the Italian and US FCDBs, along with their determinants.
Methods: After food matching, expert data curation, and handling of missing values, dish-level ingredients from Nutrition5k were integrated with the Italian-FCDB-specific nutritional composition (86 components); dish-specific nutrient content was calculated by summing the corresponding ingredient-specific nutritional values. Measures of agreement/difference were calculated between Italian- and US-FCDB-specific content of energy and macronutrients. Potential determinants of identified differences were investigated with multiple robust regression models.
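The dish-level summation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the composition table stores values per 100 g of food, and the food names, nutrient keys, and numbers in `EXAMPLE_FCDB` are hypothetical placeholders rather than entries from either FCDB.

```python
from typing import Dict, List, Tuple

# Hypothetical composition table: nutrient values per 100 g of food.
# Names and numbers are illustrative placeholders, not real FCDB entries.
EXAMPLE_FCDB: Dict[str, Dict[str, float]] = {
    "food_a": {"energy_kcal": 200.0, "protein_g": 10.0},
    "food_b": {"energy_kcal": 100.0, "protein_g": 2.0},
}


def dish_content(
    ingredients: List[Tuple[str, float]],
    fcdb: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Sum ingredient-level nutrient values into dish-level totals.

    `ingredients` is a list of (matched food name, mass in grams) pairs.
    Assumes every ingredient has already been matched to an FCDB entry
    and that missing values were handled beforehand.
    """
    totals: Dict[str, float] = {}
    for food, mass_g in ingredients:
        for nutrient, per_100g in fcdb[food].items():
            # Scale the per-100 g value to the ingredient's actual mass.
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * mass_g / 100.0
    return totals


example_dish = [("food_a", 50.0), ("food_b", 100.0)]
example_totals = dish_content(example_dish, EXAMPLE_FCDB)
```

Running the same function once per composition table (for example, one Italian and one US table keyed by the same matched foods) would yield the paired dish-level values that the agreement analysis compares.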
Results: Dishes had a median mass of 145 g and a median of three ingredients. Energy, proteins, fats, and carbohydrates showed moderate-to-strong agreement between Italian- and US-FCDB-specific content; carbohydrates showed the worst performance, with the Italian FCDB providing smaller median values (median raw difference between the Italian and US FCDBs: -2.10 g). Regression models on dishes suggested a role for mass, number of ingredients, and presence of recreated recipes, alone or jointly with differential use of raw/cooked ingredients across the two FCDBs.
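A median raw difference of this kind could be computed as sketched below. The sketch assumes the statistic is the median of paired per-dish differences, taken as Italian minus US (which matches the negative sign reported when the Italian values are smaller); the abstract does not spell out the exact definition, and the carbohydrate values used here are made up for illustration.

```python
from statistics import median
from typing import Sequence


def median_raw_difference(italian: Sequence[float], us: Sequence[float]) -> float:
    """Median of per-dish differences (Italian minus US).

    Both sequences must list the same dishes in the same order.
    """
    if len(italian) != len(us):
        raise ValueError("both sequences must cover the same dishes")
    return median(it - u for it, u in zip(italian, us))


# Made-up dish-level carbohydrate contents in grams (not study data).
carbs_italian = [10.0, 20.5, 31.0, 8.0, 15.0]
carbs_us = [12.0, 22.0, 30.0, 11.0, 17.5]

carb_difference = median_raw_difference(carbs_italian, carbs_us)
```

A negative result indicates that the Italian-based estimate is lower than the US-based one for the typical dish.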
Conclusions: In the era of machine learning approaches for food image recognition, manual data curation in the alignment of FCDBs is worth the effort.
Keywords: database harmonization; dish images; food composition database; food matching; manual data curation; missing imputation; nutrition; nutritional composition of foods; “Nutrition5k” dataset.
Conflict of interest statement
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Similar articles
- 2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project. Nutrients. 2025 Jun 30;17(13):2196. doi: 10.3390/nu17132196. PMID: 40647299. Free PMC article.
- Intake of energy and nutrients; harmonization of Food Composition Databases. Nutr Hosp. 2015 Feb 26;31 Suppl 3:168-76. doi: 10.3305/nh.2015.31.sup3.8764. PMID: 25719784. Review.
- Food Composition Databases (FCDBs): A Bibliometric Analysis. Nutrients. 2023 Aug 11;15(16):3548. doi: 10.3390/nu15163548. PMID: 37630742. Free PMC article.
- Development of an Unified Food Composition Database for the European Project "Stance4Health". Nutrients. 2021 Nov 24;13(12):4206. doi: 10.3390/nu13124206. PMID: 34959759. Free PMC article.
- The Importance of Food Composition Data for Estimating Micronutrient Intake: What Do We Know Now and into the Future? Nestle Nutr Inst Workshop Ser. 2020;93:39-50. doi: 10.1159/000503355. Epub 2020 Jan 28. PMID: 31991432. Review.
Cited by
- 2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project. Nutrients. 2025 Jun 30;17(13):2196. doi: 10.3390/nu17132196. PMID: 40647299. Free PMC article.