Front Nutr. 2024 Dec 17;11:1469878. doi: 10.3389/fnut.2024.1469878. eCollection 2024.

Visual nutrition analysis: leveraging segmentation and regression for food nutrient estimation


Yaping Zhao et al. Front Nutr. 2024.

Abstract

Introduction: Nutrition is closely linked to health. A well-structured diet not only meets the body's needs for essential nutrients but also helps prevent many chronic diseases. However, because systematic nutritional knowledge is not widespread, people often find it difficult to assess the nutritional content of food accurately. Image-based nutritional evaluation technology can provide significant assistance here, so we aim to predict the nutritional content of dishes directly from images. Most related research estimates the volume or area of food through image segmentation and then calculates its nutritional content from the food category. However, this approach often lacks ground-truth nutritional labels as a reference, making the accuracy of the predictions difficult to ensure.

Methods: To address this issue, we combined segmentation and regression tasks, manually adding segmentation annotations to the Nutrition5k dataset, which provides detailed nutritional content labels but no segmentation labels. On these annotated data, we developed a nutritional content prediction model that performs segmentation first and regression afterward. Specifically, a UNet model first segments the food, a backbone network then extracts features, a Squeeze-and-Excitation (SE) structure enhances the feature representation, and several fully connected layers finally produce predictions for weight, calories, fat, carbohydrates, and protein.
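The distinctive step in the pipeline above is the Squeeze-and-Excitation recalibration between feature extraction and the regression heads. A minimal NumPy sketch of that step, with hypothetical toy weights (`w1`, `b1`, `w2`, `b2` are illustrative, not the paper's trained parameters), might look like:

```python
import numpy as np

def squeeze_excite(feat, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration on a (C, H, W) feature map.

    Squeeze: global average pooling reduces each channel to one scalar.
    Excitation: a two-layer bottleneck (ReLU, then sigmoid) maps those
    scalars to per-channel gates in (0, 1), which rescale the channels.
    """
    s = feat.mean(axis=(1, 2))                 # squeeze: (C,)
    z = np.maximum(w1 @ s + b1, 0.0)           # bottleneck + ReLU
    e = 1.0 / (1.0 + np.exp(-(w2 @ z + b2)))   # sigmoid gates: (C,)
    return feat * e[:, None, None]             # channel-wise rescaling

# Toy example: 4 channels, reduction ratio 2
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
w1, b1 = rng.standard_normal((2, 4)), np.zeros(2)
w2, b2 = rng.standard_normal((4, 2)), np.zeros(4)
out = squeeze_excite(feat, w1, b1, w2, b2)
assert out.shape == feat.shape
```

Because each gate lies strictly between 0 and 1, the SE block can only attenuate channels, letting the network emphasize nutrition-relevant features relative to the rest.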

Results and discussion: Our model achieved an outstanding average percentage mean absolute error (PMAE) of 17.06% for these components. All manually annotated segmentation labels can be found at https://doi.org/10.6084/m9.figshare.26252048.v1.
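The abstract does not spell out how PMAE is computed; a common definition, assumed here, normalizes the mean absolute error for each component by the mean ground-truth value and expresses it as a percentage:

```python
def pmae(y_true, y_pred):
    """Percentage MAE: mean absolute error divided by the mean
    ground-truth value, expressed as a percentage (one common
    definition; the paper may differ in detail)."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return 100.0 * mae / (sum(y_true) / n)

# Illustrative calorie values (not from the dataset)
true_cal = [250.0, 400.0, 150.0]
pred_cal = [230.0, 420.0, 180.0]
print(round(pmae(true_cal, pred_cal), 2))  # → 8.75
```

Under this definition, the reported 17.06% average PMAE means the model's typical error across the five components is about one sixth of the average true value.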

Keywords: Nutrition5k; deep learning; image segmentation; nutrition estimation; regression.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Original food images and their segmentation labels. Columns 1 and 4 show the original images, columns 2 and 5 show images with segmentation labels overlaid on the original images, and columns 3 and 6 show the segmented food images (foreground images).
FIGURE 2
Some excluded data. The top left image is non-food, the top right image is misaligned, the bottom left image has overly uneven lighting, and the bottom right image shows overlapping dishes.
FIGURE 3
Data screening process.
FIGURE 4
Histogram of the proportion of food regions in the 3,224 annotated images. The x-axis represents the ratio of the number of pixels occupied by the food to the total number of pixels in the image (%), and the y-axis represents the number of images corresponding to each ratio.
FIGURE 5
Label example. On the left is the image of dish_1559678127, and on the right are its corresponding ingredient labels.
FIGURE 6
Overall architecture of the model. The upper left shows the network for food segmentation, the lower left shows the regression network for predicting food nutritional content, and the right side details the SE module’s structure.
FIGURE 7
Comparison of visual results of food segmentation tasks using UNet, FCN, and DeepLabV3 models. The first column is the original image, the second column is the ground truth, the third column is the segmentation result of UNet, the fourth column is the segmentation result of FCN, and the fifth column is the segmentation result of DeepLabV3.
FIGURE 8
Comparison of the predicted values and actual values for five components between our method, ViT, and DenseNet.
FIGURE 9
(a) The food image corresponding to dish_1563984296; (b) The specific results of nutritional component predictions for this food image using the proposed method and comparison methods, including the mean absolute error (MAE) between the predicted and ground truth values for five nutritional components. Bolded values in the table indicate the predictions closest to the ground truth.
