Front Nutr. 2024 Dec 17;11:1469878. doi: 10.3389/fnut.2024.1469878. eCollection 2024.

Visual nutrition analysis: leveraging segmentation and regression for food nutrient estimation


Yaping Zhao et al. Front Nutr. 2024.

Abstract

Introduction: Nutrition is closely linked to health. A well-structured diet not only meets the body's needs for essential nutrients but also helps prevent many chronic diseases. However, because systematic nutritional knowledge is not widespread, people often find it difficult to assess the nutritional content of food accurately. Image-based nutritional evaluation technology can provide significant assistance here, so we aim to predict the nutritional content of dishes directly from images. Most related research estimates the volume or area of food through image segmentation and then calculates its nutritional content from the food category. However, this approach often lacks ground-truth nutritional labels as a reference, making the accuracy of the predictions difficult to ensure.

Methods: To address this issue, we combined segmentation and regression tasks, manually adding segmentation annotations to the Nutrition5k dataset, which provides detailed nutritional content labels but no segmentation labels. On these annotated data, we developed a nutritional content prediction model that performs segmentation first and regression afterward. Specifically, a UNet model first segments the food, a backbone network then extracts features, a Squeeze-and-Excitation (SE) structure enhances the feature representation, and several fully connected layers finally produce predictions for weight, calories, fat, carbohydrates, and protein.
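The distinctive step in the pipeline above is the Squeeze-and-Excitation recalibration between feature extraction and the regression heads. A minimal NumPy sketch of that step, with hypothetical toy weights (`w1`, `b1`, `w2`, `b2` are illustrative, not the paper's trained parameters), might look like:

```python
import numpy as np

def squeeze_excite(feat, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration on a (C, H, W) feature map.

    Squeeze: global average pooling reduces each channel to one scalar.
    Excitation: a two-layer bottleneck (ReLU, then sigmoid) maps those
    scalars to per-channel gates in (0, 1), which rescale the channels.
    """
    s = feat.mean(axis=(1, 2))                 # squeeze: (C,)
    z = np.maximum(w1 @ s + b1, 0.0)           # bottleneck + ReLU
    e = 1.0 / (1.0 + np.exp(-(w2 @ z + b2)))   # sigmoid gates: (C,)
    return feat * e[:, None, None]             # channel-wise rescaling

# Toy example: 4 channels, reduction ratio 2
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
w1, b1 = rng.standard_normal((2, 4)), np.zeros(2)
w2, b2 = rng.standard_normal((4, 2)), np.zeros(4)
out = squeeze_excite(feat, w1, b1, w2, b2)
assert out.shape == feat.shape
```

Because each gate lies strictly between 0 and 1, the SE block can only attenuate channels, letting the network emphasize nutrition-relevant features relative to the rest.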

Results and discussion: Our model achieved an outstanding average percentage mean absolute error (PMAE) of 17.06% for these components. All manually annotated segmentation labels can be found at https://doi.org/10.6084/m9.figshare.26252048.v1.
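The abstract does not spell out how PMAE is computed; a common definition, assumed here, normalizes the mean absolute error for each component by the mean ground-truth value and expresses it as a percentage:

```python
def pmae(y_true, y_pred):
    """Percentage MAE: mean absolute error divided by the mean
    ground-truth value, expressed as a percentage (one common
    definition; the paper may differ in detail)."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return 100.0 * mae / (sum(y_true) / n)

# Illustrative calorie values (not from the dataset)
true_cal = [250.0, 400.0, 150.0]
pred_cal = [230.0, 420.0, 180.0]
print(round(pmae(true_cal, pred_cal), 2))  # → 8.75
```

Under this definition, the reported 17.06% average PMAE means the model's typical error across the five components is about one sixth of the average true value.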

Keywords: Nutrition5k; deep learning; image segmentation; nutrition estimation; regression.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Original food images and their segmentation labels. Columns 1 and 4 show the original images, columns 2 and 5 show images with segmentation labels overlaid on the original images, and columns 3 and 6 show the segmented food images (foreground images).
FIGURE 2
Some excluded data. The top left image is non-food, the top right image is misaligned, the bottom left image has overly uneven lighting, and the bottom right image shows overlapping dishes.
FIGURE 3
Data screening process.
FIGURE 4
Histogram of the proportion of food regions in the 3,224 annotated images. The x-axis represents the ratio of the number of pixels occupied by the food to the total number of pixels in the image (%), and the y-axis represents the number of images corresponding to each ratio.
FIGURE 5
Label example. On the left is the image of dish_1559678127, and on the right are its corresponding ingredient labels.
FIGURE 6
Overall architecture of the model. The upper left shows the network for food segmentation, the lower left shows the regression network for predicting food nutritional content, and the right side details the SE module’s structure.
FIGURE 7
Comparison of visual results of food segmentation tasks using UNet, FCN, and DeepLabV3 models. The first column is the original image, the second column is the ground truth, the third column is the segmentation result of UNet, the fourth column is the segmentation result of FCN, and the fifth column is the segmentation result of DeepLabV3.
FIGURE 8
Comparison of the predicted values and actual values for five components between our method, ViT, and DenseNet.
FIGURE 9
(a) The food image corresponding to dish_1563984296; (b) The specific results of nutritional component predictions for this food image using the proposed method and comparison methods, including the mean absolute error (MAE) between the predicted and ground truth values for five nutritional components. Bolded values in the table indicate the predictions closest to the ground truth.
