Foods. 2023 Nov 28;12(23):4293. doi: 10.3390/foods12234293.

DPF-Nutrition: Food Nutrition Estimation via Depth Prediction and Fusion

Yuzhe Han et al.

Abstract

A reasonable and balanced diet is essential for maintaining good health. With advancements in deep learning, automated nutrition estimation from food images offers a promising solution for monitoring daily nutritional intake and promoting dietary health. While monocular image-based nutrition estimation is convenient, efficient, and economical, its limited accuracy remains a significant concern. To tackle this issue, we propose DPF-Nutrition, an end-to-end nutrition estimation method based on monocular images. In DPF-Nutrition, we introduce a depth prediction module that generates depth maps, thereby improving the accuracy of food portion estimation. Additionally, we design an RGB-D fusion module that combines the monocular image with the predicted depth information, yielding better nutrition estimation performance. To the best of our knowledge, this is the first work to integrate depth prediction and RGB-D fusion techniques in food nutrition estimation. Comprehensive experiments on the Nutrition5k dataset evaluate the effectiveness and efficiency of DPF-Nutrition.
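The abstract describes the pipeline only at a high level. The sketch below illustrates, in minimal PyTorch, the general "predict depth, then fuse" idea: a monocular RGB image is mapped to a predicted depth map, RGB and depth features are fused, and pooled features are regressed to nutrition values. The toy convolutional modules, their names, and the five regression targets (calories, mass, fat, carbohydrate, protein, following Nutrition5k) are illustrative assumptions, not the authors' implementation, which uses a depth prediction transformer and a cross-modal attention block as described in the figures below.

```python
# Minimal sketch of a "depth prediction + RGB-D fusion" nutrition estimator.
# Toy encoders stand in for the paper's DPT depth predictor and CAB fusion;
# the five outputs (calories, mass, fat, carbs, protein) follow Nutrition5k.
import torch
import torch.nn as nn


class ToyDepthPredictor(nn.Module):
    """Stand-in for the depth prediction module: RGB (B,3,H,W) -> depth (B,1,H,W)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb):
        return self.net(rgb)


class ToyFusionBackbone(nn.Module):
    """Stand-in for the RGB-D fusion module: fuses RGB and predicted-depth features."""
    def __init__(self, dim=64):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.depth_enc = nn.Sequential(nn.Conv2d(1, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_enc(rgb), self.depth_enc(depth)], dim=1)
        return self.fuse(f)


class DPFNutritionSketch(nn.Module):
    """RGB image -> predicted depth -> fused features -> per-nutrient regression."""
    def __init__(self, dim=64, num_targets=5):
        super().__init__()
        self.depth_predictor = ToyDepthPredictor()
        self.backbone = ToyFusionBackbone(dim)
        self.pool = nn.AdaptiveAvgPool2d(1)      # global average pooling, as in Figure 3
        self.head = nn.Linear(dim, num_targets)  # calories, mass, fat, carbs, protein

    def forward(self, rgb):
        depth = self.depth_predictor(rgb)        # monocular depth prediction
        feat = self.backbone(rgb, depth)         # RGB-D fusion
        return self.head(self.pool(feat).flatten(1))


if __name__ == "__main__":
    model = DPFNutritionSketch()
    out = model(torch.randn(2, 3, 224, 224))
    print(out.shape)  # torch.Size([2, 5])
```

The design intuition is that the predicted depth map supplies geometric cues about food volume that a single RGB image lacks, which is why portion (and hence nutrition) estimates benefit from the fused representation.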

Keywords: RGB-D fusion; deep learning; depth prediction; nutrition estimation.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Example images from the Nutrition5k dataset. (a) RGB images. (b) Depth maps. (c) Nutritional annotations.
Figure 2
Incorrect image samples. (a) The food is not fully included in the image. (b) Dishes are overlapping. (c) Non-food image.
Figure 3
The overall framework of DPF-Nutrition, which consists of a depth prediction module and an RGB-D fusion module. We adopt a depth prediction transformer (DPT) to generate the predicted depth map and design a cross-modal attention block (CAB) to extract and integrate the complementary features of the RGB and depth images. ⨁ indicates element-wise addition; Ⓖ denotes global average pooling.
Figure 4
(a) The structure of the depth prediction module. The input image is transformed into feature vectors by a ResNet-50 feature extractor and subsequently embedded into two-dimensional tokens. The tokens are then fed into the transformer encoder. Tokens from different transformer stages are reassembled into image-like feature maps at various resolutions. Finally, the image-like feature maps are fused progressively to generate the depth prediction. (b) The structure of the transformer encoder. ⨁ indicates element-wise addition.
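As a rough illustration of the "reassemble and fuse" steps described in this caption, the sketch below reshapes a sequence of transformer tokens back into an image-like feature map and then merges a coarser map into a finer one by element-wise addition. The patch-grid size, channel widths, and the single refinement step are illustrative assumptions, not the paper's exact DPT configuration.

```python
# Sketch of the "reassemble" operation: transformer tokens -> image-like feature map,
# followed by one progressive fusion step. Dimensions here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Reassemble(nn.Module):
    """(B, N, C) tokens from a square patch grid -> (B, out_ch, H, W) feature map."""
    def __init__(self, token_dim=768, out_ch=256, grid=16, scale=2.0):
        super().__init__()
        self.grid = grid
        self.scale = scale
        self.project = nn.Conv2d(token_dim, out_ch, kernel_size=1)

    def forward(self, tokens):
        b, n, c = tokens.shape
        assert n == self.grid * self.grid, "expects a square patch grid"
        # Arrange tokens spatially, project channels, then resample the resolution.
        x = tokens.transpose(1, 2).reshape(b, c, self.grid, self.grid)
        x = self.project(x)
        return F.interpolate(x, scale_factor=self.scale, mode="bilinear", align_corners=False)


class FuseBlock(nn.Module):
    """Merges a coarser map into a finer one (one step of the progressive decoder)."""
    def __init__(self, ch=256):
        super().__init__()
        self.refine = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, finer, coarser):
        coarser = F.interpolate(coarser, size=finer.shape[-2:], mode="bilinear", align_corners=False)
        return self.refine(finer + coarser)   # element-wise addition, then refinement


if __name__ == "__main__":
    tokens_deep = torch.randn(1, 256, 768)      # tokens from a deeper transformer stage
    tokens_shallow = torch.randn(1, 256, 768)   # tokens from an earlier stage
    coarse = Reassemble(scale=1.0)(tokens_deep)
    fine = Reassemble(scale=2.0)(tokens_shallow)
    fused = FuseBlock()(fine, coarse)
    print(fused.shape)  # torch.Size([1, 256, 32, 32])
```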
Figure 5
The structures of the RGB-D fusion paradigms. (a) Fusion–enhancement. (b) Enhancement–fusion. (c) Our proposed paradigm. ⨁ denotes element-wise addition; ⨂ indicates pixel-wise multiplication; Ⓒ represents cross-channel concatenation.
Figure 6
The structure of the CAB. GAP indicates global average pooling; ⨁ denotes element-wise addition; ⨂ indicates pixel-wise multiplication; Ⓒ represents cross-channel concatenation; Mean represents the mean along the channel dimension.
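The caption lists the operations inside the CAB (global average pooling, mean along the channel dimension, element-wise addition, pixel-wise multiplication, and cross-channel concatenation) but not their exact wiring. The sketch below assembles one plausible cross-modal attention block from those operations: channel and spatial attention derived from one modality re-weight the other, and the enhanced features are concatenated. This is an assumption-laden reading of the figure, not the authors' exact block.

```python
# One plausible cross-modal attention block built from the operations in Figure 6:
# channel attention (GAP -> MLP -> sigmoid) and spatial attention (channel-wise mean
# -> conv -> sigmoid) computed from one modality and applied to the other.
# The exact wiring inside the paper's CAB may differ.
import torch
import torch.nn as nn


class CrossModalAttentionSketch(nn.Module):
    def __init__(self, ch=64, reduction=4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                       # GAP in the caption
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(nn.Conv2d(1, 1, 7, padding=3), nn.Sigmoid())

    def enhance(self, target, source):
        """Re-weight `target` with channel and spatial attention derived from `source`."""
        ca = self.channel_mlp(self.gap(source))                   # (B, C, 1, 1) channel weights
        sa = self.spatial_conv(source.mean(dim=1, keepdim=True))  # (B, 1, H, W) spatial weights
        return target + target * ca * sa                          # pixel-wise multiplication + addition

    def forward(self, rgb_feat, depth_feat):
        rgb_enhanced = self.enhance(rgb_feat, depth_feat)
        depth_enhanced = self.enhance(depth_feat, rgb_feat)
        return torch.cat([rgb_enhanced, depth_enhanced], dim=1)   # cross-channel concatenation


if __name__ == "__main__":
    rgb_feat = torch.randn(2, 64, 28, 28)
    depth_feat = torch.randn(2, 64, 28, 28)
    fused = CrossModalAttentionSketch(ch=64)(rgb_feat, depth_feat)
    print(fused.shape)  # torch.Size([2, 128, 28, 28])
```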
Figure 7
Sample results of depth estimation. (a) RGB images. (b) Estimated depth maps. (c) Actual depth maps.
Figure 8
The visualization results. (a) The ROI heat-maps of different nutrients. (b) The nutrition facts.
