J Med Internet Res. 2021 Aug 9;23(8):e28229.
doi: 10.2196/28229.

A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation

Riste Stojanov et al. J Med Internet Res. 2021.

Abstract

Background: Food science has recently been attracting considerable attention. Many research questions remain open on how food, as one of the main environmental factors, interacts with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which draws attention to the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, and these methods often depend on external resources. In 2019, however, an annotated corpus of food entities, normalized against several food semantic resources, was published.

Objective: In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction.

Methods: We introduce FoodNER, which is a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags.
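The count of 15 models follows from pairing each of the 3 pretrained BERT models with each of the 5 annotation tasks. The following is a minimal Python sketch of that grid, not the authors' code. The model names are hypothetical placeholders, because the abstract does not name the 3 pretrained models, and the task labels are shorthand for the 5 groups of semantic resources listed above.

```python
from itertools import product

# Hypothetical placeholders: the abstract does not name the 3 pretrained models.
PRETRAINED_MODELS = ["bert_variant_a", "bert_variant_b", "bert_variant_c"]

# Shorthand labels (assumed) for the 5 groups of semantic resources.
TASKS = [
    "food_vs_nonfood",
    "hansard_closest",
    "hansard_parent",
    "foodon",
    "snomed_ct",
]


def build_configurations(models, tasks):
    """Return one (model, task) pair per fine-tuned model."""
    return list(product(models, tasks))


configurations = build_configurations(PRETRAINED_MODELS, TASKS)
for model_name, task_name in configurations:
    print(f"{model_name} fine-tuned on {task_name}")
```

Each pair corresponds to one fine-tuning run, so the grid contains 3 × 5 = 15 entries.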

Results: All BERT models achieved macro F1 scores of 93.30% to 94.31% in the task of distinguishing food versus nonfood entities, which represents a new state of the art in food information extraction. In the tasks where semantic tags are predicted, all BERT models again obtained very promising results, with macro F1 scores ranging from 73.39% to 78.96%.
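The macro F1 score reported here is the unweighted mean of the per-class F1 scores, so rare tags count as much as frequent ones. The following is a minimal Python sketch of that metric, assuming token-level labels; the abstract does not state whether the authors score tokens or whole entities, and the labels and predictions below are invented purely for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for gold_label, pred_label in zip(gold, pred):
        if gold_label == pred_label:
            true_pos[gold_label] += 1
        else:
            false_pos[pred_label] += 1  # predicted this class wrongly
            false_neg[gold_label] += 1  # missed the true class
    labels = sorted(set(gold) | set(pred))
    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); the denominator is nonzero because
        # every label in the union appears at least once in gold or pred.
        denominator = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        per_class.append(2 * true_pos[label] / denominator)
    return sum(per_class) / len(per_class)


# Illustrative data only (not from the paper): FOOD versus O (nonfood) tokens.
gold_labels = ["FOOD", "FOOD", "O", "O", "O", "FOOD"]
pred_labels = ["FOOD", "O", "O", "O", "FOOD", "FOOD"]
score = macro_f1(gold_labels, pred_labels)
print(f"macro F1 = {score:.4f}")
```

In this toy example both classes have 2 true positives, 1 false positive, and 1 false negative, so each per-class F1 is 4/6 and the macro average is also 4/6.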

Conclusions: FoodNER can be used to extract and annotate food entities in 5 different tasks: food versus nonfood entities and distinguishing food entities on the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.

Keywords: BERT; bidirectional encoder representations from transformers; fine-tuning BERT; food information extraction; information extraction; machine learning; named-entity recognition; natural language processing; semantic annotation.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Recipe example.
Figure 2. Food named-entity recognition flowchart. BERT: bidirectional encoder representations from transformers; NER: named-entity recognition; SNOMED CT: Systematized Nomenclature of Medicine Clinical Terms.
Figure 3. An example of food entities available from one recipe that are present in the training data set. The entities are annotated using Hansard parent, Hansard closest, FoodOn, Systematized Nomenclature of Medicine Clinical Terms, and OntoFood (not studied in this paper) semantic tags.
Figure 4. Training and validation loss per fine-tuning epoch for the bio bidirectional encoder representations from transformers large model on the Hansard parent data set.
Figure 5. Macro F1 scores for all considered models for the food versus nonfood entity task. Each macro F1 score is obtained by using stratified k-fold cross-validation (k=5). Underlined values are best per subtable, while the bold value is the best from the whole table. BERT: bidirectional encoder representations from transformers; BiLSTM-CRF: bidirectional long short-term memory conditional random field; BuTTER: bidirectional long short-term memory for food named-entity recognition; NER: named-entity recognition.
Figure 6. Boxplots of macro F1 scores obtained by using stratified five-fold cross-validation for all considered models for the binary food classification task. BERT: bidirectional encoder representations from transformers; BiLSTM-CRF: bidirectional long short-term memory conditional random field.
Figure 7. Food named-entity recognition integration in FoodViz.
