N Biotechnol. 2023 Nov 25:77:120-129.
doi: 10.1016/j.nbt.2023.08.004. Epub 2023 Aug 29.

Machine translation of standardised medical terminology using natural language processing: A scoping review

Free article

Richard Noll et al. N Biotechnol. 2023.

Abstract

Standardised medical terminologies are used to ensure accurate and consistent communication of information and to facilitate data exchange. Currently, many terminologies are only available in English, which hinders international research and automated processing of medical data. Natural language processing (NLP) and Machine Translation (MT) methods can be used to automatically translate these terms. This scoping review examines the research on automated translation of standardised medical terminology. A search was performed in PubMed and Web of Science, and results were screened for eligibility first by title and abstract and then by full text. In addition to bibliographic data, the following data items were considered: 'terminology considered', 'terms considered', 'source language', 'target language', 'translation type', 'NLP technique', 'NLP system', 'machine translation system', 'data source' and 'translation quality'. The results showed that the most frequently translated terminology is SNOMED CT (39.1%), followed by MeSH (13%), ICD (13%) and UMLS (8.7%). The most common source language is English (55.9%), and the most common target language is German (41.2%). Translation methods are often based on Statistical Machine Translation (SMT) (41.7%) and, more recently, Neural Machine Translation (NMT) (30.6%); some approaches also combine several MT methods. Commercial translators such as Google Translate (36.4%) are frequently used for translation, and automatic metrics such as BLEU (22.2%) are frequently used for subsequent validation.

Keywords: Controlled vocabulary; Machine translation; NLP.
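
The abstract names BLEU as a common automatic validation method. As a rough illustration only, the sketch below scores one machine-translated term against one reference translation using clipped n-gram precision and a brevity penalty. It is not taken from the review: the function names, the whitespace tokenisation, the absence of smoothing, and the choice of max_n=2 (terminology entries are short; BLEU is more commonly reported with n-grams up to 4 and at corpus level) are all assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU sketch against a single reference.

    Assumptions (not from the review): lowercase whitespace tokenisation,
    no smoothing, and max_n=2 because terminology entries are short.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalise candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical term pairs, for illustration only.
    print(simple_bleu("akuter Myokardinfarkt", "akuter Myokardinfarkt"))
    print(simple_bleu("akuter Herzinfarkt", "akuter Myokardinfarkt"))
```

The second pair scores zero even though "Herzinfarkt" is an acceptable synonym, because no bigram overlaps with the reference. This shows why surface-overlap metrics can understate quality on short terms.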

Conflict of interest statement

The authors declare that there are no conflicts of interest.
