Methods Inf Med. 2024 May;63(1-02):52-61.
doi: 10.1055/s-0044-1786839. Epub 2024 May 13.

Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations

Sarah Riepenhausen et al. Methods Inf Med. 2024 May.

Abstract

Background: Structural metadata from the majority of clinical studies and routine health care systems is not yet available to the scientific community.

Objective: To provide an overview of available contents in the Portal of Medical Data Models (MDM Portal).

Methods: The MDM Portal is a registered European information infrastructure for research and health care, and its contents are curated and semantically annotated by medical experts. It enables users to search, view, discuss, and download existing medical data models.

Results: The most frequent keyword is "clinical trial" (n = 18,777), and the most frequent disease-specific keyword is "breast neoplasms" (n = 1,943). Most data items are available in English (n = 545,749) and German (n = 109,267). Manually curated semantic annotations are available for 805,308 elements (554,352 items, 58,101 item groups, and 192,855 code list items), which were derived from 25,257 data models. In total, 1,609,225 Unified Medical Language System (UMLS) codes have been assigned, with 66,373 unique UMLS codes.

Conclusion: To our knowledge, the MDM Portal constitutes Europe's largest collection of medical data models with semantically annotated elements. As such, it can be used to increase compatibility of medical datasets and can be utilized as a large expert-annotated medical text corpus for natural language processing.
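
The counts reported in the Results paragraph can be cross-checked with a few lines of arithmetic. The sketch below only restates the published numbers; the variable names are illustrative, and the two ratios are derived here rather than reported in the abstract.

```python
# Counts copied from the Results paragraph of the abstract.
items = 554_352
item_groups = 58_101
code_list_items = 192_855
annotated_elements = 805_308
assigned_umls_codes = 1_609_225
unique_umls_codes = 66_373

# The three element types should add up to the reported total.
element_sum = items + item_groups + code_list_items
totals_agree = element_sum == annotated_elements

# Derived ratios (assumption: simple averages, not stated in the abstract).
codes_per_element = assigned_umls_codes / annotated_elements
uses_per_unique_code = assigned_umls_codes / unique_umls_codes

print(totals_agree)                    # True
print(round(codes_per_element, 2))     # 2.0
print(round(uses_per_unique_code, 1))  # 24.2
```

On these figures, each annotated element carries about two UMLS codes on average, and each unique code is reused about 24 times.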

Conflict of interest statement

None declared.

Figures

Fig. 1
Flow diagram of the annotation process of items from a medical form. Item labels are analyzed regarding medical concepts. Each concept is annotated semi-automatically: the system suggests a set of already used label-coding-combinations based on the search terms and sorted by fit and frequency. The user chooses the best fitting option or manually enters a different code. Pre-coordinated concept codes are given preference. If there are no suitable pre-coordinated options, two or more codes can be combined in post-coordination. CUI, concept unique identifier.
Fig. 2
UpSet plot of the top 10 keywords assigned to data models, indicating the most frequent keyword combinations. For example, there are 1,452 models regarding eligibility determination in clinical trials dealing with cardiology. The combinations “clinical trial” alone and “clinical trial” plus “eligibility determination” occur very frequently because these keywords are combined with many different, less common keywords.
Fig. 3
Time course of developing the MDM contents. MDM, Medical Data Model.
Fig. 4
Screenshot from the MDM Portal showing search results for “heart failure”. MDM, Medical Data Model.
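
The semi-automatic annotation step described in the Fig. 1 caption can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Coding` structure, the `fit` similarity measure (Python's `difflib`, standing in for whatever the portal uses), the `min_fit` threshold, and the placeholder CUIs are hypothetical and are not taken from the MDM Portal. The `choose` function merely simulates the human curator's preference for pre-coordinated codes.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Coding:
    """A previously used label-coding combination (hypothetical structure)."""
    label: str
    cuis: tuple[str, ...]  # one CUI = pre-coordinated, several = post-coordinated
    frequency: int         # how often this combination was used before


def fit(search_term: str, label: str) -> float:
    """Stand-in similarity in [0, 1]; the portal's real measure is not given."""
    return SequenceMatcher(None, search_term.lower(), label.lower()).ratio()


def suggest(search_term: str, history: list[Coding]) -> list[Coding]:
    """Sort earlier codings by fit first, then by frequency (both descending)."""
    return sorted(history, key=lambda c: (-fit(search_term, c.label), -c.frequency))


def choose(search_term: str, suggestions: list[Coding],
           min_fit: float = 0.8) -> Coding | None:
    """Simulate the curator: prefer a pre-coordinated code, else post-coordination."""
    good = [c for c in suggestions if fit(search_term, c.label) >= min_fit]
    for coding in good:
        if len(coding.cuis) == 1:  # a single CUI covers the whole concept
            return coding
    # No suitable pre-coordinated option; None means the code is entered manually.
    return good[0] if good else None


# Placeholder CUIs, not real UMLS identifiers.
history = [
    Coding("Heart failure", ("C0000001",), 120),
    Coding("Heart failure", ("C0000002", "C0000003"), 300),
    Coding("Heart rate", ("C0000004",), 500),
]
suggestions = suggest("heart failure", history)
chosen = choose("heart failure", suggestions)
print(chosen)
```

In this toy history, the post-coordinated pair ranks first because it was used more often, but the simulated curator still picks the single pre-coordinated code, mirroring the preference stated in the caption.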
