Database (Oxford). 2016 Feb 11:2016:bav121.
doi: 10.1093/database/bav121. Print 2016.

Portal of medical data models: information infrastructure for medical research and healthcare


Martin Dugas et al. Database (Oxford). 2016.

Abstract

Introduction: Information systems are a key success factor for medical research and healthcare. Currently, most of these systems apply heterogeneous and proprietary data models, which impede data exchange and integrated data analysis for scientific purposes. Due to the complexity of medical terminology, the overall number of medical data models is very high. At present, the vast majority of these models are not available to the scientific community. The objective of the Portal of Medical Data Models (MDM, https://medical-data-models.org) is to foster sharing of medical data models.

Methods: MDM is a registered European information infrastructure. It provides a multilingual platform for exchange and discussion of data models in medicine, both for medical research and healthcare. The system is developed in collaboration with the University Library of Münster to ensure sustainability. A web front-end enables users to search, view, download and discuss data models. Eleven different export formats are available (ODM, PDF, CDA, CSV, MACRO-XML, REDCap, SQL, SPSS, ADL, R, XLSX). MDM contents were analysed with descriptive statistics.
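
As an illustration of how one of the listed export formats might be consumed downstream, the following minimal sketch reads item definitions and their semantic codes from an ODM file (CDISC Operational Data Model, an XML format). The sample document, the OIDs, the placeholder codes and the assumption that UMLS codes appear as `Alias` elements whose `Context` starts with "UMLS" are illustrative only and are not taken from the abstract; the function name `extract_annotations` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of CDISC ODM version 1.3 (assumed version for this sketch).
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, heavily abbreviated ODM document. The codes are placeholders,
# not real UMLS concept identifiers.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S.1">
    <MetaDataVersion OID="MDV.1" Name="Hypothetical laboratory form">
      <ItemDef OID="I.1" Name="Hemoglobin" DataType="float">
        <Alias Context="UMLS CUI" Name="C0000001"/>
      </ItemDef>
      <ItemDef OID="I.2" Name="Leukocytes" DataType="float">
        <Alias Context="UMLS CUI" Name="C0000002"/>
      </ItemDef>
      <ItemDef OID="I.3" Name="Comment" DataType="text"/>
    </MetaDataVersion>
  </Study>
</ODM>"""


def extract_annotations(odm_xml: str) -> dict[str, list[str]]:
    """Map each ItemDef name to the list of UMLS-like codes attached to it."""
    root = ET.fromstring(odm_xml)
    annotations: dict[str, list[str]] = {}
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        codes = [
            alias.get("Name")
            for alias in item.findall("odm:Alias", ODM_NS)
            # Assumption: semantic codes are marked by a Context beginning with "UMLS".
            if alias.get("Context", "").startswith("UMLS")
        ]
        annotations[item.get("Name")] = codes
    return annotations


if __name__ == "__main__":
    for name, codes in extract_annotations(SAMPLE_ODM).items():
        print(name, codes)
```

Items without any annotation (here the hypothetical "Comment" item) map to an empty list, which makes unannotated data elements easy to spot.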

Results: MDM contains 4387 current versions of data models (10,963 versions in total). Of these models, 2475 belong to oncology trials. The most common keyword (n = 3826) is 'Clinical Trial'; the most frequent diseases are breast cancer, leukemia, lung and colorectal neoplasms. The most common languages of data elements are English (n = 328,557) and German (n = 68,738). Semantic annotations (UMLS codes) are available for 108,412 data items, 2453 item groups and 35,361 code list items. Overall, 335,087 UMLS code assignments were made, drawing on 21,847 unique codes. A few UMLS codes are used several thousand times, but the frequency distribution has a long tail of rarely used codes.
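
The reported counts can be cross-checked with simple arithmetic. The sketch below is not part of the original analysis: the input numbers are taken from the Results paragraph, while the variable names and the derived averages are illustrative and are not figures reported by the authors.

```python
# Counts as reported in the Results paragraph.
annotated_items = 108_412
annotated_item_groups = 2_453
annotated_code_list_items = 35_361
total_umls_assignments = 335_087
unique_umls_codes = 21_847
current_models = 4_387
total_versions = 10_963
oncology_models = 2_475

# Total number of annotated elements (items + item groups + code list items).
annotated_elements = (
    annotated_items + annotated_item_groups + annotated_code_list_items
)

# Derived ratios (computed here, not reported in the abstract).
mean_codes_per_element = total_umls_assignments / annotated_elements
mean_uses_per_code = total_umls_assignments / unique_umls_codes
mean_versions_per_model = total_versions / current_models
oncology_share = oncology_models / current_models

print(f"annotated elements:      {annotated_elements}")
print(f"mean codes per element:  {mean_codes_per_element:.2f}")
print(f"mean uses per code:      {mean_uses_per_code:.2f}")
print(f"mean versions per model: {mean_versions_per_model:.2f}")
print(f"oncology share:          {oncology_share:.1%}")
```

The sum of annotated elements is 146,226, which matches the total given in the caption of Figure 7. The mean number of codes per element comes out above the median of 1 given there, as would be expected for a skewed distribution.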

Discussion: Expected benefits of the MDM portal are improved and accelerated design of medical data models by sharing best practice, more standardised data models with semantic annotation and better information exchange between information systems, in particular Electronic Data Capture (EDC) and Electronic Health Records (EHR) systems. Contents of the MDM portal need to be further expanded to reach broad coverage of all relevant medical domains. Database URL: https://medical-data-models.org.


Figures

Figure 1
Search for clinical trial ‘AML-AZA’ on the MDM portal, resulting in six data models.
Figure 2
Laboratory data model from AML-AZA trial with hemoglobin, leukocytes and other parameters. Semantic codes and complete code lists for each data item are available in the detailed view.
Figure 3
Cumulative number of newly created data models (black graph) and updated data models (red graph) for the time period 2011–2015. In 2012, a draft set of ∼3000 models was uploaded into the portal. In 2015 ∼75% of data models were updated. In total 4387 data models were available.
Figure 4
Frequency distribution of data model versions. Most models were available in two (n = 1295) or three (n = 1357) versions. Thirteen models were provided in 10 or more versions.
Figure 5
UpSet plot of the 10 most frequent keywords. The bar chart on the left indicates the frequency of keywords: ‘Clinical Trial’ is the most common keyword (almost 4000 occurrences). The upper bar chart indicates the intersection size of keyword combinations. ‘Clinical Trial’ and ‘Eligibility Determination’ is the most frequent combination of keywords. The most common triple is ‘Clinical Trial’ – ‘Treatment Form’ – ‘Breast Cancer’.
Figure 6
Frequency distribution of 21,847 unique UMLS codes in the MDM portal. A few codes are used very often (>1000 times), but there is a long tail of rarely used codes.
Figure 7
Frequency of UMLS codes per annotated element: median 1 (range 1–35). Overall, there are 146,226 annotated elements (108,412 items, 2453 item groups and 35,361 code list items).

