NPJ Digit Med. 2022 Jun 14;5(1):75.
doi: 10.1038/s41746-022-00620-x.

Harmonization and standardization of data for a pan-European cohort on SARS-CoV-2 pandemic

Eugenia Rinaldi et al. NPJ Digit Med. 2022.

Abstract

The European project ORCHESTRA intends to create a new pan-European cohort to rapidly advance the knowledge of the effects and treatment of COVID-19. Establishing processes that facilitate the merging of heterogeneous clusters of retrospective data was an essential challenge. In addition, data from new ORCHESTRA prospective studies have to be compatible with earlier collected information to be efficiently combined. In this article, we describe how we utilized and contributed to existing standard terminologies to create consistent semantic representation of over 2500 COVID-19-related variables taken from three ORCHESTRA studies. The goal is to enable the semantic interoperability of data within the existing project studies and to create a common basis of standardized elements available for the design of new COVID-19 studies. We also identified 743 variables that were commonly used in two of the three prospective ORCHESTRA studies and can therefore be directly combined for analysis purposes. Additionally, we actively contributed to global interoperability by submitting new concept requests to the terminology Standards Development Organizations.
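As a rough illustration of why standard codes in variable IDs make studies directly combinable, the sketch below intersects two data dictionaries by variable ID. It is a minimal example under stated assumptions, not the authors' pipeline: the dictionary contents, the placeholder IDs (`code_a` to `code_d`), and the helper `common_variables` are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical data dictionaries mapping variable ID -> label.
# The IDs are placeholders, NOT real terminology codes.
lts_dictionary: Dict[str, str] = {
    "code_a": "Body temperature",
    "code_b": "Lactate dehydrogenase",
    "code_c": "Fatigue",
}
fp_dictionary: Dict[str, str] = {
    "code_b": "Lactate dehydrogenase",
    "code_c": "Fatigue",
    "code_d": "HIV RNA",
}


def common_variables(first: Dict[str, str], second: Dict[str, str]) -> Set[str]:
    """Return the variable IDs that appear in both data dictionaries."""
    return set(first) & set(second)


# Variables sharing the same standardized ID can be pooled without remapping.
shared = sorted(common_variables(lts_dictionary, fp_dictionary))
print(shared)
```

Because both dictionaries use the same identifier for the same concept, finding combinable variables reduces to a set intersection; without shared codes, each pair of variables would need manual matching.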

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Excerpts of both the Long-Term Sequelae and Fragile population data dictionaries and selected Genomics variables.
Examples of data dictionary elements with standard terminology codes incorporated in the variable IDs and answer (choice) IDs are shown, as well as additional semantic representations of a concept that were added in the Field Annotation column. ECOG PS Eastern Cooperative Oncology Group Performance Scale, HIV RNA Human immunodeficiency virus ribonucleic acid, ARV Antiretroviral, CT Computed tomography, CD3+ cells Cluster of differentiation 3 positive T-cells, CD4+ cells Cluster of differentiation 4 positive T-cells, CD19+ cells Cluster of differentiation 19 positive B-lymphocytes, IFN-gamma Interferon gamma, TNF-alpha Tumor necrosis factor alpha, IL-2 Interleukin 2, pg/mL Picograms per milliliter, ID Identifier.
Fig. 2. Excerpt of the Core Data Set.
Examples of common elements from the LTS and FP studies. CD38+ cells Cluster of differentiation 38 positive immune cells, IL-13 Interleukin 13, PCR Polymerase chain reaction.
Fig. 3. Unique standard codes.
Overview of unique codes from recognized international standard terminologies and classifications assigned to common variables used in the LTS and FP studies’ electronic Case Report Forms.
Fig. 4. Overview of harmonized data and submissions to standards development organizations.
The diagram summarizes the variables used and semantically coded in the case report forms of the LTS and FP clinical studies and the concepts submitted for coding to standards development organizations.
Fig. 5. Standardization and harmonization workflow.
The diagram shows the different steps of the standardization and harmonization process in ORCHESTRA.
Fig. 6. Standard Terminologies.
Overview of the main terminologies used to code ORCHESTRA variables to ensure semantic interoperability.
Fig. 7. Assignment of standard terminology codes to variable and answer IDs.
a Assignment of SNOMED CT codes to represent the clinical concept of the question in the variable ID and the concepts contained in the answers as codes in the answer IDs. b Assignment of appropriate LOINC code representing the laboratory value lactate dehydrogenase to the variable ID of the respective question in the data dictionary.
Fig. 8. Incorporation of suffixes into the standardized variable names of data used in the Long-Term Sequelae and Fragile Population studies.
a Overview of suffixes used as part of the variable names for the laboratory component creatine kinase which was coded with the appropriate LOINC code. b Overview of suffixes added to the ATC code for dexamethasone as part of the variable names of the related data elements. U/L: Unit per liter, nkat/L: Nanokatal per liter, kat/L: Katal per liter, IU/L: International unit per liter.
Fig. 9. Harmonization of value sets for two common variables.
a, b show how different answer value sets between two clinical studies in ORCHESTRA converged to maximize precision and interoperability.
