Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 May;61(1-02):3-10.
doi: 10.1055/s-0041-1739361. Epub 2021 Nov 24.

A Semi-Automated Term Harmonization Pipeline Applied to Pulmonary Arterial Hypertension Clinical Trials

Affiliations

A Semi-Automated Term Harmonization Pipeline Applied to Pulmonary Arterial Hypertension Clinical Trials

Ryan J Urbanowicz et al. Methods Inf Med. 2022 May.

Abstract

Objective: Data harmonization is essential to integrate individual participant data from multiple sites, time periods, and trials for meta-analysis. The process of mapping terms and phrases to an ontology is complicated by typographic errors, abbreviations, truncation, and plurality. We sought to harmonize medical history (MH) and adverse events (AE) term records across 21 randomized clinical trials in pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension.

Methods: We developed and applied a semi-automated harmonization pipeline for use with domain-expert annotators to resolve ambiguous term mappings using exact and fuzzy matching. We summarized MH and AE term mapping success, including map quality measures, and imputation of a generalizing term hierarchy as defined by the applied Medical Dictionary for Regulatory Activities (MedDRA) ontology standard.

Results: Over 99.6% of both MH (N = 37,105) and AE (N = 58,170) records were successfully mapped to MedDRA low-level terms. Automated exact matching accounted for 74.9% of MH and 85.5% of AE mappings. Term recommendations from fuzzy matching in the pipeline facilitated annotator mapping of the remaining 24.9% of MH and 13.8% of AE records. Imputation of the generalized MedDRA term hierarchy was unambiguous in 85.2% of high-level terms, 99.4% of high-level group terms, and 99.5% of system organ class in MH, and 75% of high-level terms, 98.3% of high-level group terms, and 98.4% of system organ class in AE.

Conclusion: This pipeline dramatically reduced the burden of manual annotation for MH and AE term harmonization and could be adapted to other data integration efforts.

PubMed Disclaimer

Conflict of interest statement

None declared.

References

    1. Maxwell SE, Kelley K, Rausch JR. Sample size planning for statistical power and accuracy in parameter estimation. Annu Rev Psychol 2008;59(01):537–563 - PubMed
    1. Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Control Clin Trials 1981;2(02):93–113 - PubMed
    1. Evans JDW, Girerd B, Montani D, et al. BMPR2 mutations and survival in pulmonary arterial hypertension: an individual participant data meta-analysis. Lancet Respir Med 2016;4(02):129–137 - PMC - PubMed
    1. Halliday SJ, Hemnes AR. Identifying “super responders” in pulmonary arterial hypertension. Pulm Circ 2017;7(02):300–311 - PMC - PubMed
    1. Lee JS-H, Kibbe WA, Grossman RL. Data harmonization for a molecularly driven health system. Cell 2018;174(05):1045–1048 - PubMed

MeSH terms