Methods Inf Med. 2015;54(2):164-70.
doi: 10.3414/ME13-01-0130. Epub 2014 Oct 20.

Adaptive semantic tag mining from heterogeneous clinical research texts

T Hao et al. Methods Inf Med. 2015.

Abstract

Objectives: To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts.

Methods: We developed a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov with those of a recently published tag-mining algorithm. We then assessed its adaptability to two other types of clinical research texts, clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across the three text types.
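The component-based idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bigram kernel, and the punctuation-stripping wrapper are all hypothetical stand-ins, chosen only to show how a replaceable kernel and a chain of wrappers might be composed.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence

# Hypothetical type aliases; the paper does not specify these interfaces.
Kernel = Callable[[str], List[str]]   # text -> candidate tags
Wrapper = Callable[[str], str]        # text -> transformed text


def ngram_kernel(text: str, n: int = 2) -> List[str]:
    """Stand-in kernel: emit lowercase word n-grams as candidate tags."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def strip_punctuation(text: str) -> str:
    """Stand-in formatting wrapper: keep only letters, digits, whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def mine_frequent_tags(
    docs: Iterable[str],
    kernel: Kernel,
    wrappers: Sequence[Wrapper] = (),
    min_freq: int = 2,
) -> Dict[str, int]:
    """Apply each wrapper, run the kernel, and keep tags that occur in
    at least `min_freq` documents (document frequency, an assumption)."""
    counts: Counter = Counter()
    for doc in docs:
        for wrap in wrappers:
            doc = wrap(doc)
        counts.update(set(kernel(doc)))  # count each tag once per document
    return {tag: c for tag, c in counts.items() if c >= min_freq}
```

Because the kernel and the wrappers are passed in as plain callables, swapping in a different mining algorithm or adding a wrapper (for example, one that normalizes temporal expressions) would not require changes to `mine_frequent_tags`.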

Results: Our approach increased average recall and speed by 12.8% and 47.02%, respectively, over the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline ranging from 76.9% to 100% across FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FSTs were observed across the three texts as the data size or frequency threshold changed.
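For readers unfamiliar with the metrics reported above, the following sketch shows one common way to compute recall and a set overlap. The abstract does not state the exact formulas used, so both definitions here (and the function names) are assumptions for illustration only.

```python
from typing import Iterable


def recall(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Fraction of relevant tags that were retrieved (standard definition)."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved) & relevant_set) / len(relevant_set)


def overlap(ours: Iterable[str], baseline: Iterable[str]) -> float:
    """Share of the baseline's tags also found by our method.
    Normalizing by the baseline set is an assumption, not the paper's
    stated definition."""
    baseline_set = set(baseline)
    if not baseline_set:
        return 0.0
    return len(set(ours) & baseline_set) / len(baseline_set)
```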

Conclusions: This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing recall. This component-based architectural design is potentially generalizable and could improve the adaptability of other clinical text mining methods.

Keywords: Medical informatics; clinical trials; component-based architecture; semantic tags; text mining.

Figures

Fig 1
Changes in the counts of relevant FSTs as the frequency threshold grows in the three texts.
Fig 2
Changes in the percentage of retrieved relevant FSTs, relative to the maximum number of retrieved relevant FSTs, as the data size grew from 200 to 500 clinical trial summaries, clinical research protocol paragraphs, and clinical data requests, respectively.
