Extractive summarization of clinical trial descriptions

Christian Gulden¹, Melanie Kirchner², Christina Schüttler³, Marc Hinderer³, Marvin Kampf², Hans-Ulrich Prokosch³, Dennis Toddenroth³

Affiliations

¹ Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany. Electronic address: Christian.Gulden@fau.de.
² Medical Center for Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany.
³ Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.

PMID: 31445245
DOI: 10.1016/j.ijmedinf.2019.05.019

Int J Med Inform. 2019 Sep;129:114-121. Epub 2019 May 30.

Abstract

Purpose: Text summarization of clinical trial descriptions has the potential to reduce the time required to become familiar with the subject of a study by condensing long, detailed descriptions into concise, meaning-preserving synopses. This work describes the process of automatically generating summaries of clinical trial descriptions with extractive text summarization methods and evaluates the quality of the resulting summaries.
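Extractive methods build a synopsis by selecting sentences verbatim from the detailed description rather than generating new text. TextRank, which the Results below identify as the best-performing method, ranks sentences by running PageRank over a graph whose nodes are sentences and whose edge weights measure sentence similarity, then extracts the top-ranked sentences. The sketch below is a minimal, dependency-free illustration of that idea, not the authors' implementation: the regex sentence splitter and tokenizer, the 0.85 damping factor, the fixed iteration count, and the helper names (`split_sentences`, `similarity`, `textrank`, `summarize`) are assumptions, while the similarity function follows the word-overlap measure of the original TextRank formulation.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a, b):
    """Word-overlap similarity: shared words divided by log|a| + log|b|."""
    overlap = len(set(a) & set(b))
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return overlap / denom if overlap and denom > 0 else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank (power iteration) over a similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    # total outgoing edge weight of each node, used to normalize its vote
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, k=3):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))
```

The parameter `k` controls summary length; because every selected sentence is copied verbatim, the output cannot be more concise than the source sentences it picks. A sentence that shares no words with any other sentence receives only the baseline score `1 - damping` and is never preferred over connected sentences.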

Methods: We generated a novel dataset from the detailed descriptions and brief summaries of trials registered on clinicaltrials.gov. We executed several text summarization algorithms on the detailed descriptions in this corpus and calculated the standard ROUGE metrics, using the brief summary included in each record as the reference. To investigate how these metrics correlate with human judgment, four reviewers assessed the content completeness of the generated summaries and the helpfulness of both the generated and reference summaries using a Likert-scale questionnaire.
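ROUGE-N compares a generated summary with a reference by counting shared n-grams: recall is the fraction of reference n-grams found in the candidate, precision is the fraction of candidate n-grams found in the reference, and F1 is their harmonic mean. The following is a simplified sketch for ROUGE-1 and ROUGE-2 that assumes lowercase alphanumeric tokenization without stemming or stop-word removal. It is not the official ROUGE toolkit or whichever package the study used, so it should not be expected to reproduce the reported scores exactly; `rouge_n` and `ngram_counts` are hypothetical helper names.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (assumed; no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    """Multiset of the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 of a candidate summary against one reference."""
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)
    # Counter intersection keeps the minimum count per n-gram, i.e. clipped matches
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge_n("the patients received the drug", "the patients received a placebo", n=1)` gives precision, recall and F1 of 0.6: three of five unigrams match, and the repeated "the" counts only once because matches are clipped to the reference count. With `n=2`, two of four bigrams match, giving 0.5.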

Results: The filtering stages of the dataset generation process reduce the 277,228 trials registered on clinicaltrials.gov to 101,016 records usable for the summarization task. On average, the summaries in this corpus are 25% as long as the detailed descriptions. Of the evaluated text summarization methods, the TextRank algorithm exhibits the best overall performance, with a ROUGE-1 F1 score of 0.3531, a ROUGE-2 F1 score of 0.1723, and a ROUGE-L F1 score of 0.3003. These scores correlate with the human reviewers' assessments of helpfulness and content similarity. Inter-rater agreement was slight for helpfulness and fair for content similarity (Fleiss' kappa of 0.12 and 0.22, respectively).
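Fleiss' kappa extends chance-corrected agreement to more than two raters by comparing the mean observed agreement per rated item, P_bar, with the agreement expected by chance from the overall category proportions, P_e: kappa = (P_bar - P_e) / (1 - P_e). The sketch below computes it from a subject-by-category count matrix, treating each Likert level as a nominal category, and maps the value onto the Landis and Koch (1977) labels, whose "slight" and "fair" bands match the wording above. That the authors used this particular scale is an assumption, and the function names are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix where counts[i][j] is the number of raters who put
    subject i into category j. Every subject must have the same number of ratings."""
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_subjects = len(counts)
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # p_j: share of all ratings that fell into category j
    p = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # P_i: proportion of agreeing rater pairs for subject i
    P = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    P_bar = sum(P) / n_subjects
    P_e = sum(pj * pj for pj in p)
    if P_e == 1:
        raise ValueError("kappa is undefined when all ratings fall into one category")
    return (P_bar - P_e) / (1 - P_e)


def landis_koch(kappa):
    """Verbal label for a kappa value on the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

With four reviewers, each row holds one count per Likert level and sums to 4. Note that this statistic ignores the ordinal distance between Likert levels, so a rating one step apart counts as full disagreement.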

Conclusions: Extractive summarization is a viable tool for generating meaning-preserving synopses of detailed clinical trial descriptions. Furthermore, the human evaluation showed that the ROUGE-L F1 score is useful for automatically rating the general quality of generated summaries of clinical trial descriptions.
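ROUGE-L replaces n-gram overlap with the longest common subsequence (LCS) of the candidate and reference token sequences, so it rewards words that appear in the same order without requiring them to be contiguous. Precision and recall are the LCS length divided by the candidate and reference lengths, and F1 is their harmonic mean. The sketch below assumes simple lowercase tokenization and treats each summary as a single token sequence; summary-level ROUGE-L as defined by Lin (2004) instead computes a union LCS over reference sentences, and toolkits differ in this detail. The helper names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens (assumed; no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # extend a match diagonally, otherwise carry the best neighbour forward
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L precision, recall and F1 (beta = 1), each text treated as one sequence."""
    cand, ref = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For instance, for the candidate "the patients received the drug" and the reference "the patients received a placebo", the LCS is "the patients received" (3 tokens), so precision, recall and F1 are all 3/5 = 0.6. Because gaps are allowed, extra words inserted between matches lower only precision, while reordering the matched words lowers the LCS itself.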

Keywords: Clinical trials; NLP; Text mining; Text summarization.
