Res Synth Methods. 2024 Jan;15(1):73-85.
doi: 10.1002/jrsm.1672. Epub 2023 Sep 25.

A comparison of machine learning methods to find clinical trials for inclusion in new systematic reviews from their PROSPERO registrations prior to searching and screening


Shifeng Liu et al. Res Synth Methods. 2024 Jan.

Abstract

Searching for trials is a key task in systematic reviews and a focus of automation. Previous approaches required knowing examples of relevant trials in advance, and most methods are focused on published trial articles. To complement existing tools, we compared methods for finding relevant trial registrations given an International Prospective Register of Systematic Reviews (PROSPERO) entry and where no relevant trials have been screened for inclusion in advance. We compared SciBERT-based (extension of Bidirectional Encoder Representations from Transformers) PICO extraction, MetaMap, and term-based representations using an imperfect dataset mined from 3632 PROSPERO entries connected to a subset of 65,662 trial registrations and 65,834 trial articles known to be included in systematic reviews. Performance was measured by the median rank and recall by rank of trials that were eventually included in the published systematic reviews. When ranking trial registrations relative to PROSPERO entries, 296 trial registrations needed to be screened to identify half of the relevant trials, and the best-performing approach used a basic term-based representation. When ranking trial articles relative to PROSPERO entries, 162 trial articles needed to be screened to identify half of the relevant trials, and the best-performing approach used a term-based representation. The results show that MetaMap and term-based representations outperformed approaches that included PICO extraction for this use case. The results suggest that when starting with a PROSPERO entry and where no trials have been screened for inclusion, automated methods can reduce workload, but additional processes are still needed to efficiently identify trial registrations or trial articles that meet the inclusion criteria of a systematic review.
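The term-based approach described above can be illustrated with a minimal sketch (not the authors' code) of ranking candidate trial registrations against a review registration using TF-IDF vectors and cosine similarity via scikit-learn. The example texts are hypothetical stand-ins for a PROSPERO entry and trial registration records.

```python
# Sketch of term-based ranking with TF-IDF and cosine similarity.
# All document texts below are hypothetical illustrations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prospero_entry = (
    "statins for primary prevention of cardiovascular disease in adults"
)
trial_registrations = [
    "randomised trial of atorvastatin for cardiovascular prevention in adults",
    "cognitive behavioural therapy for insomnia in older adults",
    "rosuvastatin versus placebo for primary prevention of heart disease",
]

# Fit one shared vocabulary over all documents, then score every trial
# registration against the PROSPERO entry by cosine similarity.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([prospero_entry] + trial_registrations)
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()

# Rank candidates from most to least similar; reviewers would screen
# the list in this order.
ranking = sorted(range(len(trial_registrations)), key=lambda i: -scores[i])
```

In this toy case the unrelated insomnia trial shares almost no content terms with the review registration, so it falls to the bottom of the ranking; the paper's comparison asks how far down such a list a reviewer must screen to recover the trials eventually included in the review.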

Keywords: clinical trials; information retrieval; systematic reviews.


Conflict of interest statement

CONFLICTS OF INTEREST

The authors declare that there is no potential conflict of interest.

Figures

Figure 1.
A schematic representation of the traditional systematic review of clinical trials compared to proactively allocating trials to systematic review questions before the trials are completed (left), and the types of tools used to support automation (right), including: (a) active learning approaches where experts label trials during model training; (b) proactive identification of trial registrations and trial articles to update systematic reviews where some included trials are already known; and (c) proactive identification of trial registrations and trial articles for new systematic review questions where experts are not available to identify example trials before or during model training.
Figure 2.
An example of the comparison framework using mined data for ranking trial registrations to PROSPERO entries, including the steps of processing (grey), numbers of trial registrations (green), PROSPERO entries (blue), and known connections (red). Other combinations of PROSPERO entries, systematic review articles, trial articles, and trial registrations are included in the Supplementary Material.
Figure 3.
The scatter plot of the TF-IDF methods using different targeted terms with cosine similarity as ranking score in the mined data for: (a) PROSPERO entry to trial registration; (b) PROSPERO entry to trial article; (c) systematic review article to trial registration; and (d) systematic review article to trial article.
Figure 4.
The scatter plot of the TF-IDF methods using different targeted terms with cosine similarity as ranking score in manually curated data for: (a) PROSPERO entry to trial registration; (b) PROSPERO entry to trial article; (c) systematic review article to trial registration; and (d) systematic review article to trial article.
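The two evaluation measures used throughout (median rank, and recall by rank) can be sketched as follows; this is an assumed illustration, not the paper's evaluation code, and the rank values are invented for the example.

```python
# Toy sketch of the reported metrics: median rank of relevant trials,
# and recall within the top k ranked results.
import statistics

def recall_at_k(relevant_ranks, k):
    """Fraction of relevant trials appearing within the top k results."""
    return sum(1 for r in relevant_ranks if r <= k) / len(relevant_ranks)

# Hypothetical 1-based ranks at which five included trials appeared
# in a ranked candidate list.
ranks = [3, 10, 25, 40, 120]

median_rank = statistics.median(ranks)
half_recall = recall_at_k(ranks, 25)
```

Here the median rank of 25 means half of the included trials are found after screening 25 candidates, which is how figures such as "296 trial registrations needed to be screened to identify half of the relevant trials" are read.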


