Res Synth Methods. 2024 Jan;15(1):73-85.
doi: 10.1002/jrsm.1672. Epub 2023 Sep 25.

A comparison of machine learning methods to find clinical trials for inclusion in new systematic reviews from their PROSPERO registrations prior to searching and screening


Shifeng Liu et al. Res Synth Methods. 2024 Jan.

Abstract

Searching for trials is a key task in systematic reviews and a focus of automation. Previous approaches required knowing examples of relevant trials in advance, and most methods are focused on published trial articles. To complement existing tools, we compared methods for finding relevant trial registrations given an International Prospective Register of Systematic Reviews (PROSPERO) entry and where no relevant trials have been screened for inclusion in advance. We compared SciBERT-based (extension of Bidirectional Encoder Representations from Transformers) PICO extraction, MetaMap, and term-based representations using an imperfect dataset mined from 3632 PROSPERO entries connected to a subset of 65,662 trial registrations and 65,834 trial articles known to be included in systematic reviews. Performance was measured by the median rank and recall by rank of trials that were eventually included in the published systematic reviews. When ranking trial registrations relative to PROSPERO entries, 296 trial registrations needed to be screened to identify half of the relevant trials, and the best-performing approach used a basic term-based representation. When ranking trial articles relative to PROSPERO entries, 162 trial articles needed to be screened to identify half of the relevant trials, and the best-performing approach used a term-based representation. The results show that MetaMap and term-based representations outperformed approaches that included PICO extraction for this use case. The results suggest that when starting with a PROSPERO entry and where no trials have been screened for inclusion, automated methods can reduce workload, but additional processes are still needed to efficiently identify trial registrations or trial articles that meet the inclusion criteria of a systematic review.
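The term-based approach described above can be illustrated with a minimal sketch (not the authors' code) of ranking candidate trial registrations against a review registration using TF-IDF vectors and cosine similarity via scikit-learn. The example texts are hypothetical stand-ins for a PROSPERO entry and trial registration records.

```python
# Sketch of term-based ranking with TF-IDF and cosine similarity.
# All document texts below are hypothetical illustrations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prospero_entry = (
    "statins for primary prevention of cardiovascular disease in adults"
)
trial_registrations = [
    "randomised trial of atorvastatin for cardiovascular prevention in adults",
    "cognitive behavioural therapy for insomnia in older adults",
    "rosuvastatin versus placebo for primary prevention of heart disease",
]

# Fit one shared vocabulary over all documents, then score every trial
# registration against the PROSPERO entry by cosine similarity.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([prospero_entry] + trial_registrations)
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()

# Rank candidates from most to least similar; reviewers would screen
# the list in this order.
ranking = sorted(range(len(trial_registrations)), key=lambda i: -scores[i])
```

In this toy case the unrelated insomnia trial shares almost no content terms with the review registration, so it falls to the bottom of the ranking; the paper's comparison asks how far down such a list a reviewer must screen to recover the trials eventually included in the review.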

Keywords: clinical trials; information retrieval; systematic reviews.


Conflict of interest statement

CONFLICTS OF INTEREST

The authors declare that there is no potential conflict of interest.

Figures

Figure 1.
A schematic representation of the traditional systematic review of clinical trials compared to proactively allocating trials to systematic review questions before the trials are completed (left), and the types of tools used to support automation (right), including: (a) active learning approaches where experts label trials during model training; (b) proactive identification of trial registrations and trial articles to update systematic reviews where some included trials are already known; and (c) proactive identification of trial registrations and trial articles for new systematic review questions where experts are not available to identify example trials before or during model training.
Figure 2.
An example of the comparison framework using mined data for ranking trial registrations to PROSPERO entries, including the steps of processing (grey), numbers of trial registrations (green), PROSPERO entries (blue), and known connections (red). Other combinations of PROSPERO entries, systematic review articles, trial articles, and trial registrations are included in the Supplementary Material.
Figure 3.
The scatter plot of the TF-IDF methods using different targeted terms with cosine similarity as ranking score in the mined data for: (a) PROSPERO entry to trial registration; (b) PROSPERO entry to trial article; (c) systematic review article to trial registration; and (d) systematic review article to trial article.
Figure 4.
The scatter plot of the TF-IDF methods using different targeted terms with cosine similarity as ranking score in manually curated data for: (a) PROSPERO entry to trial registration; (b) PROSPERO entry to trial article; (c) systematic review article to trial registration; and (d) systematic review article to trial article.
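The two evaluation measures used throughout (median rank, and recall by rank) can be sketched as follows; this is an assumed illustration, not the paper's evaluation code, and the rank values are invented for the example.

```python
# Toy sketch of the reported metrics: median rank of relevant trials,
# and recall within the top k ranked results.
import statistics

def recall_at_k(relevant_ranks, k):
    """Fraction of relevant trials appearing within the top k results."""
    return sum(1 for r in relevant_ranks if r <= k) / len(relevant_ranks)

# Hypothetical 1-based ranks at which five included trials appeared
# in a ranked candidate list.
ranks = [3, 10, 25, 40, 120]

median_rank = statistics.median(ranks)
half_recall = recall_at_k(ranks, 25)
```

Here the median rank of 25 means half of the included trials are found after screening 25 candidates, which is how figures such as "296 trial registrations needed to be screened to identify half of the relevant trials" are read.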


