Comparative Study
Syst Rev. 2019 Nov 15;8(1):278. doi: 10.1186/s13643-019-1222-2.

Performance and usability of machine learning for screening in systematic reviews: a comparative evaluation of three tools


Allison Gates et al. Syst Rev. 2019.

Abstract

Background: We explored the performance of three machine learning tools designed to facilitate title and abstract screening in systematic reviews (SRs) when used to (a) eliminate irrelevant records (automated simulation) and (b) complement the work of a single reviewer (semi-automated simulation). We evaluated user experiences for each tool.

Methods: We subjected three SRs to two retrospective screening simulations. In each tool (Abstrackr, DistillerSR, RobotAnalyst), we screened a 200-record training set and downloaded the predicted relevance of the remaining records. We calculated the proportion missed and workload and time savings compared to dual independent screening. To test user experiences, eight research staff tried each tool and completed a survey.
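
The exact formulas are not given in the abstract, but the reported screening-performance metrics could be computed along the following lines. This is a minimal sketch, assuming a baseline of two screening decisions per record (dual independent screening) and a hypothetical per-record screening time; the metric definitions, variable names, and timing value are illustrative and not taken from the paper.

```python
# Illustrative sketch only: the metric definitions and the per-record screening
# time are assumptions for demonstration, not the authors' exact calculations.

def screening_metrics(n_total, n_relevant, n_missed_relevant, n_human_decisions,
                      minutes_per_record=0.5):
    """Rough title/abstract screening-simulation metrics.

    n_total            -- records retrieved by the search
    n_relevant         -- relevant records per the reference standard (dual independent screening)
    n_missed_relevant  -- relevant records the (semi-)automated workflow would have excluded
    n_human_decisions  -- screening decisions still made by humans under the workflow
    minutes_per_record -- assumed screening speed per decision (hypothetical value)
    """
    proportion_missed = 100.0 * n_missed_relevant / n_relevant
    baseline_decisions = 2 * n_total  # dual independent screening: every record screened twice
    decisions_saved = baseline_decisions - n_human_decisions
    workload_savings = 100.0 * decisions_saved / baseline_decisions
    time_savings_hours = decisions_saved * minutes_per_record / 60.0
    return proportion_missed, workload_savings, time_savings_hours


# Usage with made-up numbers:
missed, workload, hours = screening_metrics(
    n_total=10_000, n_relevant=120, n_missed_relevant=6, n_human_decisions=2_000)
print(f"missed {missed:.0f}%, workload savings {workload:.0f}%, time savings {hours:.0f} h")
```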

Results: Using Abstrackr, DistillerSR, and RobotAnalyst, respectively, the median (range) proportion missed was 5 (0 to 28) percent, 97 (96 to 100) percent, and 70 (23 to 100) percent for the automated simulation and 1 (0 to 2) percent, 2 (0 to 7) percent, and 2 (0 to 4) percent for the semi-automated simulation. The median (range) workload savings was 90 (82 to 93) percent, 99 (98 to 99) percent, and 85 (85 to 88) percent for the automated simulation and 40 (32 to 43) percent, 49 (48 to 49) percent, and 35 (34 to 38) percent for the semi-automated simulation. The median (range) time savings was 154 (91 to 183), 185 (95 to 201), and 157 (86 to 172) hours for the automated simulation and 61 (42 to 82), 92 (46 to 100), and 64 (37 to 71) hours for the semi-automated simulation. Abstrackr identified 33 to 90 percent of records missed by a single reviewer. RobotAnalyst performed less well, and DistillerSR provided no relative advantage. User experiences depended on user friendliness, qualities of the user interface, features and functions, trustworthiness, ease and speed of obtaining predictions, and practicality of the export file(s).

Conclusions: The workload savings afforded in the automated simulation came with increased risk of missing relevant records. Supplementing a single reviewer's decisions with relevance predictions (semi-automated simulation) sometimes reduced the proportion missed, but performance varied by tool and SR. Designing tools based on reviewers' self-identified preferences may improve their compatibility with present workflows.

Systematic review registration: Not applicable.

Keywords: Automation; Machine learning; Systematic reviews; Usability; User experience.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1 Proportion missed (percent) by tool and systematic review, automated simulation
Fig. 2 Workload savings (percent) by tool and systematic review, automated simulation
Fig. 3 Estimated time savings (days) by tool and systematic review, automated simulation
Fig. 4 Proportion missed (percent) by tool and systematic review, semi-automated simulation
Fig. 5 Workload savings (percent) by tool and systematic review, semi-automated simulation
Fig. 6 Estimated time savings (days) by tool and systematic review, semi-automated simulation
