Comparative Study
Syst Rev. 2019 Nov 15;8(1):278. doi: 10.1186/s13643-019-1222-2.

Performance and usability of machine learning for screening in systematic reviews: a comparative evaluation of three tools


Allison Gates et al. Syst Rev. 2019.

Abstract

Background: We explored the performance of three machine learning tools designed to facilitate title and abstract screening in systematic reviews (SRs) when used to (a) eliminate irrelevant records (automated simulation) and (b) complement the work of a single reviewer (semi-automated simulation). We evaluated user experiences for each tool.

Methods: We subjected three SRs to two retrospective screening simulations. In each tool (Abstrackr, DistillerSR, RobotAnalyst), we screened a 200-record training set and downloaded the predicted relevance of the remaining records. We calculated the proportion missed and workload and time savings compared to dual independent screening. To test user experiences, eight research staff tried each tool and completed a survey.
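
The exact formulas are not given in the abstract, but the reported screening-performance metrics could be computed along the following lines. This is a minimal sketch, assuming a baseline of two screening decisions per record (dual independent screening) and a hypothetical per-record screening time; the metric definitions, variable names, and timing value are illustrative and not taken from the paper.

```python
# Illustrative sketch only: the metric definitions and the per-record screening
# time are assumptions for demonstration, not the authors' exact calculations.

def screening_metrics(n_total, n_relevant, n_missed_relevant, n_human_decisions,
                      minutes_per_record=0.5):
    """Rough title/abstract screening-simulation metrics.

    n_total            -- records retrieved by the search
    n_relevant         -- relevant records per the reference standard (dual independent screening)
    n_missed_relevant  -- relevant records the (semi-)automated workflow would have excluded
    n_human_decisions  -- screening decisions still made by humans under the workflow
    minutes_per_record -- assumed screening speed per decision (hypothetical value)
    """
    proportion_missed = 100.0 * n_missed_relevant / n_relevant
    baseline_decisions = 2 * n_total  # dual independent screening: every record screened twice
    decisions_saved = baseline_decisions - n_human_decisions
    workload_savings = 100.0 * decisions_saved / baseline_decisions
    time_savings_hours = decisions_saved * minutes_per_record / 60.0
    return proportion_missed, workload_savings, time_savings_hours


# Usage with made-up numbers:
missed, workload, hours = screening_metrics(
    n_total=10_000, n_relevant=120, n_missed_relevant=6, n_human_decisions=2_000)
print(f"missed {missed:.0f}%, workload savings {workload:.0f}%, time savings {hours:.0f} h")
```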

Results: Using Abstrackr, DistillerSR, and RobotAnalyst, respectively, the median (range) proportion missed was 5 (0 to 28) percent, 97 (96 to 100) percent, and 70 (23 to 100) percent for the automated simulation and 1 (0 to 2) percent, 2 (0 to 7) percent, and 2 (0 to 4) percent for the semi-automated simulation. The median (range) workload savings was 90 (82 to 93) percent, 99 (98 to 99) percent, and 85 (85 to 88) percent for the automated simulation and 40 (32 to 43) percent, 49 (48 to 49) percent, and 35 (34 to 38) percent for the semi-automated simulation. The median (range) time savings was 154 (91 to 183), 185 (95 to 201), and 157 (86 to 172) hours for the automated simulation and 61 (42 to 82), 92 (46 to 100), and 64 (37 to 71) hours for the semi-automated simulation. Abstrackr identified 33 to 90 percent of records missed by a single reviewer. RobotAnalyst performed less well, and DistillerSR provided no relative advantage. User experiences depended on user friendliness, qualities of the user interface, features and functions, trustworthiness, ease and speed of obtaining predictions, and practicality of the export file(s).

Conclusions: The workload savings afforded in the automated simulation came with increased risk of missing relevant records. Supplementing a single reviewer's decisions with relevance predictions (semi-automated simulation) sometimes reduced the proportion missed, but performance varied by tool and SR. Designing tools based on reviewers' self-identified preferences may improve their compatibility with present workflows.

Systematic review registration: Not applicable.

Keywords: Automation; Machine learning; Systematic reviews; Usability; User experience.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1 Proportion missed (percent) by tool and systematic review, automated simulation
Fig. 2 Workload savings (percent) by tool and systematic review, automated simulation
Fig. 3 Estimated time savings (days) by tool and systematic review, automated simulation
Fig. 4 Proportion missed (percent) by tool and systematic review, semi-automated simulation
Fig. 5 Workload savings (percent) by tool and systematic review, semi-automated simulation
Fig. 6 Estimated time savings (days) by tool and systematic review, semi-automated simulation
