Syst Rev. 2016 May 23;5:87. doi: 10.1186/s13643-016-0263-z.

SWIFT-Review: a text-mining workbench for systematic review

Brian E Howard et al.

Abstract

Background: There is growing interest in using machine learning approaches to priority-rank studies and reduce the human burden of screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of systematic review can be challenging, especially for topics with a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize data-rich and data-poor research areas within a large literature corpus.

Methods: Twenty case studies, including 15 public datasets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95 % of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as endocrine-disrupting chemicals (EDCs) were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics.
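The ranking step described above can be illustrated with a minimal Python sketch using scikit-learn. This is an illustrative stand-in, not the authors' implementation: SWIFT-Review combines term-frequency and LDA topic features in a log-linear model, whereas this sketch uses a plain bag-of-words logistic regression, and all document texts below are invented for illustration.

```python
# Hedged sketch of priority ranking for screening (not SWIFT-Review's code):
# train on a small labeled seed set, then rank unscreened documents by
# predicted relevance probability. All texts here are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs = [
    "rat study of chemical exposure and tumor incidence",   # included
    "in vitro assay of receptor binding after dosing",      # included
    "economic analysis of regulatory policy",               # excluded
    "editorial commentary on publication trends",           # excluded
]
seed_labels = [1, 1, 0, 0]  # 1 = relevant (included), 0 = excluded

unscreened = [
    "mouse bioassay of carcinogenic response to exposure",
    "opinion piece on science funding policy",
]

# Fit a simple relevance model on the manually annotated seed set
vec = TfidfVectorizer()
model = LogisticRegression().fit(vec.fit_transform(seed_docs), seed_labels)

# Rank the remaining references by predicted relevance
scores = model.predict_proba(vec.transform(unscreened))[:, 1]
for score, doc in sorted(zip(scores, unscreened), reverse=True):
    print(f"{score:.2f}  {doc}")
```

In practice the screener would review documents from the top of this ranked list, periodically retraining as new labels accumulate.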

Results: Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 of the 15 public datasets. Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50 % of the screening effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation.

Conclusions: Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.

Keywords: Literature prioritization; SWIFT-Review; Scoping reports; Software; Systematic review.


Figures

Fig. 1
“Work Saved over Sampling” (WSS) performance metric. The dotted black line illustrates the expected recall achieved when traversing a randomly ordered list. Similarly, the blue line shows the recall obtained when traversing a (hypothetical) ranked list. The length of the dotted red line indicates the percent reduction in effort achieved by ranking and corresponds to the WSS at 95 % recall; in this case, approximately 15 % (95 % − 80 %)
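The WSS@95 computation illustrated in Fig. 1 can be sketched in a few lines of Python. This is an illustrative reimplementation of the standard metric, not code from SWIFT-Review:

```python
import math

def wss_at_recall(ranked_labels, target_recall=0.95):
    """Work Saved over Sampling at a given recall level.

    ranked_labels: 0/1 relevance labels in ranked order (highest-scoring
    documents first). Returns the fraction of screening effort saved
    relative to screening a randomly ordered list to the same recall.
    """
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    needed = math.ceil(target_recall * total_relevant)
    found = 0
    for i, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            # Screened i of n documents; a random ordering would need
            # roughly target_recall * n, so the work saved is:
            return (1.0 - i / n) - (1.0 - target_recall)
    return 0.0

# Example: 100 documents, 10 relevant, all ranked in the top 10.
# Recall 95 % is reached after screening 10 % of the list, so
# WSS@95 = (1 - 0.10) - 0.05 = 0.85.
labels = [1] * 10 + [0] * 90
print(round(wss_at_recall(labels), 4))
```

With the scenario from Fig. 1 (95 % recall reached after screening 80 % of the list), the same formula yields (1 − 0.80) − 0.05 = 0.15.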
Fig. 2
SWIFT-Review user interface and tag browser. The SWIFT-Review Tag Browser allows users to interactively filter a literature set by selecting various combinations of “tags” that have been automatically and/or manually applied to the corpus. In this case, the user has selected for investigation research articles in the “Neoplasms” health outcome category; terms in each abstract that are related to these tags are highlighted automatically in the Document Preview panel
Fig. 3
Learning curves. The graphs above show that, as expected, performance of the prioritization method on each dataset is an increasing function of training set size. Since the total number of available positive instances varies significantly between datasets, not all sizes could be tested for each dataset
Fig. 4
Performance of the ranking algorithm on five datasets (Transgenerational, BPA, PFOS/PFOA, Neuropain: N = 100 [50 included; 50 excluded]; Fluoride: N = 60 [30 included; 30 excluded]). In all cases, the ranking algorithm results in a substantial potential reduction in screening effort compared to random ordering, with WSS@95 scores ranging from about 60 % (Neuropain) to 90 % (Fluoride)
Fig. 5
Observed changes in WSS@95 attributable to three feature types: a LDA, b MeSH terms, and c n-grams, on 20 SR datasets. Mean changes in WSS@95 were 4.4 % (LDA), 1 % (MeSH), and −0.4 % (n-grams). In each case, performance was measured on each of the 20 datasets both with and without the specified feature type. The resulting WSS@95 differences for each dataset were averaged over 25 trials. As shown in a, adding LDA features to the ranking algorithm can result in significant performance increases, whereas inclusion of the MeSH and n-gram features (b and c) was not found to result in large additional benefits when the remaining feature types were also included
Fig. 6
Study flow diagram for the analysis of 171 UNEP EDC chemicals. The literature search identified 221,898 recent research articles out of the total 709,573 EDC articles retrieved from searching PubMed
Fig. 7
Interactively exploring arsenic in the EDC scoping report in SWIFT-Review. This example shows a pie chart survey of health outcomes represented among the 2400 studies on arsenic with a MeSH disease code. In this pie chart, studies lacking a MeSH disease code are not displayed (9553 of the 11,953 documents retrieved for arsenic) and documents may appear in multiple health outcome categories. Below the pie-chart is a list of 1342 documents relevant to “arsenic and neoplasms”. Inset Using the interactive browser, users can “drill down” to further explore documents in a specific area, e.g., arsenic and neoplasms, based on other tags such as evidence stream (human, animal, in vitro)
Fig. 8
Survey of types of chemicals associated with female urogenital disease and pregnancy. The current example uses a pie chart graphic to survey the types of stressors (e.g., pesticides, drugs of abuse, diet and nutrition) associated with the health outcome of female urogenital disease and pregnancy. Below the pie chart is a list of 611 documents retrieved as part of the “pesticides” filter within SWIFT-Review and a bar chart of the most common Tox21 chemicals referenced in the pesticides cluster. Note that bisphenol A is not a pesticide but appears on this list because it was frequently mentioned in the pesticide studies
Fig. 9
Excerpt of a heat map displaying search results for the 171 EDC chemicals categorized by health outcomes. The numbers displayed indicate the number of SWIFT-Review records matching each combination of chemical (rows) and health outcomes (columns). “Pockets” with larger numbers of matching records are displayed in red
Fig. 10
Topic models bar chart from the EDC scoping report. Topic modeling is an unsupervised clustering technique that can often automatically “discover” the main themes in an unlabeled literature corpus. For example, in the case of the EDC literature set, several interesting topics are shown above including topics related to BPA exposure during pregnancy (topic 13), analytical methods used to measure levels of EDCs (topic 23), estrogen, expression, and receptors (topic 7), lead and arsenic exposure (topic 31), breast and prostate cancer (topic 26), and thyroid disease (topic 15). Within SWIFT-Review, users can select any of these topics to interactively browse the associated documents

