Syst Rev. 2023 Jun 5;12(1):94.

doi: 10.1186/s13643-023-02247-9.

Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature

Julien Knafou¹, Quentin Haas², Nikolay Borissov³ ⁴, Michel Counotte⁵ ⁶, Nicola Low⁵, Hira Imeri⁵, Aziz Mert Ipekci⁵, Diana Buitrago-Garcia⁵, Leonie Heron⁵, Poorya Amini² ⁴, Douglas Teodoro⁷ ⁸

Affiliations

¹ University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland. julien.knafou@hesge.ch.
² Risklick AG, Bern, Switzerland.
³ University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland.
⁴ CTU Bern, University of Bern, Bern, Switzerland.
⁵ Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland.
⁶ Wageningen Bioveterinary Research, Wageningen University & Research, Wageningen, The Netherlands.
⁷ University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland. douglas.teodoro@unige.ch.
⁸ Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland. douglas.teodoro@unige.ch.

PMID: 37277872
PMCID: PMC10240481
DOI: 10.1186/s13643-023-02247-9

Julien Knafou et al. Syst Rev. 2023.

Abstract

Background: The COVID-19 pandemic has led to an unprecedented number of scientific publications, growing at a pace never seen before. Multiple living systematic reviews have been developed to provide professionals with up-to-date and trustworthy health information, but it is increasingly challenging for systematic reviewers to keep up with the evidence in electronic databases. We aimed to investigate deep learning-based algorithms for classifying COVID-19-related publications to help scale up the epidemiological curation process.

Methods: In this retrospective study, five different pre-trained deep learning-based language models were fine-tuned on a dataset of 6365 publications manually classified into two classes, three subclasses, and 22 sub-subclasses relevant for epidemiological triage purposes. In a k-fold cross-validation setting, each standalone model was assessed on a classification task and compared against an ensemble, which takes the standalone model predictions as input and uses different strategies to infer the optimal article class. A ranking task was also considered, in which the model outputs a ranked list of sub-subclasses associated with the article.
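The two ensemble strategies named in the figure captions (voting and probability sum) can be illustrated with a minimal sketch. The three-model probability table, the function names, and the tie-breaking rule below are assumptions made for illustration; they are not taken from the study, which combined five fine-tuned models.

```python
from collections import Counter


def probability_sum(model_probs):
    """Sum per-class probabilities across models; return the top class index."""
    n_classes = len(model_probs[0])
    totals = [sum(p[c] for p in model_probs) for c in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)


def majority_vote(model_probs):
    """Each model votes for its most probable class.

    Ties are broken in favour of the lowest class index (an assumption).
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in model_probs]
    counts = Counter(votes)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


# Hypothetical outputs of three models over three classes for one article.
probs = [
    [0.48, 0.52, 0.00],
    [0.45, 0.55, 0.00],
    [0.95, 0.05, 0.00],
]

vote_label = majority_vote(probs)   # two of three models prefer class 1
sum_label = probability_sum(probs)  # one very confident model favours class 0
print(vote_label, sum_label)
```

In this toy case the two strategies disagree, which is one reason an ensemble might evaluate both.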

Results: The ensemble model significantly outperformed the standalone classifiers, achieving an F1-score of 89.2 at the class level of the classification task. The difference between the standalone and ensemble models increases at the sub-subclass level, where the ensemble reaches a micro F1-score of 70% against 67% for the best-performing standalone model. For the ranking task, the ensemble obtained the highest recall@3, with a performance of 89%. Using a unanimity voting rule, the ensemble can provide predictions with higher confidence on a subset of the data, detecting original papers with an F1-score of up to 97% on a subset of 80% of the collection, compared with 93% on the whole dataset.
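As a rough illustration of the recall@3 metric used for the ranking task, the sketch below assumes a single gold sub-subclass per article and counts an article as a hit when that label appears among the top k ranked predictions. The labels and the `recall_at_k` helper are hypothetical; the study's exact definition may differ.

```python
def recall_at_k(ranked_labels, gold_labels, k=3):
    """Fraction of articles whose gold label is among the top-k ranked labels."""
    hits = sum(
        1 for ranked, gold in zip(ranked_labels, gold_labels) if gold in ranked[:k]
    )
    return hits / len(gold_labels)


# Hypothetical ranked predictions (best first) for four articles.
ranked = [
    ["a", "b", "c", "d"],
    ["b", "c", "a", "d"],
    ["d", "c", "b", "a"],
    ["c", "a", "d", "b"],
]
gold = ["a", "d", "b", "c"]

score = recall_at_k(ranked, gold, k=3)
print(score)
```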

Conclusion: This study shows the potential of using deep learning language models to perform triage of COVID-19 references efficiently and support epidemiological curation and review. The ensemble consistently and significantly outperforms any standalone model. Tuning the voting strategy thresholds offers a way to annotate a subset of the collection with higher predictive confidence.
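One way to picture the unanimity voting rule with a per-vote probability threshold is sketched below. It assumes a binary decision in which each model votes positive when its probability for the positive class reaches the threshold, and the ensemble abstains when the models disagree. The function name, the data, and the abstention convention are illustrative assumptions, not the authors' implementation.

```python
def unanimous_vote(model_probs, positive=0, threshold=0.5):
    """Return True or False when all models agree, None (abstain) otherwise.

    A model votes positive when its probability for the positive class
    is greater than or equal to the threshold.
    """
    votes = [p[positive] >= threshold for p in model_probs]
    if all(votes):
        return True
    if not any(votes):
        return False
    return None


def coverage(decisions):
    """Share of documents for which the ensemble did not abstain."""
    return sum(d is not None for d in decisions) / len(decisions)


# Hypothetical [P(positive), P(negative)] outputs of three models, four documents.
docs = [
    [[0.90, 0.10], [0.80, 0.20], [0.95, 0.05]],  # all models vote positive
    [[0.20, 0.80], [0.10, 0.90], [0.30, 0.70]],  # all models vote negative
    [[0.70, 0.30], [0.40, 0.60], [0.90, 0.10]],  # models disagree
    [[0.60, 0.40], [0.70, 0.30], [0.65, 0.35]],  # weakly positive
]

decisions = [unanimous_vote(d) for d in docs]
strict = [unanimous_vote(d, threshold=0.7) for d in docs]
print(coverage(decisions), coverage(strict))
```

Raising the threshold makes the ensemble abstain on more documents, which mirrors the trade-off between confidence and the share of the collection that receives a prediction.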

Keywords: COVID-19; Deep learning; Language model; Literature screening; Living systematic review; Text classification; Transfer learning.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Overview of the study design. All articles were manually annotated, and the title, abstract, and source were then retrieved. In a k-fold cross-validation setting (k is set to 5 in our experiments), 5 models were fine-tuned, and the standalone models were compared with one another as well as with two types of ensemble

**Fig. 2**
Publication classifier workflow. The model starts with the title, abstract, and source fields and concatenates their text contents before tokenizing the result. Each model computes its predictions, and an ensemble strategy, voting or probability sum, combines them to get a final prediction

**Fig. 3**
A Precision/recall curves of the ORIGINAL class for the RoBERTa base/large and the ensemble. B Precision/recall curves obtained by the ensemble model for the sub-subclasses. Well-represented sub-subclasses usually perform better than underrepresented ones

**Fig. 4**
Confusion matrix for class (A), subclass (B), and sub-subclass (C). The ensemble is more likely to confuse sub-subclasses within the same subclass and class, which is why performance tends to be higher at those higher levels

**Fig. 5**
F1-score (A)/precision (B)/recall (C) for the ORIGINAL class with respect to a probability threshold per vote when using the voting strategy across the predictions on the class level. Using different thresholds considerably improves performance while reducing the number of predicted publications

**Fig. 6**
A, B, and C Top 20 positive impact words for either EPI (A), BASIC (B), or OTHER (C) subclasses when taking the integrated gradient on a never-seen set of about 600 documents. D, E, and F Classification examples with a focus on passages with impact word scores

Grants and funding

404896/CIHR/Canada