Artif Intell Rev. 2021;54(1):755-810.
doi: 10.1007/s10462-020-09866-x. Epub 2020 Jun 25.

Survey on evaluation methods for dialogue systems

Jan Deriu et al. Artif Intell Rev. 2021.

Abstract

In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process: dialogue systems are often evaluated by means of human judgements and questionnaires, which tend to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the amount of human labour involved. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for its dialogue systems and then presenting the evaluation methods for that class.

Keywords: Chatbots; Conversational AI; Dialogue systems; Discourse model; Evaluation metrics.


Conflict of interest statement

Conflict of interest: There are no conflicts of interest to disclose.

Figures

Fig. 1
Example dialogue where the driver can query the agenda via a voice command (Eric et al. 2017). The dialogue system guides the driver through the various options.
Fig. 2
General overview of a task-oriented dialogue system.
Fig. 3
Overview of a DST module. The input to the DST module is the combined output of the ASR and the NLU model.
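The DST module described above maintains a belief over slot values from the noisy hypotheses produced upstream. As a minimal, hypothetical sketch (not the survey's own formulation), a tracker can accumulate per-slot confidences from NLU output and renormalise them into a belief state; all names and the update rule here are illustrative assumptions:

```python
def update_belief(belief, nlu_hypotheses):
    """Merge one turn's scored NLU hypotheses into the running belief.

    belief: {slot: {value: probability}}
    nlu_hypotheses: list of (slot, value, confidence) triples, as an NLU
    module might emit from ASR output.
    """
    for slot, value, conf in nlu_hypotheses:
        slot_belief = belief.setdefault(slot, {})
        # Simple accumulate-and-renormalise update.
        slot_belief[value] = slot_belief.get(value, 0.0) + conf
        total = sum(slot_belief.values())
        for v in slot_belief:
            slot_belief[v] /= total
    return belief

def top_hypothesis(belief, slot):
    """Return the most probable value for a slot, or None if unseen."""
    if not belief.get(slot):
        return None
    return max(belief[slot], key=belief[slot].get)

belief = {}
update_belief(belief, [("food", "italian", 0.7), ("food", "indian", 0.3)])
update_belief(belief, [("food", "italian", 0.9), ("area", "centre", 0.8)])
print(top_hypothesis(belief, "food"))   # italian
print(top_hypothesis(belief, "area"))   # centre
```

Real trackers (statistical or neural) learn this update from data; the dictionary-based version only illustrates the data flow of Fig. 3.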
Fig. 4
Examples of goals from Schatzmann et al. (2007) and Walker et al. (1997). C0 denotes the information constraints, i.e. which information is to be retrieved (a bar that serves beer in the city center); R0 denotes the set of requests, i.e. the information the user wants (name, address, and phone number).
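A user goal of this kind is naturally represented as two pieces: the constraints C0 used to pick an entity, and the requests R0 listing the fields the user wants back. The following sketch uses made-up venue data purely to illustrate that structure; it is an assumption, not the encoding used by the cited papers:

```python
# C0: slots the retrieved entity must satisfy; R0: slots to report back.
goal = {
    "constraints": {"type": "bar", "drinks": "beer", "area": "city centre"},
    "requests": ["name", "address", "phone number"],
}

def matches(venue, constraints):
    """An entity satisfies the goal iff every constraint slot agrees."""
    return all(venue.get(slot) == value for slot, value in constraints.items())

# Hypothetical database entry.
venue = {"name": "The Anchor", "type": "bar", "drinks": "beer",
         "area": "city centre", "address": "1 High St",
         "phone number": "555-0100"}

if matches(venue, goal["constraints"]):
    # Answer the requests R0 from the matching entity.
    answer = {slot: venue[slot] for slot in goal["requests"]}
    print(answer)
```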
Fig. 5
PARADISE overview (Schmitt and Ultes 2015).
Fig. 6
Overview of the interaction quality procedure (Schmitt and Ultes 2015).
Fig. 7
Overview of the HRED architecture. There are two levels of encoding: (i) the utterance encoder, which encodes a single utterance, and (ii) the context encoder, which encodes the sequence of utterance encodings. The decoder is conditioned on the context encoding.
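The two-level encoding in the HRED caption can be sketched very compactly. Real HREDs use learned GRU/LSTM encoders and a trained decoder; here both levels are tiny tanh RNNs with fixed random weights, purely to show the data flow (token vectors → utterance encoding → context encoding over the utterance sequence). All sizes and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding / hidden size (arbitrary choice for the sketch)
W_in = rng.normal(scale=0.1, size=(d, d))
W_hh = rng.normal(scale=0.1, size=(d, d))

def rnn_encode(vectors):
    """Run a single-layer tanh RNN over a sequence; return the last state."""
    h = np.zeros(d)
    for x in vectors:
        h = np.tanh(W_in @ x + W_hh @ h)
    return h

def embed(token):
    """Deterministic stand-in for a learned embedding table."""
    return np.random.default_rng(sum(token.encode())).normal(size=d)

dialogue = [["hello", "there"], ["book", "a", "table"], ["for", "two"]]

# Level 1: encode each utterance from its token embeddings.
utterance_encodings = [rnn_encode([embed(t) for t in utt]) for utt in dialogue]
# Level 2: encode the sequence of utterance encodings into a context vector.
context = rnn_encode(utterance_encodings)
# A decoder would now be conditioned on `context` to generate the next turn.
print(context.shape)  # (8,)
```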

