Survey on evaluation methods for dialogue systems
- PMID: 33505103
- PMCID: PMC7817575
- DOI: 10.1007/s10462-020-09866-x
Abstract
In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires, which tend to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for its dialogue systems and then presenting the evaluation methods for that class.
Keywords: Chatbots; Conversational AI; Dialogue systems; Discourse model; Evaluation metrics.
© The Author(s) 2020.
Conflict of interest statement
There are no conflicts of interest to disclose.