PLoS One. 2018 Nov 29;13(11):e0208095.

doi: 10.1371/journal.pone.0208095. eCollection 2018.

Emergence of linguistic conventions in multi-agent reinforcement learning

Dorota Lipowska¹, Adam Lipowski²

Affiliations

¹ Faculty of Modern Languages and Literature, Adam Mickiewicz University, Poznań, Poland.
² Faculty of Physics, Adam Mickiewicz University, Poznań, Poland.

PMID: 30496267
PMCID: PMC6264146
DOI: 10.1371/journal.pone.0208095

Dorota Lipowska et al. PLoS One. 2018.

Abstract

The emergence of signaling conventions, of which language is a prime example, has recently drawn considerable interdisciplinary interest, ranging from game theory to robotics to evolutionary linguistics. This wide spectrum of research rests on very different assumptions and methodologies, and the complexity of the problem has so far precluded a unifying, commonly accepted explanation. We examine the formation of signaling conventions in a multi-agent reinforcement learning model. When the network of interactions between agents is a complete graph or a sufficiently dense random graph, a global consensus is typically reached, with the emerging language being a nearly unique object-word mapping or containing some synonyms and homonyms. On finite-dimensional lattices, the model gets trapped in disordered configurations with only local consensus. Such trapping can be avoided by introducing population renewal, which, in the presence of superlinear reinforcement, restores ordinary surface-tension-driven coarsening and considerably enhances the formation of efficient signaling.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. An elementary step of a single-object version of the model (N_w = 3).**
Using the probabilities defined in Eq (1), the speaker selects one of its words (here: W₂). Next both the speaker and the hearer increase their weights of the selected word by 1.
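The elementary step described in this caption can be sketched in Python. Eq (1) is not reproduced on this page, so the sketch assumes that a word is selected with probability proportional to its weight raised to the power α (α = 1 for ordinary reinforcement, α > 1 for superlinear reinforcement) and that all weights start at 1. The function names `selection_probs` and `single_object_step` are hypothetical and are not taken from the paper.

```python
import random

def selection_probs(weights, alpha=2.0):
    """ASSUMED form of Eq (1): p_i = w_i**alpha / sum_j w_j**alpha."""
    powered = [w ** alpha for w in weights]
    total = sum(powered)
    return [p / total for p in powered]

def single_object_step(speaker, hearer, alpha=2.0, rng=random):
    """One speaker-hearer interaction; both weight lists are updated in place."""
    probs = selection_probs(speaker, alpha)
    # The speaker draws one word index according to its own weights.
    word = rng.choices(range(len(speaker)), weights=probs, k=1)[0]
    # Both agents reinforce the selected word by 1, as in the caption.
    speaker[word] += 1
    hearer[word] += 1
    return word

# N_w = 3 words; initial weights of 1 are an assumption.
speaker = [1, 1, 1]
hearer = [1, 1, 1]
chosen = single_object_step(speaker, hearer, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps a run reproducible; which word is chosen still depends on the seed.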

**Fig 2. An elementary step of a multi-object version of the model (N_w = 3, N_o = 2).**
With a uniform probability 1/N_o, the speaker chooses an object (the corresponding section of the inventory is encircled by the dotted line). Using the relevant weights (in solid circles), the speaker calculates the probabilities defined in Eq (2) and selects one of its words (here: W₁). Next, the hearer tries to guess the object the speaker is talking about by calculating the probabilities defined in Eq (3), based on its weights of the communicated word (in circles). When the hearer’s guess is correct, both agents increase their corresponding weights by 1.
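The multi-object step in this caption can be sketched in the same way. Eqs (2) and (3) are not reproduced on this page, so the sketch assumes both are normalized powers of the weights: the speaker normalizes over words for the chosen object, and the hearer normalizes over objects for the communicated word. The names `normalize` and `multi_object_step`, and the `weights[object][word]` layout, are hypothetical.

```python
import random

def normalize(values, alpha):
    """ASSUMED form of Eqs (2) and (3): v_i**alpha / sum_j v_j**alpha."""
    powered = [v ** alpha for v in values]
    total = sum(powered)
    return [p / total for p in powered]

def multi_object_step(speaker, hearer, alpha=2.0, rng=random):
    """One interaction; speaker and hearer are weights[object][word] tables."""
    n_objects, n_words = len(speaker), len(speaker[0])
    # The speaker picks an object uniformly, with probability 1/N_o.
    obj = rng.randrange(n_objects)
    # The speaker picks a word using its weights for that object.
    word_probs = normalize(speaker[obj], alpha)
    word = rng.choices(range(n_words), weights=word_probs, k=1)[0]
    # The hearer guesses the object using its weights of the heard word.
    column = [hearer[o][word] for o in range(n_objects)]
    guess_probs = normalize(column, alpha)
    guess = rng.choices(range(n_objects), weights=guess_probs, k=1)[0]
    success = guess == obj
    if success:
        # Only a correct guess is reinforced, by 1 for both agents.
        speaker[obj][word] += 1
        hearer[obj][word] += 1
    return obj, word, guess, success

# N_o = 2 objects, N_w = 3 words; initial weights of 1 are an assumption.
speaker = [[1, 1, 1], [1, 1, 1]]
hearer = [[1, 1, 1], [1, 1, 1]]
result = multi_object_step(speaker, hearer, rng=random.Random(0))
```

Averaging the returned `success` flag over many interactions gives one plausible way to estimate a success rate of the kind plotted in Fig 8, although the exact definition used there is not stated on this page.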

**Fig 3. Spatial distribution of s₁, the probability that an agent will select word W₁ (Eq (1)).**
Results for a single-object model on a square lattice with N = 10² ⋅ 10² = 10⁴, α = 2, N_w = 2. The dynamics traps the model in a disordered state (the configurations for, e.g., t = 10³ and t = 10⁴ differ only slightly). Since s₁ is generally close to either unity or zero, almost every agent has developed a strong preference for one of the words.

**Fig 4. Time dependence of m_L.**
Results for a single-object model with N_w = 2 on complete graphs (N = 10⁵) and Cartesian lattices with d = 1 (N = 10⁵), d = 2 (N = 300² = 9 ⋅ 10⁴), and d = 3 (N = 50³ = 125 ⋅ 10³). In the case of the ordinary reinforcement (α = 1), none of the words is even locally preferred on a complete graph (since m_L → 0), and only a small asymmetry is seen for a square lattice. The results presented (also in the following figures) are averages over 20 independent runs. Statistical errors are typically smaller than plotting symbols and are omitted.

**Fig 5. Time dependence of m_G.**
Results for a complete graph and Cartesian lattices with the same simulation parameters as in Fig 4. Only for a complete graph with α = 2 is the global symmetry broken, so that one word dominates the entire population of agents.

**Fig 6. Time dependence of m_G (random graphs).**
Results for random graphs of an average node degree z and a complete graph (α = 2, N_w = 2, N = 10⁵). Only for sufficiently large z is the behavior on the random graphs similar to that on the complete graph. Averaging over 20 runs includes generation of independent graphs.

**Fig 7. Distribution of the dominant words that agents use to talk about the first object.**
*Left*: simulations on a square lattice with N_o = 2, N_w = 10, N = 50 ⋅ 50 = 2.5 ⋅ 10³, α = 2. *Right*: the same simulations but with a population renewal (with probability p = 10⁻⁵).

**Fig 8. Time dependence of the success rate.**
Results for the model on the complete graph of size N = 10⁴ with N_o = 10, several values of N_w, and α = 2. For N_w = 50 and α = 1 (yellow line), we can see a much slower convergence to a consensus than for N_w = 50 and α = 2. The black line shows the success rate for the version with a population renewal (with probability p = 10⁻⁴).

**Fig 9. Distribution of the total (normalized) weights associated with particular words.**
Results for simulations with N_o = 10 on the complete graph of size N = 10⁴ and simulation time t = 10⁶ (α = 2). Simulations for t = 10⁵ lead to nearly identical distributions.

**Fig 10. Spatial distribution of s₁, the probability that an agent will select word W₁ (Eq (1)).**
Results for a single-object model on a square lattice with N = 10² ⋅ 10² = 10⁴, α = 2, N_w = 2, and with the renewal probability p = 10⁻⁴. In this case, contrary to Fig 3, clusters of agents with the same dominant word grow steadily.
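The captions give the renewal probability p but not the renewal rule itself. One plausible reading, shown here only as a sketch, is that an agent is occasionally replaced by a newcomer whose weights are reset to their initial value. The function name `maybe_renew`, the reset value of 1, and the point in the update cycle at which renewal is attempted are all assumptions.

```python
import random

def maybe_renew(weights, p, rng=random, initial_weight=1):
    """With probability p, replace the agent by resetting its weights in place.

    ASSUMPTION: renewal means resetting every weight to initial_weight.
    Returns True when a renewal took place.
    """
    if rng.random() < p:
        for i in range(len(weights)):
            weights[i] = initial_weight
        return True
    return False

# An agent with a strong preference for the first of N_w = 2 words.
agent = [120, 3]
renewed = maybe_renew(agent, p=1e-4, rng=random.Random(0))
```

Under this reading, a renewed agent has no preference of its own and is subsequently reinforced by its neighbours, which is consistent with the steady cluster growth the caption contrasts with Fig 3.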

**Fig 11. Time dependence of 1 − m_L for several values of the renewal probability p.**
Results for a single-object, square-lattice version of our model. Simulations were made for α = 2, N_w = 2, and N = 200 ⋅ 200 = 4 ⋅ 10⁴, and the results are averages of 20 independent runs. The line segment has a slope corresponding to t^−0.41.

**Fig 12. Time dependence of 1 − m_L for several values of α.**
Results for a single-object square-lattice version of our model for the renewal probability p = 10⁻³. Simulations were made for N_w = 2 and N = 200 ⋅ 200 = 4 ⋅ 10⁴, and the results are averages of 20 independent runs.
