PLoS One. 2018 Nov 29;13(11):e0208095.
doi: 10.1371/journal.pone.0208095. eCollection 2018.

Emergence of linguistic conventions in multi-agent reinforcement learning

Dorota Lipowska et al. PLoS One. 2018.

Abstract

The emergence of signaling conventions, of which language is a prime example, has recently attracted considerable interdisciplinary interest, ranging from game theory to robotics to evolutionary linguistics. This wide spectrum of research rests on very different assumptions and methodologies, and the complexity of the problem precludes the formulation of a unifying and commonly accepted explanation. We examine the formation of signaling conventions within a multi-agent reinforcement learning model. When the network of interactions between agents is a complete graph or a sufficiently dense random graph, a global consensus is typically reached, and the emerging language is either a nearly unique object-word mapping or one that contains some synonyms and homonyms. On finite-dimensional lattices, the model gets trapped in disordered configurations with only a local consensus. Such trapping can be avoided by introducing population renewal, which in the presence of superlinear reinforcement restores ordinary surface-tension-driven coarsening and considerably enhances the formation of efficient signaling.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. An elementary step of a single-object version of the model (Nw = 3).
Using the probabilities defined in Eq (1), the speaker selects one of its words (here: W2). Next both the speaker and the hearer increase their weights of the selected word by 1.
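
The elementary step described in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' code. Eq (1) is not reproduced in this record, so the sketch assumes that a word is selected with probability proportional to its weight raised to the power α (the exponent mentioned in the later captions). The function names `select_word` and `single_object_step`, and the initial weights of 1, are hypothetical.

```python
import random

def select_word(weights, alpha=2.0, rng=random):
    """Pick a word index with probability proportional to weight**alpha.

    Assumed form of Eq (1); the equation itself is not shown in this record.
    """
    powered = [w ** alpha for w in weights]
    return rng.choices(range(len(weights)), weights=powered, k=1)[0]

def single_object_step(speaker, hearer, alpha=2.0, rng=random):
    """One speaker-hearer interaction: both reinforce the selected word by 1."""
    word = select_word(speaker, alpha, rng)
    speaker[word] += 1
    hearer[word] += 1
    return word

# Two agents with Nw = 3 words; initial weights of 1 are an assumption.
speaker = [1, 1, 1]
hearer = [1, 1, 1]
chosen = single_object_step(speaker, hearer, alpha=2.0, rng=random.Random(0))
```

Under this assumption, α = 1 gives selection proportional to the weights themselves, while α > 1 amplifies the lead of a word that is already preferred.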
Fig 2. An elementary step of a multi-object version of the model (Nw = 3, No = 2).
With a uniform probability 1/No, the speaker chooses an object (the corresponding section of the inventory is encircled by the dotted line). Using the relevant weights (in solid circles), the speaker calculates the probabilities defined in Eq (2) and selects one of its words (here: W1). Next the hearer tries to guess the object the speaker is talking about by calculating the probabilities defined in Eq (3) based on its weights of the communicated word (in circles). When the hearer's guess is correct, both agents increase their corresponding weights by 1.
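
The multi-object step described in this caption could look roughly like the sketch below. Eqs (2) and (3) are not reproduced in this record, so their form is assumed here: the speaker compares the weights of all words for the chosen object, the hearer compares the weights of the heard word across all objects, and both use probabilities proportional to weight raised to the power α. The names `weighted_index` and `multi_object_step`, the nested-list layout, and the initial weights of 1 are hypothetical.

```python
import random

def weighted_index(values, alpha, rng):
    """Sample an index with probability proportional to value**alpha."""
    powered = [v ** alpha for v in values]
    return rng.choices(range(len(values)), weights=powered, k=1)[0]

def multi_object_step(speaker, hearer, alpha=2.0, rng=random):
    """One interaction; agent[o][w] is the weight of word w for object o.

    Returns True when the hearer guesses the speaker's object.
    """
    n_objects = len(speaker)
    # The speaker picks an object uniformly, i.e. with probability 1/No.
    obj = rng.randrange(n_objects)
    # Assumed form of Eq (2): compare the words within the chosen object's row.
    word = weighted_index(speaker[obj], alpha, rng)
    # Assumed form of Eq (3): compare the objects within the heard word's column.
    column = [hearer[o][word] for o in range(n_objects)]
    guess = weighted_index(column, alpha, rng)
    success = guess == obj
    if success:
        # Only a correct guess is reinforced, for both agents.
        speaker[obj][word] += 1
        hearer[obj][word] += 1
    return success

# No = 2 objects, Nw = 3 words; initial weights of 1 are an assumption.
rng = random.Random(1)
speaker = [[1, 1, 1] for _ in range(2)]
hearer = [[1, 1, 1] for _ in range(2)]
outcomes = [multi_object_step(speaker, hearer, rng=rng) for _ in range(200)]
success_rate = sum(outcomes) / len(outcomes)
```

The fraction of correct guesses computed at the end is one plausible reading of the "success rate" plotted in Fig 8, here for a single fixed pair of agents only.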
Fig 3. Spatial distribution of s1, the probability that an agent will select word W1 (Eq (1)).
Results for a single-object model on a square lattice with N = 10^2 ⋅ 10^2 = 10^4, α = 2, Nw = 2. The dynamics traps the model in a disordered state (the configurations for, e.g., t = 10^3 and t = 10^4 differ only slightly). Since s1 is generally close to unity or to zero, almost every agent has developed a strong preference toward one of the words.
Fig 4. Time dependence of mL.
Results for a single-object model with Nw = 2 on complete graphs (N = 10^5) and Cartesian lattices with d = 1 (N = 10^5), d = 2 (N = 300^2 = 9 ⋅ 10^4), and d = 3 (N = 50^3 = 125 ⋅ 10^3). In the case of the ordinary reinforcement (α = 1), none of the words is even locally preferred on a complete graph (since mL → 0), and only a small asymmetry is seen for a square lattice. The results presented (also in the following figures) are averages over 20 independent runs. Statistical errors are typically smaller than plotting symbols and are omitted.
Fig 5. Time dependence of mG.
Results for a complete graph and Cartesian lattices with the same simulation parameters as in Fig 4. Only for a complete graph with α = 2 is the global symmetry broken, so that one word dominates the entire population of agents.
Fig 6. Time dependence of mG (random graphs).
Results for random graphs of an average node degree z and a complete graph (α = 2, Nw = 2, N = 10^5). Only for sufficiently large z is the behavior on the random graphs similar to that on the complete graph. Averaging over 20 runs includes generation of independent graphs.
Fig 7. Distribution of the dominant words that agents use to talk about the first object.
Left: simulations on a square lattice with No = 2, Nw = 10, N = 50 ⋅ 50 = 2.5 ⋅ 10^3, α = 2. Right: the same simulations but with a population renewal (with probability p = 10^-5).
Fig 8. Time dependence of the success rate.
Results for the model on the complete graph of size N = 10^4 with No = 10, several values of Nw, and α = 2. For Nw = 50 and α = 1 (yellow line), we can see a much slower convergence to a consensus than for Nw = 50 and α = 2. The black line shows the success rate for the version with a population renewal (with probability p = 10^-4).
Fig 9. Distribution of the total (normalized) weights associated with particular words.
Results for simulations with No = 10 on the complete graph of size N = 10^4 and simulation time t = 10^6 (α = 2). Simulations for t = 10^5 lead to nearly identical distributions.
Fig 10. Spatial distribution of s1, the probability that an agent will select word W1 (Eq (1)).
Results for a single-object model on a square lattice with N = 10^2 ⋅ 10^2 = 10^4, α = 2, Nw = 2, and with the renewal probability p = 10^-4. In this case, contrary to Fig 3, clusters of agents with the same dominant word grow steadily.
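
Population renewal with probability p might be sketched as below. The record does not say how renewal is carried out, so this is only an assumed mechanism: a visited agent is replaced, with probability p, by a newcomer whose word weights are all equal. The function `maybe_renew` and its `initial_weight` parameter are hypothetical.

```python
import random

def maybe_renew(population, index, p, rng=random, initial_weight=1):
    """With probability p, replace agent `index` by a newcomer.

    Assumption: a newcomer starts with equal weights for every word,
    so it has no preference and can adopt the convention of its neighbors.
    """
    if rng.random() < p:
        n_words = len(population[index])
        population[index] = [initial_weight] * n_words
        return True
    return False

# Three agents with Nw = 2 words and strongly developed preferences.
population = [[50, 1], [1, 40], [30, 2]]
replaced_always = maybe_renew(population, 0, p=1.0)
replaced_never = maybe_renew(population, 1, p=0.0)
```

With a small p such as the value quoted in the caption, replacements are rare, but each one removes an entrenched preference, which is consistent with the caption's observation that clusters can keep growing instead of freezing.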
Fig 11. Time dependence of 1 − mL for several values of the renewal probability p.
Results for a single-object, square-lattice version of our model. Simulations were made for α = 2, Nw = 2, and N = 200 ⋅ 200 = 4 ⋅ 10^4, and the results are averages of 20 independent runs. The line segment has a slope corresponding to t^-0.41.
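
A slope on a log-log plot corresponds to a power-law exponent: a decay proportional to t raised to the power −0.41 appears as a straight line of slope −0.41. The sketch below shows one common way to estimate such an exponent, an ordinary least-squares fit in log-log coordinates. It is not necessarily how the authors drew their reference line, the helper `loglog_slope` is hypothetical, and the input is a synthetic noise-free curve rather than data from the paper.

```python
import math

def loglog_slope(times, values):
    """Least-squares slope of log(values) against log(times)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic power-law curve used only to check the routine; not data from the paper.
times = [10 ** k for k in range(1, 7)]
values = [3.0 * t ** -0.41 for t in times]
slope = loglog_slope(times, values)
```

On real simulation output, the fit would normally be restricted to the time window where the curve is actually straight on the log-log plot.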
Fig 12. Time dependence of 1 − mL for several values of α.
Results for a single-object square-lattice version of our model for the renewal probability p = 10^-3. Simulations were made for Nw = 2 and N = 200 ⋅ 200 = 4 ⋅ 10^4, and the results are averages of 20 independent runs.
