Learning few-shot imitation as cultural transmission

Avishkar Bhoopchand¹, Bethanie Brownfield¹, Adrian Collister¹, Agustin Dal Lago¹, Ashley Edwards¹, Richard Everett¹, Alexandre Fréchette¹, Yanko Gitahy Oliveira¹, Edward Hughes², Kory W Mathewson¹, Piermaria Mendolicchio¹, Julia Pawar¹, Miruna Pȋslar¹, Alex Platonov¹, Evan Senter¹, Sukhdeep Singh¹, Alexander Zacherl¹, Lei M Zhang¹

Affiliations

¹ Google DeepMind, 6-8 Handyside Street, London, N1C 4UZ, UK.
² Google DeepMind, 6-8 Handyside Street, London, N1C 4UZ, UK. edwardhughes@google.com.

PMID: 38016945
PMCID: PMC10684502
DOI: 10.1038/s41467-023-42875-2

Avishkar Bhoopchand et al. Nat Commun. 2023 Nov 28;14(1):7536. doi: 10.1038/s41467-023-42875-2.

Abstract

Cultural transmission is the domain-general social skill that allows agents to acquire and use information from each other in real time with high fidelity and recall. It can be thought of as the process that perpetuates fit variants in cultural evolution. In humans, cultural evolution has led to the accumulation and refinement of skills, tools and knowledge across generations. We provide a method for generating cultural transmission in artificially intelligent agents, in the form of few-shot imitation. Our agents succeed at real-time imitation of a human in novel contexts without using any pre-collected human data. We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission and develop an evaluation methodology for rigorously assessing it. This paves the way for cultural evolution to play an algorithmic role in the development of artificial general intelligence.

Conflict of interest statement

A patent has been registered covering aspects of this work. Details of the patent are as follows. Patent applicant: DeepMind Technologies Limited. Name of inventors: Bhoopchand, Avishkar Ajay; Collister, Adrian Ashley; Edwards, Ashley Deloris; Everett, Richard; Hughes, Edward Fauchon; Mathewson, Kory Wallace; Pȋslar, Miruna; Zacherl, Alexander; Zhang, Lei. Application number: PCT/EP2023/055474. Status of application: pending. The specific aspect of manuscript covered in patent application: the specific method for generating cultural transmission represented in Fig. 9, instantiated as described in the Methods section and the Supplementary Information. The authors declare no other competing interests.

Figures

**Fig. 1. GoalCycle3D.**
A 3D physical simulated task space. Each task contains procedurally generated terrain, obstacles, and goal spheres, with parameters randomly sampled on task creation. Each agent is independently rewarded for visiting goals in a particular cyclic order, also randomly sampled on task creation. The correct order is not provided to the agent, so an agent must deduce the rewarding order either by experimentation or via cultural transmission from an expert. Our task space presents navigational challenges of open-ended complexity, parameterised by world size, obstacle density, terrain bumpiness and number of goals. Our agent observes the world using LIDAR (see Supplementary Movie 30).
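
The cyclic-order reward described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cyclic_rewards`, the reward values `hit` and `miss`, and the rule that the first visit is always rewarded are assumptions made purely for illustration, and goal names are assumed to be unique.

```python
from typing import List, Optional, Sequence


def cyclic_rewards(
    order: Sequence[str],
    visits: Sequence[str],
    hit: float = 1.0,    # assumed reward for a correct visit (hypothetical value)
    miss: float = -1.0,  # assumed penalty for an incorrect visit (hypothetical value)
) -> List[float]:
    """Score each goal visit against a hidden cyclic order.

    Assumption (not from the source): the first visit is always rewarded
    and fixes the agent's position in the cycle; every later visit earns
    `hit` if it is the successor of the previously visited goal, else `miss`.
    """
    # Map each goal to the goal that follows it, wrapping around at the end.
    successor = {g: order[(i + 1) % len(order)] for i, g in enumerate(order)}
    rewards: List[float] = []
    previous: Optional[str] = None
    for goal in visits:
        if previous is None or successor[previous] == goal:
            rewards.append(hit)
        else:
            rewards.append(miss)
        previous = goal
    return rewards


order = ["red", "green", "blue", "yellow"]
print(cyclic_rewards(order, ["green", "blue", "yellow", "red", "green"]))
print(cyclic_rewards(order, ["red", "blue", "yellow"]))
```

Because the order is cyclic, starting at any goal is equally valid in this sketch; only the successor relation matters. An agent that cannot see `order` has to infer it from the reward signal or from watching an expert.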

**Fig. 2. Training without ADR.**
Training cultural transmission (left) and agent score (right) for training without ADR on 4-goal games in a small empty world. Colours indicate four distinct phases of agent behaviour from left to right: (1) (red) startup and exploration, (2) (blue) learning to follow, (3) (yellow) learning to remember, (4) (purple) becoming independent from expert.

**Fig. 3. Ablations of MEDAL ingredients.**
Score (left), training cultural transmission (CT, centre), and evaluation CT on empty world 5-goal probe tasks (right) over the course of training. a Comparing MEDAL with three ablated agents, each trained without one crucial ingredient: without an expert (M―), memory (–EDAL), or attention loss (MED—). b Ablating the effect of expert dropout, comparing no dropout (ME-AL) with expert dropout (MEDAL). We report the mean performance for each across 10 initialisation seeds for agent parameters and task procedural generation. We also include the expert’s score and MEDAL’s best seed for scale and upper-bound comparisons. The shaded area on the graphs is one standard deviation.

**Fig. 4. Analysis of ADR parameter expansion and ablation of ADR ingredients.**
a The expansion of parameter ranges over training for one representative seed in MEDAL-ADR training. b Score (left), training cultural transmission (CT, centre), and evaluation CT on complex world probe tasks (right) over the course of training for the automatic (A) and domain randomisation (DR) ablations of MEDAL-ADR. We report the mean performance for each across 10 initialisation seeds for agent parameters and task procedural generation. We also include the expert’s score and the best MEDAL-ADR seed for scale and upper-bound comparisons. The shaded area on the graphs is one standard deviation.

**Fig. 5. Agent recall.**
Score of MEDAL-ADR and ME-AL agents across trials since the expert dropped out. a Experts are scripted bots. b Experts are human trajectories. Supplementary Movie 10 shows MEDAL-ADR’s recall from a bot demonstration in a 3600-step (4 trial) episode. Supplementary Movie 31 shows MEDAL-ADR’s recall from a human demonstration in an 1800-step (2 trial) episode.

**Fig. 6. Evidence of causality.**
Trajectory plots for the MEDAL-ADR agent for a single episode. a The bot is absent for the whole episode. b The bot shows a correct trajectory in the first half of the episode and then drops out. c The bot shows an incorrect trajectory in the first half of the episode and then drops out. The coloured parts of the lines correspond to the colour of the goal sphere the agent and expert have entered and the ×s correspond to when the agent entered the incorrect goal. Here, position refers to the agent’s position along the z-axis. Supplementary Movies 11–13 correspond to each plot respectively.

**Fig. 7. Task space generalisation.**
a A slice through the world space allows us to disentangle MEDAL-ADR’s generalisation capability across different world space parameters. b MEDAL-ADR generalises across the game space, demonstrating remembering capability both inside and outside the training distribution. We report the mean performance across 50 initialisation seeds for a and 20 initialisation seeds for b. The error bars on the graphs represent 95% confidence intervals. Supplementary Movies 14–20 demonstrate generalisation over the world space and game space.

**Fig. 8. Introspection of Agent’s Brain.**
a Activations for MEDAL-ADR’s social neuron. b We report the accuracy of three linear probing models trained to predict the expert’s presence based on the belief states of three agents (MED—, MEDAL, and MEDAL-ADR). We make two causal interventions (in green and purple) and a control check (in red) on the original test set (yellow). We report the mean performance across 10 different initialisation seeds. The small standard deviation error bars suggest a broad consensus across the 10 runs on which neurons encode social information. c Spikes in the goal neuron’s activations correlate with the time the agent remains inside a goal (illustrated by coloured shading). The goal neuron was identified using a variance analysis, rather than the linear probing method in b.

**Fig. 9. Ingredients of MEDAL-ADR.**
The minimal sufficient ingredients that comprise our methods, grouped by the timescale on which they operate.

**Fig. 10. Worlds and games used as probe tasks.**
a Empty world, 4-goal games. b Empty world, 5-goal games. c Complex world, 4/5-goal games. These cover a representative range of crossings and colour combinations. The empty world probe tasks have terrain of size 20 × 20 m², while the complex world probe tasks have terrain of size 32 × 32 m². The complex world probes require clear examples of jumping behaviours and navigation around vertical obstacles. The human movement pattern in all probes is always goal-directed and near-optimal, but clearly different from a scripted bot, taking some time to get situated in the first few seconds and not taking an identical path on repeated cycles, for instance. See Supplementary Movies 21–29.
