Proc Natl Acad Sci U S A. 2021 May 25;118(21):e2017015118. doi: 10.1073/pnas.2017015118

Continuous learning of emergent behavior in robotic matter

Giorgio Oliveri et al.

Abstract

One of the main challenges in robotics is the development of systems that can adapt to their environment and achieve autonomous behavior. Current approaches typically aim to achieve this by increasing the complexity of a centralized controller, e.g., by directly modeling the system's behavior or by implementing machine learning. In contrast, we simplify the controller using a decentralized and modular approach, with the aim of identifying the specific requirements for a robust and scalable learning strategy in robots. To achieve this, we conducted experiments and simulations on a robotic platform assembled from identical autonomous units that continuously sense their environment and react to it. By letting each unit adapt its behavior independently using a basic Monte Carlo scheme, the assembled system is able to learn and maintain optimal behavior in a dynamic environment, as long as its memory is representative of the current environment, even when incurring damage. We show that the physical connection between the units is sufficient to achieve learning; no additional communication or centralized information is required. As a result, such a distributed learning approach can easily be scaled to larger assemblies, blurring the boundary between materials and robots. This paves the way for a new class of modular "robotic matter" that can autonomously learn to thrive in dynamic or unfamiliar situations, such as those encountered by soft robots or self-assembled (micro)robots in environments ranging from the medical realm to space exploration.
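To make the scheme concrete, a minimal sketch of the per-unit learning loop is given below. This is an illustration, not the authors' implementation: the step size, the cycle period, and the measure_velocity callback are assumed names, and the acceptance rule shown is the simplest greedy variant consistent with the abstract (each unit keeps a trial phase only if the locally sensed robot velocity improves on the value in its memory).

    import random

    def learning_step(phase, stored_velocity, measure_velocity,
                      step=0.1, t_cycle=1.0):
        # One Monte Carlo learning step for a single unit (illustrative sketch).
        # phase            -- the unit's current actuation phase in [0, t_cycle)
        # stored_velocity  -- velocity remembered from previously accepted phases
        # measure_velocity -- callback returning the robot velocity the unit senses
        trial = (phase + random.uniform(-step, step) * t_cycle) % t_cycle
        velocity = measure_velocity(trial)  # run a cycle and sense the outcome
        if velocity > stored_velocity:      # greedy acceptance: keep improvements
            return trial, velocity
        return phase, stored_velocity       # otherwise revert to the old phase

Because every unit runs this loop using nothing but its own phase and its own velocity measurement, no communication or centralized information is needed; coordination emerges through the physical coupling between the units.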

Keywords: dynamic environment; emergent behavior; modular robot; reinforcement learning.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Robotic unit design and initial learning experiments for an assembled robot consisting of two active units. (A) Design of a single robotic unit in its relaxed and extended states. (B) Robot assembled from two active units and one dummy unit. (C) Each active unit can vary its phase ϕi within the actuation cycle of period tcycle, which affects the actuation timing, as shown by the cyclic pressure in the soft actuators. (D and E) Evolution of the phases ϕi and measured velocities Ui of the active units as a function of time for a single learning experiment. (F) Distribution of the average robot velocity Ū as a function of the learning step for 112 learning simulations and 56 learning experiments. (G) Simulated and experimental velocity as a function of the phase difference Δϕ = ϕ2 − ϕ1, exemplified by three experiments (H).
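The role of the phase in C can be made explicit with a small sketch: each unit drives its soft actuator with the same cyclic pressure signal, shifted by its own phase ϕi. The square-wave profile and duty cycle below are illustrative assumptions; the caption states only that the pressure is cyclic.

    def actuator_pressure(t, phase, t_cycle=1.0, p_max=1.0, duty=0.5):
        # Pressure seen by a unit's actuator at time t, shifted by its phase
        # (illustrative square wave; the actual waveform is not given here).
        local_t = (t - phase) % t_cycle
        return p_max if local_t < duty * t_cycle else 0.0

Shifting ϕi thus changes only when a unit extends within the shared cycle, which is the single quantity each unit tunes during learning.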
Fig. 2.
Adaptability of an assembled robot consisting of three active units to variations in the environment. (A and B) Distribution of the average velocities Ū and average acceptance rate p̄ for the Thermal algorithm, as a function of the learning step, for 112 simulations and 56 experiments. (C and D) Average velocity Ū obtained in simulations and experiments as a function of all possible combinations of the phases ϕ3 − ϕ1 and ϕ2 − ϕ1 (with ϕ1 = 0). (E) Standard deviation of the average velocity Ū observed over 20 experimental runs. (F) Two experiments with fixed phases, highlighting the effect of the track on the robot's behavior. (G and H) Distribution of the average velocities Ū and average acceptance rate p̄ for the Flaky algorithm as a function of the learning step, for 112 simulations and 56 experiments.
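The Thermal and Flaky algorithms are compared here through their acceptance rates p̄, but the caption does not spell out the acceptance rules themselves. As one plausible reading, a "thermal" rule can be sketched as a Metropolis-style criterion, in which slower trial phases are occasionally accepted with a temperature-controlled probability; the temperature value and the exact form below are assumptions for illustration, not the paper's stated rule.

    import math
    import random

    def thermal_accept(new_velocity, old_velocity, temperature=0.01):
        # Metropolis-style acceptance (illustrative assumption, not the
        # paper's exact rule). Faster trials are always kept; slower ones
        # are kept with a probability that decays exponentially with the
        # velocity that would be lost.
        if new_velocity >= old_velocity:
            return True
        return random.random() < math.exp((new_velocity - old_velocity) / temperature)

Occasionally accepting slower phases keeps the units exploring, which is what lets the assembly escape a behavior that has become suboptimal after the environment changes.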
Fig. 3.
Adaptability of the assembled robot to damage using the Thermal and Flaky algorithms. (A and B) An assembled robot consisting of three active units is damaged by removing one of the needles. (C and D) Average velocity Ū obtained in experiments for an intact and a damaged robot, as a function of the possible combinations of the phases ϕ3 − ϕ1 and ϕ2 − ϕ1 (with ϕ1 = 0). The white and black dots represent the phases tried in two learning experiments in which the robot is damaged after 300 learning steps, using the (E) Thermal and (F) Flaky algorithms, respectively.
Fig. 4.
Scalability of the learning behavior for assembled robots ranging from 2 to 20 active units. (A) Initial and final positions, after 300 learning steps (1,200 s), of an assembled robot consisting of seven active units. (B) Distribution of the average velocities Ū as a function of the learning step, for 112 simulations and 56 experiments. (C) Relation between the number of active units and the equilibrium velocity Ūeq, obtained using simulations, for Δs = 0.1 and Δs = 0.1/4. (D) Average number of learning steps, and standard deviation, needed to reach equilibrium (i.e., 0.8 Ūeq) as a function of the number of active units. Here we consider only the simulations that reach the 0.8 Ūeq threshold within the assigned number of learning steps. While for Δs = 0.1 all simulations reach the threshold, for Δs = 0.1/4 a fraction of the simulations does not reach 0.8 Ūeq (SI Appendix, Fig. S8B).

