Proc Natl Acad Sci U S A. 2021 May 25;118(21):e2017015118. doi: 10.1073/pnas.2017015118

Continuous learning of emergent behavior in robotic matter

Giorgio Oliveri et al.

Abstract

One of the main challenges in robotics is the development of systems that can adapt to their environment and achieve autonomous behavior. Current approaches typically aim to achieve this by increasing the complexity of a centralized controller, e.g., by directly modeling the system's behavior or by implementing machine learning. In contrast, we simplify the controller using a decentralized and modular approach, with the aim of identifying the specific requirements for a robust and scalable learning strategy in robots. To achieve this, we conducted experiments and simulations on a robotic platform assembled from identical autonomous units that continuously sense their environment and react to it. By letting each unit adapt its behavior independently using a basic Monte Carlo scheme, the assembled system is able to learn and maintain optimal behavior in a dynamic environment, as long as its memory is representative of the current environment, even when incurring damage. We show that the physical connection between the units is sufficient to achieve learning; no additional communication or centralized information is required. As a result, such a distributed learning approach can easily be scaled to larger assemblies, blurring the boundary between materials and robots. This paves the way for a new class of modular "robotic matter" that can autonomously learn to thrive in dynamic or unfamiliar situations, such as those encountered by soft robots or self-assembled (micro)robots in environments ranging from the medical realm to space exploration.
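To make the scheme concrete, a minimal sketch of the per-unit learning loop is given below. This is an illustration, not the authors' implementation: the step size, the cycle period, and the measure_velocity callback are assumed names, and the acceptance rule shown is the simplest greedy variant consistent with the abstract (each unit keeps a trial phase only if the locally sensed robot velocity improves on the value in its memory).

    import random

    def learning_step(phase, stored_velocity, measure_velocity,
                      step=0.1, t_cycle=1.0):
        # One Monte Carlo learning step for a single unit (illustrative sketch).
        # phase            -- the unit's current actuation phase in [0, t_cycle)
        # stored_velocity  -- velocity remembered from previously accepted phases
        # measure_velocity -- callback returning the robot velocity the unit senses
        trial = (phase + random.uniform(-step, step) * t_cycle) % t_cycle
        velocity = measure_velocity(trial)  # run a cycle and sense the outcome
        if velocity > stored_velocity:      # greedy acceptance: keep improvements
            return trial, velocity
        return phase, stored_velocity       # otherwise revert to the old phase

Because every unit runs this loop using nothing but its own phase and its own velocity measurement, no communication or centralized information is needed; coordination emerges through the physical coupling between the units.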

Keywords: dynamic environment; emergent behavior; modular robot; reinforcement learning.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Robotic unit design and initial learning experiments for an assembled robot consisting of two active units. (A) Design of a single robotic unit in its relaxed and extended states. (B) Robot assembled from two active units and one dummy unit. (C) Each active unit can vary its phase ϕi within the actuation cycle of period tcycle, which affects the actuation timing, as shown by the cyclic pressure in the soft actuators. (D and E) Evolution of the phases ϕi and measured velocities Ui of the active units as a function of time for a single learning experiment. (F) Distribution of the average robot velocity Ū as a function of the learning step for 112 learning simulations and 56 learning experiments. (G) Simulated and experimental velocity as a function of the phase difference Δϕ = ϕ2 − ϕ1, exemplified by three experiments (H).
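The role of the phase in C can be made explicit with a small sketch: each unit drives its soft actuator with the same cyclic pressure signal, shifted by its own phase ϕi. The square-wave profile and duty cycle below are illustrative assumptions; the caption states only that the pressure is cyclic.

    def actuator_pressure(t, phase, t_cycle=1.0, p_max=1.0, duty=0.5):
        # Pressure seen by a unit's actuator at time t, shifted by its phase
        # (illustrative square wave; the actual waveform is not given here).
        local_t = (t - phase) % t_cycle
        return p_max if local_t < duty * t_cycle else 0.0

Shifting ϕi thus changes only when a unit extends within the shared cycle, which is the single quantity each unit tunes during learning.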
Fig. 2.
Adaptability of an assembled robot consisting of three active units to variations in the environment. (A and B) Distribution of the average velocities Ū and average acceptance rate p̄ for the Thermal algorithm, as a function of the learning step, for 112 simulations and 56 experiments. (C and D) Average velocity Ū obtained in simulations and experiments as a function of all possible combinations of the phases ϕ3 − ϕ1 and ϕ2 − ϕ1 (with ϕ1 = 0). (E) Standard deviation of the average velocity Ū observed over 20 experimental runs. (F) Two experiments with fixed phases, highlighting the effect of the track on the robot's behavior. (G and H) Distribution of the average velocities Ū and average acceptance rate p̄ for the Flaky algorithm as a function of the learning step, for 112 simulations and 56 experiments.
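The Thermal and Flaky algorithms are compared here through their acceptance rates p̄, but the caption does not spell out the acceptance rules themselves. As one plausible reading, a "thermal" rule can be sketched as a Metropolis-style criterion, in which slower trial phases are occasionally accepted with a temperature-controlled probability; the temperature value and the exact form below are assumptions for illustration, not the paper's stated rule.

    import math
    import random

    def thermal_accept(new_velocity, old_velocity, temperature=0.01):
        # Metropolis-style acceptance (illustrative assumption, not the
        # paper's exact rule). Faster trials are always kept; slower ones
        # are kept with a probability that decays exponentially with the
        # velocity that would be lost.
        if new_velocity >= old_velocity:
            return True
        return random.random() < math.exp((new_velocity - old_velocity) / temperature)

Occasionally accepting slower phases keeps the units exploring, which is what lets the assembly escape a behavior that has become suboptimal after the environment changes.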
Fig. 3.
Adaptability of the assembled robot to damage using the Thermal and Flaky algorithms. (A and B) An assembled robot consisting of three active units is damaged by removing one of the needles. (C and D) Average velocity Ū obtained in experiments for an intact and a damaged robot, as a function of the possible combinations of the phases ϕ3 − ϕ1 and ϕ2 − ϕ1 (with ϕ1 = 0). The white and black dots represent the phases tried in two learning experiments in which the robot is damaged after 300 learning steps, using the (E) Thermal and (F) Flaky algorithms, respectively.
Fig. 4.
Scalability of the learning behavior for assembled robots ranging from 2 to 20 active units. (A) Initial and final positions, after 300 learning steps (1,200 s), of an assembled robot consisting of seven active units. (B) Distribution of the average velocities Ū as a function of the learning step, for 112 simulations and 56 experiments. (C) Relation between the number of active units and the equilibrium velocity Ūeq, obtained using simulations, for Δs = 0.1 and Δs = 0.1/4. (D) Average number of learning steps, and standard deviation, needed to reach equilibrium (i.e., 0.8 Ūeq) as a function of the number of active units. Here we consider only the simulations that reach the 0.8 Ūeq threshold within the assigned number of learning steps. While for Δs = 0.1 all simulations reach the threshold, for Δs = 0.1/4 a fraction of the simulations does not reach 0.8 Ūeq (SI Appendix, Fig. S8B).

