Chemotactic navigation in robotic swimmers via reset-free hierarchical reinforcement learning

doi:10.1038/s41467-025-60646-z

Nat Commun. 2025 Jul 1;16(1):5441.

Tongzhao Xiong¹, Zhaorong Liu¹,², Yufei Wang¹, Chong Jin Ong¹, Lailai Zhu³

Affiliations

¹ Department of Mechanical Engineering, National University of Singapore, Singapore, 117575, Singapore.
² Department of Physics, University of Science and Technology of China, Hefei, Anhui, 230026, People's Republic of China.
³ Department of Mechanical Engineering, National University of Singapore, Singapore, 117575, Singapore. lailai_zhu@nus.edu.sg.

PMID: 40593554
PMCID: PMC12215520
DOI: 10.1038/s41467-025-60646-z

Tongzhao Xiong et al. Nat Commun. 2025.

Abstract

Microorganisms have evolved diverse strategies to propel themselves in viscous fluids, navigate complex environments, and exhibit taxis in response to stimuli. This has inspired the development of miniature robots, where artificial intelligence (AI) is playing an increasingly important role. Can AI endow these synthetic systems with intelligence akin to that honed through natural evolution? Here, we demonstrate, in silico, chemotactic navigation in a multi-link robotic model using two-level hierarchical reinforcement learning (RL). The lower-level RL allows the model, configured as a chain or ring topology, to acquire topology-adapted swimming gaits: wave propagation characteristic of flagella or body oscillation akin to an amoeba. Such chain and ring swimmers, further enabled by the higher-level RL, accomplish chemotactic navigation in prototypical biologically relevant scenarios that feature conflicting chemoattractants, pursuing a swimming bacterial mimic, steering in vortical flows, and squeezing through tight constrictions. Additionally, we achieve reset-free RL under partial observability, where simulated robots rely solely on local scalar observations rather than global or vectorial data. This advancement illuminates potential solutions for overcoming persistent challenges of manual resets and partial observability in real-world microrobotic RL.

Conflict of interest statement

Inclusion and ethics statement: All collaborators of this work have fulfilled the criteria for authorship required by Nature Portfolio journals and have been included as authors, as their participation was important for the design and implementation of the study. Roles and responsibilities were agreed upon among collaborators ahead of the research. This work includes findings that are locally and globally relevant. The research was not restricted or prohibited in the setting of the researchers and does not result in stigmatization, incrimination, discrimination or personal risk to participants. Local and regional research relevant to our study was taken into account in citations. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. Two-level hierarchical RL framework for in silico chemotactic navigation of multi-link robotic models in viscous fluids.**
The model adopts either a chain (a) or ring (b) topology. c Workflow of the lower-level RL: training simulated robots to acquire primitive swimming skills—propulsion and reorientation. d Workflow of the higher-level RL: training robots to steer up chemoattractant gradients.
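
As a rough illustration of the two-level structure in Fig. 1, the sketch below shows a higher-level controller that sees only a local scalar concentration and chooses between two lower-level primitives, propulsion and reorientation. No learning takes place here: the hand-written rule (keep going if the reading did not drop, otherwise turn and go) is a stand-in for a trained policy. The point-like `Swimmer`, the `1/r` concentration field, the source position, and the step and turn sizes are all assumptions made for illustration, not the authors' implementation.

```python
import math

SOURCE = (4.0, 0.0)  # assumed source position, chosen for illustration


def concentration(x, y, strength=1.0):
    """Assumed field: decays as strength / distance from a point source."""
    r = math.hypot(x - SOURCE[0], y - SOURCE[1])
    return strength / max(r, 1e-6)


class Swimmer:
    """Point-like stand-in for a multi-link swimmer (hypothetical)."""

    def __init__(self, x=0.0, y=0.0, heading=0.0):
        self.x, self.y, self.heading = x, y, heading

    def propel(self, step=0.1):
        # Lower-level primitive 1: advance along the current heading.
        self.x += step * math.cos(self.heading)
        self.y += step * math.sin(self.heading)

    def reorient(self, angle=math.pi / 6):
        # Lower-level primitive 2: rotate counterclockwise in place.
        self.heading += angle


def higher_level_step(swimmer, prev_c):
    """Choose a primitive from the change in the local scalar reading."""
    c = concentration(swimmer.x, swimmer.y)
    if c < prev_c:
        # The last move lowered the reading, so turn before advancing.
        swimmer.reorient()
    swimmer.propel()
    return c


def run(steps=400):
    """Return the final distance to the source after a fixed number of steps."""
    s = Swimmer(heading=math.pi / 2)  # start pointing sideways to the source
    prev = concentration(s.x, s.y)
    for _ in range(steps):
        prev = higher_level_step(s, prev)
    return math.hypot(s.x - SOURCE[0], s.y - SOURCE[1])
```

The controller never reads a position or a gradient vector; it only compares successive scalar readings, which is the kind of partial observability the abstract describes.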

**Fig. 2. Learning primitive swimming skills: propulsion and reorientation.**
a A chain swimmer achieves propulsion through symmetric beating, with vectors indicating the flow field. The inset shows the horizontal (x) and vertical (y) displacements of the swimmer’s tip, resembling a traveling transverse wave. b The chain swimmer reorients through asymmetric actuation. c A ring swimmer propels by periodically contracting and expanding its body. The inset displays the x and y displacements of the swimmer’s front, typical of a longitudinal wave propagating horizontally. d Evolution of front-rear pressure differences during the policy iteration process: training the chain swimmer for propulsion (purple solid line) and reorientation (purple dashed), and the ring model for propulsion (green solid in the inset). This quantity is time-averaged over one policy. e, f Similar to d, but for the rotational rate and centroid translational velocity of the swimmers, respectively. The latter reaches a plateau value (dashed line), $U / (\hat{\Omega} L)$, representing the swimming speed of simulated agents once training has converged. Source data are provided as a Source Data file.

**Fig. 3. Chemotaxis towards a stationary chemical source with and without chemical disturbances.**
a A chain swimmer initially at (0, 0) swims toward the target source at (4, 0). b Similar to a, but with a disturbing chemical source (star symbol) at (0, 4) featuring strengths of Q_d = 0.2Q (left panel) and 0.6Q (right panel), respectively. Here, ΔC signifies the difference in the chemoattractant concentration produced by the target and disturbance sources. c Akin to a, but for a ring model. d Analogous to b, but for a ring swimmer, with the disturbing source strengths of Q_d = 0.6Q (left panel) and 0.8Q (right panel), respectively. Source data are provided as a Source Data file.
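
To make the ΔC quantity in Fig. 3 concrete, the sketch below evaluates the difference between the concentrations of a target source at (4, 0) and a disturbing source at (0, 4). The `1/r` decay is an assumption (it corresponds to steady diffusion from a point source up to a constant factor); the field actually used in the paper is not stated in this record. The function names and default strengths are likewise illustrative.

```python
import math


def point_source(x, y, src, strength):
    """Assumed concentration of one source: strength / distance."""
    r = math.hypot(x - src[0], y - src[1])
    return strength / max(r, 1e-9)


def delta_c(x, y, q=1.0, q_d=0.6, target=(4.0, 0.0), disturbance=(0.0, 4.0)):
    """Target concentration minus disturbance concentration at (x, y)."""
    return point_source(x, y, target, q) - point_source(x, y, disturbance, q_d)
```

Under this assumption the starting point (0, 0) is equidistant from both sources, so the sign of ΔC there depends only on the strengths: it is positive whenever Q_d < Q, and it turns negative once the swimmer strays close to the disturbing source.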

**Fig. 4. Simulated robotic swimmers pursue a dynamic chemical source of strength Q.**
a–c A chain swimmer chasing (a), encountering (b), and following (c) a chemical source (symbolized by a star) moving along a circular orbit. d–f Similar to a–c, but for a ring swimmer and a source executing a figure-eight-shaped path. g Trajectories of chain (left column) and ring (right column) swimmers pursuing a source at varying speeds. Source data are provided as a Source Data file.

**Fig. 5. Chemotactic navigation in periodic cellular vortices with a vortical velocity U_v.**
Swimmers start from (0, 0) with the goal of reaching a chemical source positioned at (4, 4). a, b A chain model swims in vortices with $U_{v} = 0.005 \hat{\Omega} L$ and $U_{v} = 0.03 \hat{\Omega} L$, respectively. The background color denotes the dimensionless flow vorticity. The inset of a depicts the typical streamlines. c Statistics of successful chemotaxis versus the vortical strength U_v, where crosses and circles signify failures and successes, respectively. d, e Similar to a and b, but for a ring swimmer, with $U_{v} = 0.0005 \hat{\Omega} L$ and $U_{v} = 0.005 \hat{\Omega} L$, respectively. f Similar to c, but for a ring swimmer. Source data are provided as a Source Data file.
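
For readers who want a concrete picture of a periodic cellular vortex flow, the sketch below uses a standard Taylor-Green-type field with peak speed `u_v`. This particular form, the wavenumber `k`, and the function names are assumptions; the record above does not give the flow used in the paper. Speeds are taken to be in units of $\hat{\Omega} L$.

```python
import math


def vortex_velocity(x, y, u_v=0.03, k=math.pi):
    """Assumed cellular flow: incompressible, periodic, peak speed u_v."""
    u = u_v * math.sin(k * x) * math.cos(k * y)
    v = -u_v * math.cos(k * x) * math.sin(k * y)
    return u, v


def vortex_vorticity(x, y, u_v=0.03, k=math.pi):
    """Vorticity dv/dx - du/dy of the field above."""
    return 2.0 * u_v * k * math.sin(k * x) * math.sin(k * y)
```

In a field of this kind the vorticity alternates in sign from cell to cell, so a swimmer crossing several cells is rotated first one way and then the other.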

**Fig. 6. Chemotactic navigation through a narrow constriction.**
a The swimmer navigates a constriction of height d to reach the chemical source located on the opposite side. The source and locomotion are within the same xy-plane. A chain swimmer successfully traverses a constriction of d = 0.3L (b), but fails to pass through a narrower one of d = 0.2L (c). d Similar to b, but for a ring swimmer, which squeezes through the narrow gap of d = 0.5L/π, where L/π denotes the effective diameter of ring models. e Analogous to d, but with a tighter constriction of d = 0.4L/π, which obstructs the swimmer. Source data are provided as a Source Data file.
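
The ring-swimmer gap sizes in Fig. 6 can be put on the same scale as the chain-swimmer ones with a short calculation. It assumes that L is the total contour length of the swimmer in both topologies and that the relaxed ring is circular, so its circumference equals L:

```latex
\[
  \pi D = L \;\Rightarrow\; D = \frac{L}{\pi} \approx 0.318\,L,
\]
\[
  d = 0.5\,\frac{L}{\pi} \approx 0.159\,L \quad (\text{ring passes}), \qquad
  d = 0.4\,\frac{L}{\pi} \approx 0.127\,L \quad (\text{ring blocked}).
\]
```

Under these assumptions the ring squeezes through a gap half its own effective diameter, which is also narrower than the d = 0.2L constriction that stops the chain swimmer.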
