Nat Commun. 2025 Jul 1;16(1):5441.
doi: 10.1038/s41467-025-60646-z.

Chemotactic navigation in robotic swimmers via reset-free hierarchical reinforcement learning

Tongzhao Xiong et al. Nat Commun. 2025.

Abstract

Microorganisms have evolved diverse strategies to propel themselves in viscous fluids, navigate complex environments, and exhibit taxis in response to stimuli. This has inspired the development of miniature robots, where artificial intelligence (AI) is playing an increasingly important role. Can AI endow these synthetic systems with intelligence akin to that honed through natural evolution? Here, we demonstrate, in silico, chemotactic navigation in a multi-link robotic model using two-level hierarchical reinforcement learning (RL). The lower-level RL allows the model, configured as a chain or ring topology, to acquire topology-adapted swimming gaits: wave propagation characteristic of flagella or body oscillation akin to an amoeba. Such chain and ring swimmers, further enabled by the higher-level RL, accomplish chemotactic navigation in prototypical biologically relevant scenarios that involve conflicting chemoattractants, pursuit of a swimming bacterial mimic, steering in vortical flows, and squeezing through tight constrictions. Additionally, we achieve reset-free RL under partial observability, where simulated robots rely solely on local scalar observations rather than global or vectorial data. This advancement illuminates potential solutions for overcoming persistent challenges of manual resets and partial observability in real-world microrobotic RL.
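
The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the lower-level gaits are replaced by a hypothetical kinematic stand-in (`apply_primitive`), the concentration field `C = q / (1 + r)` is an assumed form, and the higher level uses plain tabular Q-learning with made-up hyperparameters. What the sketch does keep from the abstract is the control structure: the higher level chooses between two primitives (propulsion and reorientation), it sees only a local scalar signal (whether the sensed concentration rose), and training runs as one continuous trajectory without resets. The start and source positions follow the Fig. 3a setup.

```python
import math
import random

SOURCE = (4.0, 0.0)                   # target source position, as in the Fig. 3a setup
PRIMITIVES = ("propel", "reorient")   # the two lower-level skills named in the abstract


def concentration(x, y, q=1.0):
    """Assumed point-source field C = q / (1 + r); the functional form is hypothetical."""
    r = math.hypot(x - SOURCE[0], y - SOURCE[1])
    return q / (1.0 + r)


def apply_primitive(state, primitive, speed=0.05, turn=math.pi / 6):
    """Hypothetical stand-in for a learned gait: kinematic update of (x, y, heading)."""
    x, y, heading = state
    if primitive == "propel":
        return (x + speed * math.cos(heading), y + speed * math.sin(heading), heading)
    return (x, y, heading + turn)


def train_reset_free(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning for the higher level on a single trajectory (no resets)."""
    rng = random.Random(seed)
    q_table = {(s, a): 0.0 for s in (0, 1) for a in PRIMITIVES}
    state = (0.0, 0.0, math.pi / 2)   # the swimmer starts at the origin
    c_prev = concentration(state[0], state[1])
    obs = 0   # local scalar observation: 1 if the sensed concentration just rose, else 0
    for _ in range(steps):
        # Epsilon-greedy choice between the two primitives.
        if rng.random() < epsilon:
            action = rng.choice(PRIMITIVES)
        else:
            action = max(PRIMITIVES, key=lambda a: q_table[(obs, a)])
        state = apply_primitive(state, action)
        # The agent only senses the concentration at its own location.
        c_now = concentration(state[0], state[1])
        reward = c_now - c_prev
        next_obs = 1 if reward > 0 else 0
        best_next = max(q_table[(next_obs, a)] for a in PRIMITIVES)
        q_table[(obs, action)] += alpha * (reward + gamma * best_next - q_table[(obs, action)])
        obs, c_prev = next_obs, c_now   # carry on from the current state; never reset
    return q_table, state


if __name__ == "__main__":
    q_table, final_state = train_reset_free()
    print(q_table)
    print(final_state)
```

The environment uses the swimmer's position to evaluate the field, but the agent itself never receives coordinates or a gradient vector, only the one-bit change in the scalar it senses. Replacing `apply_primitive` with a hydrodynamic simulation and the Q-table with a learned policy would be needed to approach the setting the paper describes.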

Conflict of interest statement

Inclusion and ethics statement: All collaborators of this work have fulfilled the criteria for authorship required by Nature Portfolio journals and have been included as authors, as their participation was important for the design and implementation of the study. Roles and responsibilities were agreed upon among collaborators ahead of the research. This work includes findings that are locally and globally relevant. The research was not restricted or prohibited in the setting of the researchers and does not result in stigmatization, incrimination, discrimination or personal risk to participants. Local and regional research relevant to our study was taken into account in citations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Two-level hierarchical RL framework for in silico chemotactic navigation of multi-link robotic models in viscous fluids.
The model adopts either a chain (a) or ring (b) topology. c Workflow of the lower-level RL: training simulated robots to acquire primitive swimming skills—propulsion and reorientation. d Workflow of the higher-level RL: training robots to steer up chemoattractant gradients.
Fig. 2. Learning primitive swimming skills: propulsion and reorientation.
a A chain swimmer achieves propulsion through symmetric beating, with vectors indicating the flow field. The inset shows the horizontal (x) and vertical (y) displacements of the swimmer’s tip, resembling a traveling transverse wave. b The chain swimmer reorients through asymmetric actuation. c A ring swimmer propels by periodically contracting and expanding its body. The inset displays the x and y displacements of the swimmer’s front, typical of a longitudinal wave propagating horizontally. d Evolution of front-rear pressure differences during the policy iteration process: training the chain swimmer for propulsion (purple solid line) and reorientation (purple dashed), and the ring model for propulsion (green solid in the inset). This quantity is time-averaged over one policy. e, f Similar to d, but for the rotational rate and centroid translational velocity of the swimmers, respectively. The latter reaches a plateau value (dashed line), U/(Ω^L), representing the swimming speed of simulated agents once training has converged. Source data are provided as a Source Data file.
Fig. 3. Chemotaxis towards a stationary chemical source with and without chemical disturbances.
a A chain swimmer initially at (0, 0) swims toward the target source at (4, 0). b Similar to a, but with a disturbing chemical source (star symbol) at (0, 4) with strengths of Qd = 0.2Q (left panel) and 0.6Q (right panel), respectively. Here, ΔC signifies the difference in the chemoattractant concentration produced by the target and disturbance sources. c Similar to a, but for a ring model. d Analogous to b, but for a ring swimmer, with disturbing source strengths of Qd = 0.6Q (left panel) and 0.8Q (right panel), respectively. Source data are provided as a Source Data file.
Fig. 4. Simulated robotic swimmers pursue a dynamic chemical source of strength Q.
a–c A chain swimmer chasing (a), encountering (b), and following (c) a chemical source (symbolized by a star) moving along a circular orbit. d–f Similar to a–c, but for a ring swimmer and a source executing a figure-eight-shaped path. g Trajectories of chain (left column) and ring (right column) swimmers pursuing a source at varying speeds. Source data are provided as a Source Data file.
Fig. 5. Chemotactic navigation in periodic cellular vortices with a vortical velocity Uv.
Swimmers start from (0, 0) with the goal of reaching a chemical source positioned at (4, 4). a, b A chain model swims in vortices with Uv=0.005Ω^L and Uv=0.03Ω^L, respectively. The background color denotes the dimensionless flow vorticity. The inset of a depicts the typical streamlines. c Statistics of successful chemotaxis versus the vortical strength Uv, where crosses and circles signify failures and successes, respectively. d, e Similar to a and b, but for a ring swimmer, with Uv=0.0005Ω^L and Uv=0.005Ω^L, respectively. f Similar to c, but for a ring swimmer. Source data are provided as a Source Data file.
Fig. 6. Chemotactic navigation through a narrow constriction.
a The swimmer navigates a constriction of height d to reach the chemical source located on the opposite side. The source and locomotion are within the same xy-plane. A chain swimmer successfully traverses a constriction of d = 0.3L (b), but fails to pass through a narrower one of d = 0.2L (c). d Similar to b, but for a ring swimmer, which squeezes through the narrow gap of d = 0.5L/π, where L/π denotes the effective diameter of ring models. e Analogous to d, but with a tighter constriction of d = 0.4L/π, which obstructs the swimmer. Source data are provided as a Source Data file.
