Reinforcement learning-based dynamic field exploration and reconstruction using multi-robot systems for environmental monitoring

Thinh Lu et al. Front Robot AI. 2025 Mar 25;12:1492526. doi: 10.3389/frobt.2025.1492526. eCollection 2025.

Abstract

In the realm of real-time environmental monitoring and hazard detection, multi-robot systems present a promising solution for exploring and mapping dynamic fields, particularly in scenarios where human intervention poses safety risks. This research introduces a strategy for path planning and control of a group of mobile sensing robots to efficiently explore and reconstruct a dynamic field consisting of multiple non-overlapping diffusion sources. Our approach integrates a reinforcement learning-based path planning algorithm to guide the multi-robot formation in identifying diffusion sources, with a clustering-based method for destination selection once a new source is detected, to enhance coverage and accelerate exploration in unknown environments. Simulation results and real-world laboratory experiments demonstrate the effectiveness of our approach in exploring and reconstructing dynamic fields. This study advances the field of multi-robot systems in environmental monitoring and has practical implications for rescue missions and field explorations.
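The paper tracks how well the robots' reconstructed field matches the true field as exploration proceeds (the "mapping error" curves in Figures 10 and 13). The exact metric is not stated in this excerpt; the sketch below assumes a root-mean-square error over a discretized grid as one plausible choice.

```python
# Hypothetical sketch: quantifying field-reconstruction quality as the
# root-mean-square error between the true concentration grid and the
# grid estimated from the robots' samples. The RMSE form is an
# assumption; the paper's exact mapping-error metric is not given here.
import math

def mapping_error(true_field, estimated_field):
    """RMSE between two equally sized 2-D concentration grids."""
    total, count = 0.0, 0
    for true_row, est_row in zip(true_field, estimated_field):
        for t, e in zip(true_row, est_row):
            total += (t - e) ** 2
            count += 1
    return math.sqrt(total / count)

true_field = [[0.0, 1.0], [2.0, 3.0]]
estimated  = [[0.0, 1.0], [2.0, 5.0]]  # one cell is off by 2
print(mapping_error(true_field, estimated))  # sqrt(4/4) = 1.0
```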

Keywords: dynamic field reconstruction; environmental monitoring; mobile sensor networks; multi-robot systems; reinforcement learning; source seeking.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
A symmetric formation composed of four mobile robots r_i^k, i = 0, 1, 2, 3, shown in blue. The formation center r_c^k is shown in red. The distance between each robot and the formation center is Δr. The shaded region is the time-varying view-scope Γ(k).
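The symmetric four-robot formation of Figure 1 can be generated by placing each robot at distance Δr from the formation center at 90-degree increments. The orientation convention (the `phase` parameter) below is an illustrative assumption, not the paper's.

```python
# Sketch of the Figure 1 geometry: n robots spaced evenly on a circle of
# radius dr around the formation center. phase sets the orientation of
# the formation and is an assumed convention.
import math

def formation_positions(center, dr, n=4, phase=0.0):
    cx, cy = center
    return [(cx + dr * math.cos(phase + 2 * math.pi * i / n),
             cy + dr * math.sin(phase + 2 * math.pi * i / n))
            for i in range(n)]

robots = formation_positions((0.0, 0.0), dr=1.0)
# Each robot is exactly dr from the center:
print([round(math.hypot(x, y), 6) for x, y in robots])  # [1.0, 1.0, 1.0, 1.0]
```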
FIGURE 2
Flow chart showing the key components of our algorithm and two operation modes of the robot formation.
FIGURE 3
The PPO neural network architecture used in the source mapping mode.
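The source-mapping mode is trained with PPO, whose defining ingredient is the clipped surrogate objective. The snippet below computes that standard objective for a batch of transitions; the ratio values, advantages, and epsilon = 0.2 are illustrative defaults, not the paper's settings, and it says nothing about the specific network architecture shown in Figure 3.

```python
# The standard PPO clipped surrogate objective:
#   L = E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]
# where r is the probability ratio of the new to the old policy and A is
# the advantage estimate. eps = 0.2 is the common default, assumed here.
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean clipped surrogate over a batch of (ratio, advantage) pairs."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1 - eps, min(1 + eps, r))
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)

# A ratio of 1.5 with positive advantage is clipped to 1.2:
print(ppo_clip_objective([1.5], [1.0]))  # 1.2
```

The outer `min` makes the objective pessimistic: a policy update is never credited for moving the ratio beyond the clip range, which is what keeps PPO updates conservative.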
FIGURE 4
A 3×3 section of the discretized advection-diffusion field.
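The environment fields are discretized advection-diffusion processes of the kind Figure 4 illustrates. The sketch below advances such a grid one step with an explicit forward-time, centered-space update; the coefficients, step sizes, and fixed boundary are assumed values chosen for illustration, not the paper's simulation parameters.

```python
# One explicit (forward-time, centred-space) update of a discretized 2-D
# advection-diffusion field dc/dt = D * laplacian(c) - v . grad(c).
# D, (vx, vy), dt, dx, dy are assumed values; boundary cells are held
# fixed for simplicity.
def step(c, D=0.1, vx=0.0, vy=0.0, dt=1.0, dx=1.0, dy=1.0):
    """Advance concentration grid c one time step; returns a new grid."""
    n, m = len(c), len(c[0])
    new = [row[:] for row in c]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            lap = ((c[i + 1][j] - 2 * c[i][j] + c[i - 1][j]) / dx**2
                   + (c[i][j + 1] - 2 * c[i][j] + c[i][j - 1]) / dy**2)
            adv = (vx * (c[i + 1][j] - c[i - 1][j]) / (2 * dx)
                   + vy * (c[i][j + 1] - c[i][j - 1]) / (2 * dy))
            new[i][j] = c[i][j] + dt * (D * lap - adv)
    return new

field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 1.0                  # a single point source
field = step(field)
print(field[2][2], field[1][2])    # mass spreads to the neighbors: 0.6 0.1
```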
FIGURE 5
A set of 15 simulation environments, each consisting of one to four diffusion fields centered at various locations with distinct diffusion and advection coefficients.
FIGURE 6
Sample of a diffusion field environment over different time steps.
FIGURE 7
(A) Stage 1: Pre-training PPO on static fields. (B) Stage 2: Training PPO on dynamic fields. (C) Samples of training environments and generated trajectories post-training.
FIGURE 8
Field re-partitioning behavior of the exploration module using K-means clustering. The red dot is the starting position of the formation center. Partitioning happens at the beginning of an episode and whenever a new source is detected. (A) The trajectory of the formation center. (B) The initial partition of the field. (C) The updated partition after the first source in the upper right corner is detected. (D) The updated partition after the second source in the lower left corner is detected.
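The re-partitioning shown in Figure 8 can be sketched as follows: the unexplored cells are clustered with K-means (Lloyd's algorithm), and the cluster centroids become candidate destinations for the formation. The naive initialization and the choice of K below are illustrative assumptions; the paper's exact clustering setup may differ.

```python
# Hedged sketch of clustering-based destination selection: partition
# unexplored grid cells with K-means and take the cluster centroids as
# candidate destinations. Initialization (first k points) and K are
# illustrative assumptions.
def kmeans(points, k, iters=20):
    """Lloyd's algorithm on 2-D points; returns (centroids, clusters)."""
    centroids = points[:k]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda c: (p[0] - centroids[c][0]) ** 2
                                    + (p[1] - centroids[c][1]) ** 2)
            clusters[idx].append(p)
        centroids = [(sum(p[0] for p in cl) / len(cl),
                      sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of unexplored cells yield two destinations,
# one near each group.
cells = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
destinations, _ = kmeans(cells, k=2)
print(sorted(destinations))
```

Re-running this whenever a new source is detected (with the newly mapped cells removed from `cells`) reproduces the partition updates seen in panels (B) through (D).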
FIGURE 9
Generated trajectories obtained from simulations across different spatial-diffusion environments. The robot formation center is initially located at r_c = (90, 10), shown as the red marker; the red square shows the position of the formation at the end of the episode.
FIGURE 10
Analysis of the generated trajectory and mapping errors of our solution in comparison with the Lawn Mowing and Random Walking approaches.
FIGURE 11
Mobile robot formation setup for real-world testing and evaluation.
FIGURE 12
(A) Lab experiment setup with a projected dynamic field and four mobile robots. (B) Snapshots of the trajectories of the robot formation at three different time steps in an experiment. The red dashed lines represent the trajectories.
FIGURE 13
The field exploration and reconstruction results in two experiments with two and three diffusion sources. “Environment Field End State” figures illustrate the end states of the two experiments with corresponding trajectories of the robot formation. The red dots indicate the starting locations of the formation center and the red squares are the ending locations of the formation. “Agent Field End State” figures show the end states of the reconstructed fields in the two experiments. “Concentration” figures illustrate the estimated field concentration along the trajectories of the formation center, and “Mapping Error” figures show the mapping errors while reconstructing the fields.
FIGURE 14
Screenshot captured from the experiment on the first environment with two diffusion fields.
FIGURE 15
Screenshot captured from the experiment on the second environment with three diffusion fields.

