IEEE Aerosp Conf. 2021:50100:10.1109/aero50100.2021.9438267.
doi: 10.1109/aero50100.2021.9438267. Epub 2021 Jun 7.

Exploring Transfers between Earth-Moon Halo Orbits via Multi-Objective Reinforcement Learning

Christopher J Sullivan et al. IEEE Aerosp Conf. 2021.

Abstract

Multi-Reward Proximal Policy Optimization, a multi-objective deep reinforcement learning algorithm, is used to examine the design space of low-thrust trajectories for a SmallSat transferring between two libration point orbits in the Earth-Moon system. Using Multi-Reward Proximal Policy Optimization, multiple policies are simultaneously and efficiently trained on three distinct trajectory design scenarios. Each policy is trained to create a unique control scheme based on the trajectory design scenario and assigned reward function. Each reward function is defined using a set of objectives that are scaled via a unique combination of weights to balance guiding the spacecraft to the target mission orbit, incentivizing faster flight times, and penalizing propellant mass usage. Then, the policies are evaluated on the same set of perturbed initial conditions in each scenario to generate the propellant mass usage, flight time, and state discontinuities from a reference trajectory for each control scheme. The resulting low-thrust trajectories are used to examine a subset of the multi-objective trade space for the SmallSat trajectory design scenario. By autonomously constructing the solution space, insights into the required propellant mass, flight time, and transfer geometry are rapidly achieved.
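The weighted reward structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' reward function: the linear penalty form, the names `RewardWeights` and `reward`, the weight values, and the sample inputs are all assumptions made for illustration. The count of eight policies is inferred only from the figure captions that mention Policy 1 and Policy 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights defining one policy's reward function."""
    w_state: float  # weight on the state error relative to the target orbit
    w_time: float   # weight on elapsed flight time
    w_mass: float   # weight on propellant mass used


def reward(weights, state_error, time_step, propellant_used):
    """Scalarized reward: each objective is a penalty scaled by its weight.

    The linear form is an assumption; the abstract only states that the
    objectives are scaled by a unique combination of weights per policy.
    """
    return -(weights.w_state * state_error
             + weights.w_time * time_step
             + weights.w_mass * propellant_used)


# Eight illustrative weight combinations, one per policy (values are made up).
# Only the propellant weight varies here, so later policies penalize mass more.
policies = [RewardWeights(1.0, 0.1, 0.1 * i) for i in range(1, 9)]

# The same hypothetical step outcome scored under every policy's reward.
step = dict(state_error=0.02, time_step=0.01, propellant_used=0.003)
rewards = [reward(w, **step) for w in policies]
```

Scoring an identical step under every weight set shows how each policy is pushed toward a different point in the trade space: a policy with a larger propellant weight receives a lower reward for the same thrust usage.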

Figures

Figure 1: Members of the Earth-Moon L1 Lyapunov and northern halo orbit families depicted using dashed and solid arcs, respectively, and shaded by Jacobi constant.
Figure 2: Members of the Earth-Moon L2 Lyapunov and southern halo orbit families depicted using dashed and solid arcs, respectively, and shaded by Jacobi constant.
Figure 3: A natural, heteroclinic transfer from an L1 Lyapunov orbit to an L2 Lyapunov orbit.
Figure 4: MRPPO training N policies with $k_i$ agents and reward functions $r_{i,t}(\bar{s}_t, \bar{u}_t)$ within a shared dynamical model.
Figure 5: 1,000 evaluation trajectories departing an L1 Lyapunov orbit to an L2 Lyapunov orbit with an equal Jacobi constant in Scenario 1 controlled using: (a) Policy 1 and (b) Policy 8.
Figure 6: Relationship between the propellant mass usage and the position and velocity differences for the L1 Lyapunov to L2 Lyapunov transfer in Scenario 1.
Figure 7: Trajectory from a highly perturbed initial condition departing an L1 Lyapunov orbit to an L2 Lyapunov orbit with an equal Jacobi constant in Scenario 1 controlled using: (a) Policy 1 and (b) Policy 8.
Figure 8: Thrust magnitude along an evaluation trajectory in Scenario 1 commanded using: (a) Policy 1 and (b) Policy 8.
Figure 9: Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 1 commanded by Policy 1.
Figure 10: Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 1 commanded by Policy 8.
Figure 11: 1,000 evaluation trajectories departing an L1 northern halo orbit to an L2 southern halo orbit with an equal Jacobi constant in the Earth-Moon CR3BP in Scenario 2 controlled using: (a) Policy 1 and (b) Policy 8.
Figure 12: Relationship between the state discontinuities and propellant mass usage for the L1 northern halo to L2 southern halo transfer in Scenario 2.
Figure 13: Trajectory from a highly perturbed initial condition departing an L1 northern halo orbit to an L2 southern halo orbit with an equal Jacobi constant in Scenario 2 controlled using: (a) Policy 1 and (b) Policy 8.
Figure 14: Thrust magnitude along an evaluation trajectory in Scenario 2 commanded using: (a) Policy 1 and (b) Policy 8.
Figure 15: Jacobi constant along an evaluation trajectory in Scenario 2 commanded using: (a) Policy 1 and (b) Policy 8.
Figure 16: Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 2 commanded by Policy 1.
Figure 17: Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 2 commanded by Policy 8.
Figure 18: 1,000 evaluation trajectories departing an L1 northern halo orbit to an L2 southern halo orbit with a distinct Jacobi constant in the Earth-Moon CR3BP in Scenario 3 controlled using: (a) Policy 1 and (b) Policy 8.
Figure 19: Relationship between the state discontinuities and propellant mass usage for the L1 northern halo to L2 southern halo transfer with distinct Jacobi constants in Scenario 3.
Figure 20: Trajectory from a highly perturbed initial condition departing an L1 northern halo orbit to an L2 southern halo orbit with a distinct Jacobi constant in Scenario 3 controlled using: (a) Policy 1 and (b) Policy 8.
Figure 21: Jacobi constant along an evaluation trajectory in Scenario 3 commanded using: (a) Policy 1 and (b) Policy 8.
Figure 22: Thrust magnitude along an evaluation trajectory in Scenario 3 commanded using: (a) Policy 1 and (b) Policy 8.
Figure 23: Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 3 commanded by Policy 1.
Figure 24: Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 3 commanded by Policy 8.
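
Several captions above refer to the Jacobi constant in the Earth-Moon CR3BP. As a reminder of what that quantity is, here is a minimal sketch using the standard nondimensional rotating-frame definition. The function name, the state ordering, and the approximate mass ratio are assumptions for illustration and are not taken from the paper.

```python
import math

# Approximate Earth-Moon mass ratio (an assumed value; the paper's may differ).
MU_EARTH_MOON = 0.01215


def jacobi_constant(state, mu=MU_EARTH_MOON):
    """Jacobi constant of a nondimensional rotating-frame CR3BP state.

    state = (x, y, z, vx, vy, vz), with the Earth at (-mu, 0, 0)
    and the Moon at (1 - mu, 0, 0).
    """
    x, y, z, vx, vy, vz = state
    r1 = math.sqrt((x + mu) ** 2 + y ** 2 + z ** 2)      # distance to the Earth
    r2 = math.sqrt((x - 1 + mu) ** 2 + y ** 2 + z ** 2)  # distance to the Moon
    speed_sq = vx ** 2 + vy ** 2 + vz ** 2
    # C = 2 U* - v^2, with U* = (x^2 + y^2) / 2 + (1 - mu) / r1 + mu / r2
    return x ** 2 + y ** 2 + 2 * (1 - mu) / r1 + 2 * mu / r2 - speed_sq


at_rest = jacobi_constant((0.5, 0.0, 0.0, 0.0, 0.0, 0.0))
moving = jacobi_constant((0.5, 0.0, 0.0, 0.0, 0.1, 0.0))
```

The quantity is conserved along ballistic CR3BP arcs but changes while the engine thrusts, which is why it is informative to plot it along a controlled trajectory.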
