IEEE Aerosp Conf. 2021:50100:10.1109/aero50100.2021.9438267.

doi: 10.1109/aero50100.2021.9438267. Epub 2021 Jun 7.

Exploring Transfers between Earth-Moon Halo Orbits via Multi-Objective Reinforcement Learning

Christopher J Sullivan¹, Natasha Bosanac¹, Rodney L Anderson², Alinda K Mashiku³, Jeffrey R Stuart⁴

Affiliations

¹ Colorado Center for Astrodynamics, Smead Aerospace Engineering Sciences, University of Colorado Boulder, 429 UCB, Boulder, CO 80303.
² Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Dr., Pasadena, CA 91109.
³ Navigation and Mission Design Branch, NASA Goddard Space Flight Center, 8800 Greenbelt Rd., Greenbelt, MD 20771.
⁴ Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Dr., Pasadena, CA 91109.

PMID: 35028651
PMCID: PMC8753611
DOI: 10.1109/aero50100.2021.9438267

Christopher J Sullivan et al. IEEE Aerosp Conf. 2021.

Abstract

Multi-Reward Proximal Policy Optimization, a multi-objective deep reinforcement learning algorithm, is used to examine the design space of low-thrust trajectories for a SmallSat transferring between two libration point orbits in the Earth-Moon system. Using Multi-Reward Proximal Policy Optimization, multiple policies are simultaneously and efficiently trained on three distinct trajectory design scenarios. Each policy is trained to create a unique control scheme based on the trajectory design scenario and assigned reward function. Each reward function is defined using a set of objectives that are scaled via a unique combination of weights to balance guiding the spacecraft to the target mission orbit, incentivizing faster flight times, and penalizing propellant mass usage. Then, the policies are evaluated on the same set of perturbed initial conditions in each scenario to generate the propellant mass usage, flight time, and state discontinuities from a reference trajectory for each control scheme. The resulting low-thrust trajectories are used to examine a subset of the multi-objective trade space for the SmallSat trajectory design scenario. By autonomously constructing the solution space, insights into the required propellant mass, flight time, and transfer geometry are rapidly achieved.
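
As a rough illustration of the weighted reward described in the abstract, the Python sketch below combines three objective terms (closeness to the target orbit, flight time, and propellant use) through one weight set per policy. The functional form, the names `RewardWeights` and `step_reward`, and every numerical value are assumptions made for illustration; the abstract does not state the paper's actual reward definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weight set; each policy would receive its own combination."""
    target: float      # weight on the state error relative to the target orbit
    time: float        # weight on the elapsed-time penalty
    propellant: float  # weight on the propellant-mass penalty


def step_reward(w: RewardWeights, pos_err: float, vel_err: float,
                dt: float, dm: float) -> float:
    """Scalar reward for one time step (assumed form, nondimensional inputs).

    pos_err, vel_err: position and velocity differences from the target orbit
    dt: time elapsed during the step
    dm: propellant mass consumed during the step
    """
    state_term = -(pos_err + vel_err)  # smaller error gives a higher reward
    return w.target * state_term - w.time * dt - w.propellant * dm


# Two hypothetical policies that differ only in how strongly they penalize propellant.
time_focused = RewardWeights(target=1.0, time=0.1, propellant=0.0)
mass_focused = RewardWeights(target=1.0, time=0.1, propellant=10.0)

step = dict(pos_err=0.02, vel_err=0.01, dt=0.5, dm=0.001)
print(step_reward(time_focused, **step))
print(step_reward(mass_focused, **step))
```

Under this assumed form, the same thrusting step earns a lower reward from the propellant-penalizing weights, which is the kind of difference that would push separately trained policies toward distinct control schemes.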

Figures

**Figure 1:**
Members of the Earth-Moon L₁ Lyapunov and northern halo orbit families depicted using dashed and solid arcs, respectively, and shaded by Jacobi constant.

**Figure 2:**
Members of the Earth-Moon L₂ Lyapunov and southern halo orbit families depicted using dashed and solid arcs, respectively, and shaded by Jacobi constant.

**Figure 3:**
A natural, heteroclinic transfer from an L₁ Lyapunov orbit to an L₂ Lyapunov orbit.

**Figure 4:**
MRPPO training $N$ policies with $k_i$ agents and reward functions $r_{i,t}(\bar{s}_t, \bar{u}_t)$ within a shared dynamical model.

**Figure 5:**
1,000 evaluation trajectories departing an L₁ Lyapunov orbit to an L₂ Lyapunov orbit with an equal Jacobi constant in Scenario 1 controlled using: (a) Policy 1 and (b) Policy 8.

**Figure 6:**
Relationship between the propellant mass usage and the position and velocity differences for the L₁ Lyapunov to L₂ Lyapunov transfer in Scenario 1.

**Figure 7:**
Trajectory from a highly perturbed initial condition departing an L₁ Lyapunov orbit to an L₂ Lyapunov orbit with an equal Jacobi constant in Scenario 1 controlled using: (a) Policy 1 and (b) Policy 8.

**Figure 8:**
Thrust magnitude along an evaluation trajectory in Scenario 1 commanded using: (a) Policy 1 and (b) Policy 8.

**Figure 9:**
Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 1 commanded by Policy 1.

**Figure 10:**
Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 1 commanded by Policy 8.

**Figure 11:**
1,000 evaluation trajectories departing an L₁ northern halo orbit to an L₂ southern halo orbit with an equal Jacobi constant in the Earth-Moon CR3BP in Scenario 2 controlled using: (a) Policy 1 and (b) Policy 8.

**Figure 12:**
Relationship between the state discontinuities and propellant mass usage for the L₁ northern halo to L₂ southern halo transfer in Scenario 2.

**Figure 13:**
Trajectory from a highly perturbed initial condition departing an L₁ northern halo orbit to an L₂ southern halo orbit with an equal Jacobi constant in Scenario 2 controlled using: (a) Policy 1 and (b) Policy 8.

**Figure 14:**
Thrust magnitude along an evaluation trajectory in Scenario 2 commanded using: (a) Policy 1 and (b) Policy 8.

**Figure 15:**
Jacobi constant along an evaluation trajectory in Scenario 2 commanded using: (a) Policy 1 and (b) Policy 8.

**Figure 16:**
Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 2 commanded by Policy 1.

**Figure 17:**
Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 2 commanded by Policy 8.

**Figure 18:**
1,000 evaluation trajectories departing an L₁ northern halo orbit to an L₂ southern halo orbit with a distinct Jacobi constant in the Earth-Moon CR3BP in Scenario 3 controlled using: (a) Policy 1 and (b) Policy 8.

**Figure 19:**
Relationship between the state discontinuities and propellant mass usage for the L₁ northern halo to L₂ southern halo transfer with distinct Jacobi constants in Scenario 3.

**Figure 20:**
Trajectory from a highly perturbed initial condition departing an L₁ northern halo orbit to an L₂ southern halo orbit with a distinct Jacobi constant in Scenario 3 controlled using: (a) Policy 1 and (b) Policy 8.

**Figure 21:**
Jacobi constant along an evaluation trajectory in Scenario 3 commanded using: (a) Policy 1 and (b) Policy 8.

**Figure 22:**
Thrust magnitude along an evaluation trajectory in Scenario 3 commanded using: (a) Policy 1 and (b) Policy 8.

**Figure 23:**
Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 3 commanded by Policy 1.

**Figure 24:**
Position and velocity differences measured with respect to the closest state on the final orbit for a single trajectory in Scenario 3 commanded by Policy 8.
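
Several captions above refer to the Jacobi constant in the Earth-Moon CR3BP. As a minimal sketch, the Python function below evaluates it under one common convention, C = 2U* − v², using nondimensional rotating-frame coordinates. The function name, the state ordering, and the approximate mass ratio `MU` are assumptions for illustration and are not taken from the paper.

```python
import math

MU = 0.01215  # approximate Earth-Moon mass ratio (assumed value)


def jacobi_constant(state, mu=MU):
    """Jacobi constant of a nondimensional rotating-frame state (x, y, z, vx, vy, vz)."""
    x, y, z, vx, vy, vz = state
    r1 = math.sqrt((x + mu) ** 2 + y ** 2 + z ** 2)      # distance to the Earth
    r2 = math.sqrt((x - 1 + mu) ** 2 + y ** 2 + z ** 2)  # distance to the Moon
    pseudo_potential = 0.5 * (x ** 2 + y ** 2) + (1 - mu) / r1 + mu / r2
    return 2 * pseudo_potential - (vx ** 2 + vy ** 2 + vz ** 2)


# A state above the Earth-Moon plane and its mirror image below the plane
# share the same Jacobi constant, since z enters only through z**2.
above = (0.85, 0.0, 0.1, 0.0, 0.2, 0.0)
below = (0.85, 0.0, -0.1, 0.0, 0.2, 0.0)
print(jacobi_constant(above) == jacobi_constant(below))
```

In the natural (unthrusted) CR3BP this quantity is conserved along a trajectory, so a transfer between orbits with distinct Jacobi constants, as in Scenario 3, needs thrust to change it.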

Grants and funding

80NSSC19K1147/NSSC/Shared Services Center NASA/United States
