Atmospheric Re-Entry Guidance via Reinforcement Learning

Autonomy is a critical component of the next generation of hypersonic vehicles. Effective implementation of the closed-loop SENSE-THINK-ACT cycle requires a new generation of intelligent algorithms that can 1) adapt to elusive targets and 2) remain robust in unknown environments. The current generation of autonomous hypersonic systems, however, relies on rule-based methods that dramatically limit overall performance. A major challenge is devising a navigation, guidance, and control (NGC) system that effectively guides a hypersonic vehicle to mission success, with guaranteed performance, in a highly uncertain and changing environment. Over the past few years, enabled by the availability of large datasets and advances in computing hardware (e.g., GPUs), there has been an explosion of intelligent systems based on deep learning that enable fast, adaptive reasoning over data.

The University of Arizona Space Systems Engineering Laboratory (UA-SSEL) has partnered with Sandia National Laboratories (SNL) to address the problem of autonomous, adaptive, and robust real-time hypersonic guidance in uncertain and changing environments by developing innovative algorithms based on Deep Reinforcement Learning (DRL). DRL enables learning a closed-loop guidance policy from experience. Such a policy (e.g., bank angle and angle of attack as functions of the current position and velocity) is parameterized by a deep network whose parameters (weights) are learned by letting the agent interact with the environment in an attempt to maximize a reward signal (a minimal sketch of such a training loop is given after the research questions below). Within this DRL framework, we developed a two-year research program on hypersonic vehicle guidance to answer the following fundamental Research Questions (RQs):

  • RQ1: Can deep neural networks be trained to learn bank-angle and angle-of-attack commands that autonomously and accurately execute hypersonic missions in uncertain environments?
  • RQ2: Can we train deep networks for hypersonic guidance that “learn to learn” (also known as meta-learning), i.e., learn to adapt to changing environments and targeting conditions in a few iterations during the ground-based training and re-training process (see the meta-learning sketch after this list)?
  • RQ3: Can we train deep networks for hypersonic guidance with certified convergence, adaptability and robustness?
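
To make the policy-learning mechanism described above concrete, here is a minimal policy-gradient (REINFORCE) sketch in PyTorch. The `ToyReentryEnv` dynamics and reward, the network sizes, and the hyperparameters are illustrative assumptions, not the actual UA-SSEL/SNL formulation; only the state/action structure (position and velocity in, bank-angle and angle-of-attack commands out) follows the description above.

```python
# Minimal sketch: learning a closed-loop guidance policy by policy gradient
# (REINFORCE). ToyReentryEnv, the reward, and all hyperparameters are
# illustrative stand-ins, not the actual UA-SSEL/SNL formulation.
import numpy as np
import torch
import torch.nn as nn

class GuidancePolicy(nn.Module):
    """Maps the current state (position, velocity) to a Gaussian distribution
    over the two guidance commands (bank angle, angle of attack)."""
    def __init__(self, state_dim=6, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        return torch.distributions.Normal(self.net(state), self.log_std.exp())

class ToyReentryEnv:
    """Hypothetical stand-in environment: 3-D position + velocity state,
    reward for closing the distance to the origin. Not a real re-entry
    dynamics model."""
    def reset(self):
        self.state, self.t = np.random.randn(6), 0
        return self.state
    def step(self, action):
        self.state[3:5] += 0.1 * action          # commands nudge velocity
        self.state[:3] += 0.1 * self.state[3:]   # velocity moves position
        self.t += 1
        return self.state, -np.linalg.norm(self.state[:3]), self.t >= 50

def train(env, policy, episodes=500, gamma=0.99, lr=3e-4):
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(episodes):
        # Roll out one episode, recording action log-probs and rewards.
        state, log_probs, rewards, done = env.reset(), [], [], False
        while not done:
            dist = policy(torch.as_tensor(state, dtype=torch.float32))
            action = dist.sample()
            log_probs.append(dist.log_prob(action).sum())
            state, reward, done = env.step(action.numpy())
            rewards.append(reward)
        # Discounted return from each step onward.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        # REINFORCE: ascend the expected-return gradient.
        loss = -(torch.stack(log_probs) * returns).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

train(ToyReentryEnv(), GuidancePolicy())
```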
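
RQ2's “learn to learn” objective can likewise be made concrete with a sketch of one well-known meta-learning technique, Model-Agnostic Meta-Learning (MAML): a shared network initialization is trained so that a few gradient steps adapt it to a new task (e.g., a new environment or targeting condition). The task structure, model, and regression loss below are illustrative assumptions, not necessarily the program's actual method.

```python
# Minimal second-order MAML sketch (one possible instantiation of
# "learn to learn"). Tasks, model, and loss are illustrative stand-ins.
import torch

def maml_step(model, tasks, loss_fn, meta_opt, inner_lr=0.01, inner_steps=3):
    """One meta-update: adapt a copy of the weights to each task with a few
    gradient steps, then update the shared initialization so that this
    fast adaptation performs well across tasks."""
    meta_loss = 0.0
    for (support_x, support_y), (query_x, query_y) in tasks:
        # Inner loop: a few adaptation steps on the task's support set.
        fast = dict(model.named_parameters())
        for _ in range(inner_steps):
            loss = loss_fn(torch.func.functional_call(model, fast, support_x),
                           support_y)
            grads = torch.autograd.grad(loss, list(fast.values()),
                                        create_graph=True)
            fast = {name: w - inner_lr * g
                    for (name, w), g in zip(fast.items(), grads)}
        # Outer objective: how well the adapted weights do on held-out data.
        meta_loss = meta_loss + loss_fn(
            torch.func.functional_call(model, fast, query_x), query_y)
    meta_opt.zero_grad()
    meta_loss.backward()   # gradients flow through the inner-loop updates
    meta_opt.step()

# Hypothetical usage: each task pairs adaptation (support) and evaluation
# (query) data drawn from one environment/target condition.
model = torch.nn.Sequential(torch.nn.Linear(6, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 2))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tasks = [((torch.randn(16, 6), torch.randn(16, 2)),
          (torch.randn(16, 6), torch.randn(16, 2))) for _ in range(4)]
maml_step(model, tasks, torch.nn.functional.mse_loss, meta_opt)
```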