Kai Jiang, Xiaolong Qin, Reinforcement learning with goal-distance gradient, Vol. 2024 (2024), Article ID 1, pp. 1-13

Full Text: PDF
DOI: 10.23952/cot.2024.1

Received December 7, 2022; Accepted September 1, 2023; Published October 1, 2023


Abstract. Reinforcement learning typically relies on reward feedback from the environment to train agents. However, rewards in real-world environments are often sparse, and some environments offer no rewards at all. Most current methods struggle to achieve satisfactory performance in environments with sparse or nonexistent rewards. While shaped rewards can effectively address sparse-reward tasks, they are restricted to specific problems and are prone to trapping learning in local optima. We introduce a model-free method that addresses the issue of sparse rewards in a general environment without relying on environmental rewards. Our approach employs the minimum number of state transitions as a distance metric to substitute for the environmental reward. Additionally, it introduces a goal-distance gradient to facilitate policy improvement. Building upon the characteristics of our method, we also present a bridge point planning strategy aimed at enhancing exploration efficiency and tackling more complex tasks. Experimental results demonstrate that our method outperforms previous approaches in addressing challenges such as sparse rewards and local optima in complex environments.
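To make the underlying idea concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: in a small obstacle-free grid world, the minimum number of state transitions to the goal is computed by breadth-first search and used in place of an environmental reward, and a greedy policy simply moves in the direction of steepest distance decrease, which is the spirit of a goal-distance gradient. All names (goal_distance, greedy_policy, the grid size) are illustrative assumptions.

    # Minimal sketch: goal distance (minimum transitions) as a reward substitute.
    from collections import deque

    GRID_W, GRID_H = 5, 5
    ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def neighbors(state):
        x, y = state
        for dx, dy in ACTIONS.values():
            nx, ny = x + dx, y + dy
            if 0 <= nx < GRID_W and 0 <= ny < GRID_H:
                yield (nx, ny)

    def goal_distance(goal):
        """Minimum number of state transitions from every state to the goal (BFS)."""
        dist = {goal: 0}
        queue = deque([goal])
        while queue:
            s = queue.popleft()
            for n in neighbors(s):
                if n not in dist:
                    dist[n] = dist[s] + 1
                    queue.append(n)
        return dist

    def greedy_policy(state, dist):
        """Choose the action that most reduces the estimated distance to the goal."""
        best_action, best_dist = None, dist[state]
        for name, (dx, dy) in ACTIONS.items():
            nxt = (state[0] + dx, state[1] + dy)
            if nxt in dist and dist[nxt] < best_dist:
                best_action, best_dist = name, dist[nxt]
        return best_action

    if __name__ == "__main__":
        goal = (4, 4)
        dist = goal_distance(goal)          # replaces the environmental reward
        state, trajectory = (0, 0), [(0, 0)]
        while state != goal:
            action = greedy_policy(state, dist)
            dx, dy = ACTIONS[action]
            state = (state[0] + dx, state[1] + dy)
            trajectory.append(state)
        print("pseudo-reward per step = -distance; path:", trajectory)

In the paper's setting the distance would be learned from experienced transitions rather than computed exactly, but the sketch shows how a transition-count distance can stand in for a sparse or absent reward signal.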


How to Cite this Article:
K. Jiang, X. Qin, Reinforcement learning with goal-distance gradient, Commun. Optim. Theory 2024 (2024) 1.