Hindsight Experience Replay
(Instead of getting a crap result and saying, wow what a crap result, you say, ah well if that’s what I had wanted to do, then that would have worked.)
It’s a clever idea, OpenAI.
One ability humans have, unlike the current generation of model-free RL algorithms, is to learn almost as much from achieving an undesired outcome as from the desired one.
https://arxiv.org/abs/1707.01495
“So why not just pretend that we wanted to achieve this goal to begin with, instead of the one that we set out to achieve originally?”
The HER algorithm achieves this by using what are called “sparse and binary” rewards, which only indicate to the agent whether it has failed or succeeded. In contrast, the “dense,” “shaped” rewards used in conventional reinforcement learning tip agents off as to whether they are getting “close,” “closer,” “much closer,” or “very close” to hitting their goal. Such dense rewards can speed up the learning process, but they are difficult to design and implement for real-world applications; sparse rewards are far simpler to specify, though on their own they give the agent little learning signal, which is exactly the gap HER’s goal relabeling fills.
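A rough Python sketch of the relabeling trick, assuming a sparse binary reward and the “future” replay strategy from the HER paper (the function names, the plain-list `replay_buffer`, and the transition tuple layout are illustrative, not OpenAI’s code):

```python
import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    # Sparse, binary reward: 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < tol else -1.0


def her_relabel(episode, replay_buffer, k=4):
    """Store each transition with its original goal, then again with
    'hindsight' goals -- states the agent actually reached later on.

    Each element of `episode` is assumed to be
    (state, action, next_state, achieved_goal, goal), where
    `achieved_goal` is the goal position reached at `next_state`.
    """
    T = len(episode)
    for t, (state, action, next_state, achieved, goal) in enumerate(episode):
        # Original transition (reward is usually -1 if the real goal was missed).
        replay_buffer.append(
            (state, goal, action, sparse_reward(achieved, goal), next_state)
        )
        # "future" strategy: sample k goals achieved later in the same episode
        # and pretend each one was the goal all along.
        future_idx = np.random.randint(t, T, size=min(k, T - t))
        for idx in future_idx:
            new_goal = episode[idx][3]  # achieved goal at a later timestep
            replay_buffer.append(
                (state, new_goal, action,
                 sparse_reward(achieved, new_goal), next_state)
            )
```

The relabeled copies turn failed episodes into successful examples for the goals that were actually reached, so an off-policy learner (DDPG in the paper) still gets frequent non-trivial rewards even when the true goal is rarely hit.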
https://thenewstack.io/openai-algorithm-allows-ai-to-learn-from-its-mistakes/
G-HER
Hindsight experience replay (HER), based on universal value functions, shows promising results in multi-goal settings by substituting achieved goals for the original goal, giving the agent frequent rewards. However, the achieved goals are limited to the current policy level and lack guidance for learning. We propose a novel guided goal-generation model for multi-goal RL named G-HER. Our method uses a conditional generative recurrent neural network (RNN) to explicitly model the relationship between policy level and goals, enabling the generation of various goals conditioned on different policy levels.
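The actual G-HER architecture and training procedure are on the project page below; purely as a hedged sketch of the idea described in the abstract (a generative RNN that emits goals conditioned on a “policy level” such as the agent’s recent success rate), something like the following PyTorch module, where the class name, the coordinate-by-coordinate decoding, and the use of success rate as the condition are all my own assumptions:

```python
import torch
import torch.nn as nn


class ConditionalGoalRNN(nn.Module):
    """Illustrative conditional generative RNN: given a scalar 'policy level'
    (e.g. recent success rate), autoregressively emit a goal vector one
    coordinate at a time."""

    def __init__(self, goal_dim, hidden_dim=64):
        super().__init__()
        self.goal_dim = goal_dim
        # Input at each step: previous goal coordinate + policy-level condition.
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, policy_level, batch_size=1):
        device = policy_level.device
        cond = policy_level.view(1, 1, 1).expand(batch_size, 1, 1)
        prev = torch.zeros(batch_size, 1, 1, device=device)
        h = None
        coords = []
        for _ in range(self.goal_dim):
            x = torch.cat([prev, cond], dim=-1)       # (B, 1, 2)
            out, h = self.rnn(x, h)
            prev = self.head(out)                     # next goal coordinate
            coords.append(prev)
        return torch.cat(coords, dim=1).squeeze(-1)   # (B, goal_dim)


# Usage sketch: sample goals matched to the current policy level and use them
# alongside (or instead of) HER's achieved-goal substitution.
model = ConditionalGoalRNN(goal_dim=3)
goals = model(policy_level=torch.tensor(0.4), batch_size=8)
```

The point of conditioning on policy level is that the generated goals can track the agent’s competence, rather than being limited to whatever the current policy happens to achieve.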
https://sites.google.com/view/gher-algorithm