I’ve posted an issue to try to get ARS working. https://github.com/ray-project/ray/issues/9573
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/ars/ars_tf_policy.py", line 59, in compute_actions
observation = self.observation_filter(observation[None], update=update)
TypeError: list indices must be integers or slices, not NoneType
No idea yet, but another bug report mentioned something about numpy arrays vs. plain (good old) Python lists. But anyhow, it would be great if I could get ARS working on Ray/RLlib, because I just get the sense that PPO is too dumb. It has never managed to get past falling over, even with quite a bit of hyperparameter tweaking.
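If that’s what’s going on, here’s a minimal sketch of my guess at the mechanism (not the actual RLlib fix): indexing with [None] only works on a numpy array, where it adds a batch dimension; on a plain Python list it blows up exactly like the traceback above.

import numpy as np

obs_list = [0.1, 0.2, 0.3]        # observation as a plain Python list
# obs_list[None]                  # TypeError: list indices must be integers or slices, not NoneType

obs_array = np.asarray(obs_list)  # convert to a numpy array first
print(obs_array[None].shape)      # (1, 3) -- None acts like np.newaxis and adds a batch dimension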
At least ARS has evolved a walking table so far. Once it works in Ray, perhaps we will have policy save and load, and I can move on to replaying experiences, continuing training from a checkpoint, and so on.
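For the record, here is roughly how I expect the save/restore flow to look with RLlib’s Trainer API; the environment name and config values below are just placeholders, not what I’m actually training.

import ray
from ray.rllib.agents.ars import ARSTrainer

ray.init()
trainer = ARSTrainer(env="Pendulum-v0", config={"num_workers": 2})  # placeholder env and config
for _ in range(10):
    trainer.train()
checkpoint_path = trainer.save()   # writes a checkpoint and returns its path
# ...later, to continue training from that checkpoint:
trainer.restore(checkpoint_path)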
Huh, great. Well, I solved my problems, and it’s running something now.
But rollouts are not ending now. OK, it looks like I need to put a time limit inside the environment itself, rather than having it as a hyperparameter like in PyBullet’s implementation.
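A minimal sketch of what I mean, using gym’s TimeLimit wrapper and registering the wrapped env with RLlib; the env id and step count are placeholders for whatever I’m actually training on.

import gym
from gym.wrappers import TimeLimit
from ray.tune.registry import register_env

def env_creator(env_config):
    env = gym.make("Pendulum-v0")                  # placeholder for my actual env
    return TimeLimit(env, max_episode_steps=1000)  # episodes now terminate inside the env itself

register_env("my_time_limited_env", env_creator)   # then pass env="my_time_limited_env" to the trainer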
Well then, on to the next issue.