
Hyperparameters and Rewards

I got the table walking with ARS in pybullet, but saving just the perceptron weights and loading them again didn’t seem to restore the training progress.

So I switched to PPO, which is a bit more complicated. Stable Baselines’ PPO1 and PPO2 converged too easily, with the table opting to fall over every time.

So I started editing the reward function weights. X axis movement used to be weighted by 1 and Z axis movement by 0.5, and I swapped them, so standing up is more important now. I also penalised falling over by a constant value. It’s not looking particularly smart after 11 rounds, but it’s not falling over forward anymore, at least. Yet.
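A minimal sketch of that reweighting, assuming the per-step displacement along X and Z is available and a fall can be detected. The function name, the argument names and the size of the penalty are made up for illustration; all that is fixed above is that the penalty is a constant.

```python
def shaped_reward(dx, dz, fell_over, x_weight=0.5, z_weight=1.0, fall_penalty=10.0):
    """Hypothetical reward: favour staying upright (Z) over moving forward (X)."""
    # Swapped weights: X used to be weighted by 1 and Z by 0.5.
    reward = x_weight * dx + z_weight * dz
    # Constant penalty for falling over; 10.0 is an assumed placeholder value.
    if fell_over:
        reward -= fall_penalty
    return reward


print(shaped_reward(dx=0.2, dz=0.1, fell_over=False))
print(shaped_reward(dx=0.2, dz=0.1, fell_over=True))
```

With these weights, a step that gains height is worth twice as much as the same distance moved forward.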

I also changed some PPO hyperparams:

clip_param=0.4, entcoeff=0.2, timesteps_per_actorbatch=1024

Basically this means more exploration than before. A bigger clip_param allows more variation in policy changes, and a bigger entcoeff rewards a more random (higher entropy) policy, which can’t hurt, right? I also gave it more timesteps to evaluate per batch, as maybe falling over was as good as you could hope for in a smaller batch.
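To see what the clip parameter does, here is a small sketch of the PPO clipped objective for a single sample, written in plain Python rather than taken from the Stable Baselines code. The function name is hypothetical, and 0.2 is only there as a commonly used value to compare against 0.4.

```python
def clipped_surrogate(ratio, advantage, clip_param):
    """PPO clipped objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    # Keep the probability ratio inside [1 - clip_param, 1 + clip_param].
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    # Take the more pessimistic of the clipped and unclipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# ratio = new policy probability / old policy probability for the action taken.
for clip_param in (0.2, 0.4):
    print(clip_param, clipped_surrogate(ratio=1.5, advantage=1.0, clip_param=clip_param))
```

A wider clip range lets a good action pull the policy further in one update, which is where the extra variation comes from. It also makes large, destabilising updates more likely.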

This is a good summary of the state of the art in hyperparam tuning. I’ll probably need to do this soon. https://medium.com/criteo-labs/hyper-parameter-optimization-algorithms-2fe447525903

Combine PPO with NES to Improve Exploration https://arxiv.org/pdf/1905.09492.pdf

PBT https://arxiv.org/abs/1711.09846

https://deepmind.com/blog/article/population-based-training-neural-networks

Policy Optimization with Model-based Explorations https://arxiv.org/pdf/1811.07350.pdf

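It seems like my fiddling with hyperparams caused ‘kl’ to go to NaN.

At first I read KL as Karhunen-Loeve, a stochastic ‘eigenvector transform’ that is similar to a Fourier transform for sound, apparently. But the kl that PPO reports is the Kullback-Leibler divergence, which measures how far the updated policy’s action distribution has moved from the old one.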
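For reference, the kl value that PPO implementations log is, as far as I can tell, the Kullback-Leibler divergence between the old and the new policy. Below is a minimal sketch for a one-dimensional Gaussian policy, which is an assumption about the action distribution; the function name is made up.

```python
import math


def gaussian_kl(mean_p, std_p, mean_q, std_q):
    """KL(P || Q) for two 1-D Gaussians P = N(mean_p, std_p^2), Q = N(mean_q, std_q^2)."""
    return (
        math.log(std_q / std_p)
        + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
        - 0.5
    )


# Identical policies give zero divergence; a shifted mean gives a positive one.
print(gaussian_kl(0.0, 1.0, 0.0, 1.0))
print(gaussian_kl(0.0, 1.0, 1.0, 1.0))
# A tiny std in the second policy makes the divergence explode.
print(gaussian_kl(0.0, 1.0, 1.0, 1e-4))
```

The formula divides by a standard deviation and takes a log of a ratio of standard deviations, so a std that collapses towards zero, or a mean that has already become NaN, is enough to poison the kl value.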

So I need to tune hyperparams.

Stable Baselines lets you access and modify model parameters: https://stable-baselines.readthedocs.io/en/master/guide/examples.html#accessing-and-modifying-model-parameters

So kl becoming NaN could mean I’m returning a zero somewhere from the model, and that zero then ends up in a divisor.

“In my case, adding 1e-8 to the divisor made the trick… ” – https://github.com/modestyachts/ARS/issues/1
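A sketch of the kind of fix that quote describes, assuming the division in question is a normalisation by a running standard deviation. The function and argument names are hypothetical and this is not the ARS code itself.

```python
def normalize(values, mean, std, eps=1e-8):
    """Normalise each value by its mean and std, with eps guarding the divisor."""
    # Without eps, a std of exactly 0 raises ZeroDivisionError here in plain
    # Python; with NumPy arrays the same division would give inf or NaN.
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]


# The second dimension has never varied, so its std is exactly 0.
print(normalize([1.0, 2.0], mean=[0.5, 2.0], std=[0.5, 0.0]))
```

The epsilon is small enough not to change the result noticeably when std has a sensible size, and it keeps the output finite when std is zero.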

https://stable-baselines.readthedocs.io/en/master/guide/checking_nan.html
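While tracking the NaN down, a library-free check on whatever goes in and out of the environment might look like this. It is only a stand-in for the tooling described on the page linked above, and the names are made up.

```python
import math


def check_finite(**named_values):
    """Raise ValueError on the first NaN or inf found in any of the given sequences."""
    for name, values in named_values.items():
        for index, value in enumerate(values):
            if not math.isfinite(value):
                raise ValueError(f"{name}[{index}] is {value}")


# Intended to be called once per environment step, before values reach the learner.
check_finite(observation=[0.1, -0.3], action=[0.5], reward=[1.0])

try:
    check_finite(observation=[0.1, float("nan")], reward=[1.0])
except ValueError as err:
    print("caught:", err)
```

Failing at the first bad value shows whether the NaN starts in the observations, the actions or the reward, instead of only showing up later in the kl statistic.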