Tensorboard

TensorBoard is TensorFlow’s visualization dashboard, a web UI for graphs that is served at localhost:6006 by default.

tensorboard --logdir=.

For all the experiments:

tensorboard --logdir=/root/ray_results/

I ran the ARS algorithm with Ray on the robotable environment and left it running for a day with the UI off. I set it up to run Tune, but each environment takes about 400MB of RAM, which gets pretty close to the 4GB in this laptop, so I was only running a single experiment.

So the next thing is to get it to start playback from a checkpoint.

(A few days pass. The GitHub issue I had turned out to be something basic that I thought I’d checked.)

So now I have a process where it runs 100 iterations, then uses the best checkpoint as the starting policy for the next 100 iterations. Now it might just be wishful thinking, but I do actually see a positive trend through the graphs, in ‘wall’ view. There’s also lots of variation in falling over, so I think we might just need to get these hyperparameters tuned. (Probably need to tweak reward weights too. But lol, giving AI access to its own reward function…)
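Here is a rough sketch of that "train 100 iterations, restart from the best checkpoint" loop, just to pin down the idea. It is a toy and not the actual Ray/ARS code: `train_round` and `chain_rounds` are hypothetical names, the "policy" is a single float, and the reward is simply that float. I've also assumed the starting policy counts as a checkpoint, so a round can't end worse than it began. A real run has noisy rewards, so no such guarantee there.

```python
import random

ITERATIONS_PER_ROUND = 100  # from the post: 100 iterations per round


def train_round(start_policy, iterations, rng):
    """Hypothetical stand-in for one training run.

    Returns a list of (mean_reward, policy) checkpoints. A 'policy' here is
    just a float; a real run would save network weights to disk instead.
    """
    # Assumption: the starting policy is kept as checkpoint 0.
    checkpoints = [(start_policy, start_policy)]
    policy = start_policy
    for _ in range(iterations):
        # Random perturbation: the toy reward can go down as well as up.
        policy = policy + rng.uniform(-1.0, 1.0)
        mean_reward = policy  # toy reward: equal to the policy value
        checkpoints.append((mean_reward, policy))
    return checkpoints


def chain_rounds(num_rounds, seed=0):
    """Run several rounds, seeding each one with the previous best checkpoint."""
    rng = random.Random(seed)
    policy = 0.0
    best_per_round = []
    for _ in range(num_rounds):
        checkpoints = train_round(policy, ITERATIONS_PER_ROUND, rng)
        # Pick the checkpoint with the highest mean reward in this round.
        best_reward, best_policy = max(checkpoints, key=lambda c: c[0])
        best_per_round.append(best_reward)
        policy = best_policy  # next round starts from the best checkpoint
    return best_per_round


if __name__ == "__main__":
    print(chain_rounds(3))
```

Picking the best checkpoint instead of the last one means a bad final stretch (say, a run of falling over) doesn't get carried into the next round.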

Just a note on that, the AI will definitely just be like, *999999

After training it overnight with PBT and ARS, it looks like one policy really beat out the others.

It ran a lot longer than the others.