(Population Based Training and Augmented Random Search)
Got a reward of 902 with this robotable. That’s a success. It’s an amusing walk. Still has a way to go, probably.
Miranda doesn’t want to train it with that one dodgeball algorithm you sometimes see for toughening up AIs. I’ll see about adding in the uneven terrain though (something like the heightfield sketch below), and maybe trying to run that obstacle course library.
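For the uneven terrain, a random heightfield is probably the simplest starting point. A minimal sketch, assuming the sim is PyBullet; the grid size and the few centimetres of height variation are placeholder choices, not tuned values.

```python
import numpy as np
import pybullet as p

p.connect(p.DIRECT)

# Random bumps a few centimetres tall: enough to perturb a small walker
# without flipping it outright. Grid size and scales are placeholder choices.
rows, cols = 64, 64
heights = np.random.uniform(0.0, 0.05, size=rows * cols).tolist()

terrain_shape = p.createCollisionShape(
    shapeType=p.GEOM_HEIGHTFIELD,
    meshScale=[0.1, 0.1, 1.0],  # x/y spacing (m) and z scale of the grid
    heightfieldData=heights,
    numHeightfieldRows=rows,
    numHeightfieldColumns=cols,
)
terrain = p.createMultiBody(baseMass=0, baseCollisionShapeIndex=terrain_shape)
p.resetBasePositionAndOrientation(terrain, [0, 0, 0], [0, 0, 0, 1])
```

Regenerating the heightfield each episode would give a cheap form of terrain randomization.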
But there are other, big things to do, which take some doing.
The egg-scooper, or candler, or handler, or picker-upper will likely use an approach similar to the OpenAI Rubik’s cube solver: a camera in simulation as the input to a Convolutional Neural Network of some sort, so that the learned mapping transfers between the simulated camera and the real camera.
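Roughly the shape I have in mind, as a PyTorch sketch. The class name, layer sizes, joint count, and the 84x84 input are all placeholder assumptions, not a tested design; the point is just that sim frames and real frames go through the same preprocessing and the same network.

```python
import torch
import torch.nn as nn

class EggPickerPolicy(nn.Module):
    """Maps an RGB camera frame to joint targets. Trained on simulated
    frames; at deploy time the real camera feeds the same network."""

    def __init__(self, num_joints: int = 12):  # placeholder joint count
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size from a dummy frame
            n_flat = self.conv(torch.zeros(1, 3, 84, 84)).shape[1]
        self.head = nn.Linear(n_flat, num_joints)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(frame))

policy = EggPickerPolicy()
actions = policy(torch.rand(1, 3, 84, 84))  # stand-in for a camera frame
```

The Rubik’s cube work leaned heavily on domain randomization so the simulated frames cover the real camera’s variation; that part isn’t shown here.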
Also, getting started on Sim-to-Real attempts: transferring locomotion policies to the RPi robot and seeing if it will walk.
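On the robot side, the control loop can stay dead simple, since ARS policies are typically just a linear map from observation to action. A hedged sketch: `read_observation` and `set_joint_targets` are hypothetical stand-ins for the RPi’s actual sensor and servo interfaces, and the dimensions are made up.

```python
import time
import numpy as np

NUM_JOINTS, OBS_DIM = 12, 24  # placeholder dimensions
# In practice W comes from the trained checkpoint (e.g. np.load on the saved
# policy); a small random matrix just keeps the sketch self-contained.
W = np.random.randn(NUM_JOINTS, OBS_DIM) * 0.1

def read_observation() -> np.ndarray:
    """Hypothetical stand-in for reading joint angles / IMU on the RPi."""
    return np.zeros(OBS_DIM)

def set_joint_targets(targets: np.ndarray) -> None:
    """Hypothetical stand-in for the real servo interface."""

CONTROL_HZ = 30  # keep this close to the sim's control rate
while True:
    action = W @ read_observation()  # linear policy: one matrix multiply
    set_joint_targets(action)
    time.sleep(1.0 / CONTROL_HZ)
```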
The PBT algorithm changes up the hyperparameters occasionally: every so often, the worst-performing workers copy the weights of the best ones and perturb their hyperparameters.
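For concreteness, the exploit/explore step is small enough to sketch in a few lines. This follows the general recipe from the PBT paper (truncation selection plus a roughly ±20% perturbation), not any particular library’s API; the worker dict layout is made up.

```python
import copy
import random

def pbt_step(population: list[dict]) -> list[dict]:
    """One exploit/explore round: the bottom quartile copies the top
    quartile's weights, then perturbs the copied hyperparameters."""
    population.sort(key=lambda w: w["score"], reverse=True)
    cutoff = max(1, len(population) // 4)
    for loser in population[-cutoff:]:
        winner = random.choice(population[:cutoff])
        loser["weights"] = copy.deepcopy(winner["weights"])  # exploit
        loser["hyperparams"] = {
            k: v * random.choice([0.8, 1.2])  # explore: perturb by +/-20%
            for k, v in winner["hyperparams"].items()
        }
    return population
```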
It might be smart to try ensemble or continual learning by switching to a PPO implementation at the 902-reward checkpoint.
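A hedged sketch of what that switch might look like, assuming stable-baselines3 for the PPO side. Pendulum-v1 is only a runnable stand-in for the real walking environment, and the timestep count is a guess. Since ARS’s linear weights don’t map one-to-one onto PPO’s network, this version starts PPO fresh in the same environment rather than literally loading the 902-reward parameters.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Swap in the actual walking env here; Pendulum-v1 is just a stand-in.
env = gym.make("Pendulum-v1")

model = PPO("MlpPolicy", env, learning_rate=3e-4, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("ppo_from_902_checkpoint")  # placeholder file name
```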
I get the sense that gradient descent becomes more useful once you’ve got past the novelty pitfalls, like learning to step forward instead of falling over. It can probably speed up learning at this point.