https://github.com/anyscale/academy
They have a relevant tutorial on RLlib (Ray).
After seeing the 'Replay Buffer' in the TF-Agents SAC minitaur tutorial (https://www.tensorflow.org/agents/tutorials/7_SAC_minitaur_tutorial), I'm starting to think replay is going to be a thing for the robot, one way or another.
I'm sticking with the Google protobuf code that the minitaur uses, and will just need to save the best episodes and work out how to replay them. The comments in that code ask "use recordio?"
https://stackoverflow.com/questions/53219720/tfrecord-vs-recordio
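Either way, a serialized episode proto can go straight into a TFRecord file. A minimal sketch, assuming 'episode' is a restored episode proto like the one in the script below, and guessing at the generated _pb2 module and message names:

import tensorflow as tf
from gym_robotable.envs import robotable_logging_pb2  # hypothetical module name

# Write the serialized episode proto into a TFRecord file.
with tf.io.TFRecordWriter("best_episodes.tfrecord") as writer:
    writer.write(episode.SerializeToString())

# Read the records back and parse each one into an episode proto.
for record in tf.data.TFRecordDataset("best_episodes.tfrecord"):
    restored = robotable_logging_pb2.RobotableEpisode()  # hypothetical message name
    restored.ParseFromString(record.numpy())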
import os
import inspect

currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(os.path.dirname(currentdir))
os.sys.path.insert(0, parentdir)  # make the gym_robotable package importable

import argparse
from gym_robotable.envs import logging

if __name__ == "__main__":
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--log_file', help='path to protobuf file',
                        default='/opt/gym-robotable/logs/robotable_log_2020-07-12-204613')
    args = parser.parse_args()

    logging = logging.RobotableLogging()
    episode = logging.restore_episode(args.log_file)

    print(dir(episode))
    print("episode=", episode)
    fields = episode.ListFields()  # the populated fields of the episode proto
So that’s printing out the episode.
Next step is saving only the best episodes
Then next step is stepping the simulation with the actions stored.
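Something like this might do for the replay step: a sketch assuming the episode proto follows the minitaur logging layout, where each state_action entry holds per-motor actions (field names may differ in my version):

import numpy as np

def replay_episode(env, episode):
    # Step the simulation with the logged actions, one state_action entry at a time.
    env.reset()
    for step in episode.state_action:
        # Rebuild the action vector from the per-motor log entries.
        action = np.array([motor.action for motor in step.motor_states])
        obs, reward, done, info = env.step(action)
        if done:
            break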
But now I'm not as sure. It might be better to switch to RLlib (& Ray).
I'd rather hide the details of serialization if I can.
Distributed Evolutionary Algorithms in Python
An oldie but a goodie. I was thinking of just implementing the locomotion using good old genetic programming.
You could probably generate a walking robot using a genetic algorithm that repeats the tree's actions a number of times. I bet it would be faster than the whole neural-network policy training and replay-buffer spiel (rough sketch after the DEAP link below).
https://deap.readthedocs.io/en/master/tutorials/advanced/gp.html
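Something along these lines with plain DEAP: a sketch that evolves a fixed-length action sequence and repeats it each cycle, scoring fitness by total reward. The env id is a placeholder for my robotable env, and a proper GP version would use deap.gp trees instead of a flat list:

import random
import numpy as np
import gym
from deap import base, creator, tools, algorithms

env = gym.make("RobotableEnv-v0")   # placeholder env id
SEQ_LEN = 10                        # actions per repeated cycle
ACT_DIM = env.action_space.shape[0]

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_float", random.uniform, -1.0, 1.0)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_float, n=SEQ_LEN * ACT_DIM)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evaluate(individual):
    # Repeat the evolved action cycle and use the episode reward as fitness.
    env.reset()
    total_reward = 0.0
    actions = np.reshape(individual, (SEQ_LEN, ACT_DIM))
    for _ in range(20):
        for action in actions:
            obs, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                return (total_reward,)
    return (total_reward,)

toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutGaussian, mu=0.0, sigma=0.2, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=30)
algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=20, verbose=True)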
https://docs.ray.io/en/master/rllib-algorithms.html Seems like this might be the most up to date baselines repo.
Ray is a fast and simple framework for building and running distributed applications.
Ray is packaged with libraries for accelerating machine learning workloads, including Tune (scalable hyperparameter tuning) and RLlib (scalable reinforcement learning):
ARS implementation: https://github.com/ray-project/ray/blob/master/rllib/agents/ars/ars.py
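Kicking off ARS on a custom env through Tune looks roughly like this. A sketch against the Ray API of that era; the import path for my env and the config values are guesses:

import ray
from ray import tune
from ray.tune.registry import register_env
from gym_robotable.envs.robotable_gym_env import RobotableEnv  # hypothetical import path

# Register the custom env under a name RLlib can look up.
register_env("robotable", lambda env_config: RobotableEnv())

ray.init()
tune.run(
    "ARS",
    stop={"episode_reward_mean": 50},   # arbitrary stopping reward
    config={
        "env": "robotable",
        "num_workers": 2,
        "noise_stdev": 0.02,            # ARS exploration noise
        "num_rollouts": 32,             # rollouts per training iteration
    },
    checkpoint_freq=10,
)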
Automatic domain randomization
arxiv: https://arxiv.org/pdf/1910.07113.pdf
Increases randomisation of parameters as training goes on: https://openai.com/blog/solving-rubiks-cube/
Built on top of work by these French sim2real people: https://hal.inria.fr/tel-01974203/file/89722_GOLEMO_2018_archivage.pdf
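The core trick is simple enough to sketch: sample env parameters from a range, and widen the range once the policy copes with the current one. A toy version (all names and numbers made up; the paper only expands the particular boundary that was sampled at its extreme, while this expands both ends on a rolling average):

import numpy as np

class ADRParam:
    def __init__(self, low, high, step=0.05, threshold=5.0, window=10):
        self.low, self.high = low, high
        self.step, self.threshold, self.window = step, threshold, window
        self.recent_rewards = []

    def sample(self):
        # Draw the randomised parameter value for the next episode.
        return np.random.uniform(self.low, self.high)

    def update(self, episode_reward):
        # Widen the range once average performance clears the threshold.
        self.recent_rewards.append(episode_reward)
        if len(self.recent_rewards) >= self.window:
            if np.mean(self.recent_rewards) > self.threshold:
                self.low -= self.step
                self.high += self.step
            self.recent_rewards = []

friction = ADRParam(low=0.8, high=1.2)
# per episode: env.set_friction(friction.sample()); run it; friction.update(total_reward)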
Somehow I've just been following OpenAI and missed all the action at the other big algorithm R&D company, DeepMind. https://deepmind.com/research
Experience Replay: https://deepmind.com/research/open-source/Reverb
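Reverb's basic usage, roughly as in its README (a sketch; the table name, sizes and inserted data are arbitrary):

import reverb

# Start a replay server with one uniform-sampling, FIFO-evicting table.
server = reverb.Server(tables=[
    reverb.Table(
        name="replay",
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=10000,
        rate_limiter=reverb.rate_limiters.MinSize(1)),
])

client = reverb.Client(f"localhost:{server.port}")
client.insert([0.0, 1.0, 0.5], priorities={"replay": 1.0})  # e.g. (obs, action, reward)
for sample in client.sample("replay", num_samples=2):
    print(sample)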
Acme, their new framework: https://deepmind.com/research/open-source/Acme_os and https://github.com/deepmind/acme
https://github.com/deepmind/dm_control – seems they’re a Mujoco house.
I got the table walking with ARS, but saving just the perceptron weights in pybullet didn't seem to reload progress properly.
So I switched to PPO, which is a bit more complicated. Stable Baselines PPO1 and PPO2 converged too easily, with the table opting to fall over all the time.
So I started editing the reward function weights, changing them from weighting X-axis movement by 1 and Z-axis movement by 0.5 to the opposite, so standing up is more important now. I also penalised falling over with a constant penalty. It's not looking particularly smart after 11 rounds, but at least it's not falling over forwards anymore. Yet.
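Roughly what that looks like in the env's reward method. A sketch only: the attribute and method names follow the minitaur env conventions (GetBasePosition, is_fallen) and may differ in my env, and the penalty value is arbitrary:

def _reward(self):
    current_base_position = self.robotable.GetBasePosition()
    # Weights flipped: height (Z) movement now matters more than forward (X) movement.
    forward_reward = 0.5 * (current_base_position[0] - self._last_base_position[0])
    height_reward = 1.0 * (current_base_position[2] - self._last_base_position[2])
    # Constant penalty for falling over.
    fall_penalty = -10.0 if self.is_fallen() else 0.0
    self._last_base_position = current_base_position
    return forward_reward + height_reward + fall_penalty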
I also changed some PPO hyperparams:
clip_param=0.4, entcoeff=0.2, timesteps_per_actorbatch=1024,
Basically more exploration than before: a bigger clip range allows larger policy changes per update, a bigger entropy coefficient encourages more randomness in the policy (which can't hurt, right?), and more timesteps per batch gives it more experience per update, since maybe falling over was as good as it could hope for in a smaller batch.
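For reference, this is roughly how those hyperparams plug into Stable Baselines PPO1 (the env id is a placeholder):

import gym
from stable_baselines import PPO1
from stable_baselines.common.policies import MlpPolicy

env = gym.make("RobotableEnv-v0")  # placeholder env id

model = PPO1(MlpPolicy, env,
             clip_param=0.4,                  # allow bigger policy updates
             entcoeff=0.2,                    # bigger entropy bonus, more exploration
             timesteps_per_actorbatch=1024,   # more experience per update
             verbose=1)
model.learn(total_timesteps=1000000)
model.save("ppo1_robotable")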
This is a good summary of the state of the art in hyperparam tuning. I’ll probably need to do this soon. https://medium.com/criteo-labs/hyper-parameter-optimization-algorithms-2fe447525903
Combine PPO with NES to Improve Exploration https://arxiv.org/pdf/1905.09492.pdf
PBT https://arxiv.org/abs/1711.09846
https://deepmind.com/blog/article/population-based-training-neural-networks
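If I end up on Ray anyway, PBT is built into Tune. A rough sketch, assuming the env was registered as in the ARS snippet above; note that RLlib's PPO uses different parameter names from Stable Baselines (entropy_coeff rather than entcoeff), and the mutation ranges here are guesses:

import ray
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=10,
    hyperparam_mutations={
        "clip_param": [0.1, 0.2, 0.3, 0.4],
        "entropy_coeff": [0.0, 0.01, 0.1, 0.2],
        "lr": [1e-3, 5e-4, 1e-4],
    },
)

ray.init()
tune.run(
    "PPO",
    scheduler=pbt,
    num_samples=4,   # population size
    config={"env": "robotable", "num_workers": 2},
)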
Policy Optimization with Model-based Explorations https://arxiv.org/pdf/1811.07350.pdf
It seems like my fiddling with hyperparams caused 'kl' to go to NaN. I dunno.
That 'kl' is the Kullback-Leibler divergence between the old and new policy distributions, which PPO tracks to measure how far an update has moved the policy. (Searching for KL first turns up the Karhunen-Loève transform, a sort of stochastic eigenvector transform, similar to the Fourier transform for sound, but that's a different KL.)
So I need to tune hyperparams.
Stable Baselines lets you access and modify model parameters: https://stable-baselines.readthedocs.io/en/master/guide/examples.html#accessing-and-modifying-model-parameters
So kl becoming NaN could mean I'm returning a zero somewhere from the model, e.g. dividing by zero in a normalisation step.
“In my case, adding 1e-8 to the divisor made the trick… ” – https://github.com/modestyachts/ARS/issues/1
https://stable-baselines.readthedocs.io/en/master/guide/checking_nan.html
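Two quick checks based on those Stable Baselines doc pages (a sketch; the env id is a placeholder and 'model' is the PPO1 instance from earlier):

import gym
import numpy as np
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

# Raise as soon as a NaN or inf shows up in observations, rewards or actions.
env = VecCheckNan(DummyVecEnv([lambda: gym.make("RobotableEnv-v0")]), raise_exception=True)

# And scan the model's parameters for NaNs directly.
params = model.get_parameters()  # dict of parameter name -> ndarray
for name, value in params.items():
    if np.isnan(value).any():
        print("NaN in", name)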
stable baselines – https://stable-baselines.readthedocs.io/en/master/
Ray / RLLib – https://docs.ray.io/en/master/
OpenAI Spinning Up – https://spinningup.openai.com/en/latest/spinningup/keypapers.html
RLKit – https://github.com/vitchyr/rlkit
https://sites.google.com/view/diayn/
Unsupervised skill discovery ('Diversity Is All You Need')
https://github.com/ben-eysenbach/sac/blob/master/DIAYN.md
Soft Actor Critic – https://towardsdatascience.com/soft-actor-critic-demystified-b8427df61665
arxiv: https://arxiv.org/abs/1801.01290 (Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor)
arxiv: https://arxiv.org/abs/1812.05905 (Soft Actor-Critic Algorithms and Applications)
Clear the journal and set up a 10-day rollover:
journalctl --vacuum-time=10d
Check and clean the apt cache:
du -sh /var/cache/apt/archives
apt-get clean
There's a program to visualise disk usage:
apt-get install baobab
Turns out the other big things are:
PyTorch is 1.3GB
Python 2.7 and Ansible are 500MB
in /var/lib, Docker is 2.5GB and Flatpak is 1.5GB
TensorFlow is 450MB
I got rid of my old buckets code, and got another 1.5GB back by deleting Docker completely and reinstalling it.
https://askubuntu.com/questions/935569/how-to-completely-uninstall-docker
and how to reinstall Docker on Ubuntu:
apt-get install docker-ce docker-ce-cli containerd.io
https://docs.docker.com/engine/install/ubuntu/
Got rid of Flatpak:
flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo
flatpak uninstall --all
This uninstalled inkscape and something gtk related.
I also got rid of anything Python 2 related.
sudo apt-get purge python2.7