Categories
dev institutes

Anyscale Academy

https://github.com/anyscale/academy

They have a relevant tutorial on RLlib (Ray).

Categories
AI/ML dev Locomotion simulation

Replay

After seeing the ‘Replay Buffer’ in the TF-Agents SAC minitaur tutorial (https://www.tensorflow.org/agents/tutorials/7_SAC_minitaur_tutorial), I’m starting to think replay is going to be a thing for the robot, one way or another.

I’m sticking with the Google protobuf logging code that the minitaur uses, and will just need to save the best episodes and work out how to replay them. The comments in that code ask “use recordio?”

https://stackoverflow.com/questions/53219720/tfrecord-vs-recordio

import os
import inspect
import argparse

# Make the parent directory importable so gym_robotable resolves
# when this script is run from inside the repo.
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(os.path.dirname(currentdir))
os.sys.path.insert(0, parentdir)

from gym_robotable.envs import logging

if __name__ == "__main__":

    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--log_file', help='path to protobuf file', default='/opt/gym-robotable/logs/robotable_log_2020-07-12-204613')
    args = parser.parse_args()

    # Restore the logged episode from the protobuf file.
    logger = logging.RobotableLogging()
    episode = logger.restore_episode(args.log_file)
    print(dir(episode))
    print("episode=", episode)

    # ListFields() returns the populated (field_descriptor, value) pairs.
    fields = episode.ListFields()
    print(fields)

So that’s printing out the episode.

The next step is saving only the best episodes.

Then the next step is stepping the simulation with the stored actions.
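
Here’s a rough sketch of what that replay could look like. It’s assumption-heavy: the field names (state_action, action) follow the minitaur logging proto layout, and ‘RobotableEnv-v0’ is a placeholder for whatever the env is actually registered as.

import gym
import gym_robotable  # registers the custom env (placeholder name)
from gym_robotable.envs import logging

episode = logging.RobotableLogging().restore_episode('/opt/gym-robotable/logs/robotable_log_2020-07-12-204613')

env = gym.make('RobotableEnv-v0')  # placeholder env id
env.reset()

# Step the sim with the logged motor commands, one timestep at a time.
for step in episode.state_action:   # field name assumed from the minitaur proto
    action = list(step.action)      # the logged action vector for this timestep
    observation, reward, done, info = env.step(action)
    if done:
        break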

But now I’m not as sure. It might be better to switch to RLlib (and Ray).

I’d rather hide the details of serialization if I can.

Categories
dev evolution Locomotion

DEAP

Distributed Evolutionary Algorithms in Python

deap.readthedocs.org/

An oldie but a goodie. I was thinking of just implementing the locomotion using good old genetic programming.

You could probably generate a walking robot using a genetic algorithm that repeats the tree’s actions a number of times. I bet it would be faster than the whole neural-network policy training and replay buffer spiel. There’s a rough sketch of the setup below the link.

https://deap.readthedocs.io/en/master/tutorials/advanced/gp.html
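
For what it’s worth, the basic DEAP GP scaffolding looks something like this. It’s only a sketch: the primitive set is arbitrary, and run_sim is a hypothetical function that would run the pybullet simulation with the evolved controller and return the distance walked.

import math
import operator
import random

from deap import algorithms, base, creator, gp, tools

# Primitive set: the evolved tree maps time t to a motor signal.
pset = gp.PrimitiveSet("MAIN", 1)
pset.renameArguments(ARG0='t')
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(math.sin, 1)
pset.addEphemeralConstant("const", lambda: random.uniform(-1, 1))

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def eval_gait(individual):
    controller = toolbox.compile(expr=individual)
    # run_sim is hypothetical: step the robot with controller(t) as the
    # (repeated) action and return the distance walked as the fitness.
    return (run_sim(controller),)

toolbox.register("evaluate", eval_gait)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

pop = toolbox.population(n=50)
pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=20, verbose=True)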

Categories
AI/ML deep dev Locomotion

Ray

https://docs.ray.io/en/master/rllib-algorithms.html – this seems like it might be the most up-to-date baselines-style repo of RL algorithm implementations.

Ray is a fast and simple framework for building and running distributed applications.

Ray is packaged with the following libraries for accelerating machine learning workloads:

  • Tune: Scalable Hyperparameter Tuning
  • RLlib: Scalable Reinforcement Learning
  • RaySGD: Distributed Training Wrappers

ARS implementation: https://github.com/ray-project/ray/blob/master/rllib/agents/ars/ars.py
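
A minimal sketch of kicking off one of those RLlib algorithms through Tune, assuming the 2020-era API where trainers are referred to by name in tune.run (newer Ray versions changed this interface). CartPole is just a stand-in env; a custom pybullet env would need to be registered first.

import ray
from ray import tune

ray.init()

tune.run(
    "ARS",                              # the RLlib ARS trainer
    stop={"timesteps_total": 1_000_000},
    config={
        "env": "CartPole-v0",           # stand-in; register a custom env for the robot
        "num_workers": 4,               # ARS parallelises rollouts across workers
    },
)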

Categories
AI/ML sim2real simulation

ADR

Automatic domain randomization

arxiv: https://arxiv.org/pdf/1910.07113.pdf

It increases the randomisation of simulation parameters as training goes on: https://openai.com/blog/solving-rubiks-cube/

It builds on top of the sim2real work by this French group: https://hal.inria.fr/tel-01974203/file/89722_GOLEMO_2018_archivage.pdf
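
The core ADR loop is simple enough to sketch. This is a toy version of the idea, not the paper’s implementation: each randomised parameter has a range, and when recent performance at the boundary of that range is good enough, the range gets widened. The thresholds and step sizes here are made up.

class ADRParam:
    """One randomised simulation parameter, e.g. friction or motor strength."""

    def __init__(self, lo, hi, step=0.05, threshold=0.8, buffer_size=20):
        self.lo, self.hi = lo, hi
        self.step = step                # how much to widen the range each time
        self.threshold = threshold      # required success rate at the boundary
        self.buffer_size = buffer_size
        self.boundary_results = []      # outcomes when sampled at the boundary

    def sample(self, rng):
        return rng.uniform(self.lo, self.hi)

    def record_boundary_result(self, success):
        self.boundary_results.append(success)
        if len(self.boundary_results) >= self.buffer_size:
            if sum(self.boundary_results) / len(self.boundary_results) > self.threshold:
                self.hi += self.step    # the agent copes at the boundary, so push it out
            self.boundary_results = []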

Categories
AI/ML institutes

DeepMind

Somehow I’ve just been following OpenAI and missed all the action at the other big algorithm R&D company: https://deepmind.com/research

Experience Replay: https://deepmind.com/research/open-source/Reverb
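
Reverb is basically a replay buffer as a service. The basic usage from their README is roughly this (the API may have moved on since):

import reverb

# A server hosting a single uniform-sampling, FIFO-evicting replay table.
server = reverb.Server(
    tables=[
        reverb.Table(
            name='my_table',
            sampler=reverb.selectors.Uniform(),
            remover=reverb.selectors.Fifo(),
            max_size=100,
            rate_limiter=reverb.rate_limiters.MinSize(1)),
    ],
    port=8000)

client = reverb.Client('localhost:8000')
client.insert([0, 1], priorities={'my_table': 1.0})
print(list(client.sample('my_table', num_samples=1)))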

Acme, their new RL framework: https://deepmind.com/research/open-source/Acme_os and https://github.com/deepmind/acme

https://github.com/deepmind/dm_control – seems they’re a Mujoco house.

Categories
The Sentient Table

Hyperparameters and Rewards

I got the table walking with ARS, but saving just the perceptron weights in pybullet didn’t seem to be enough to reload and resume progress.

So I switched to PPO, which is a bit more complicated. Stable baselines PPO1 and PPO2 converged too easily, with the table opting to fall over all the time.

So I started editing the reward function weights, changing it from weighting X-axis movement by 1 and Z-axis movement by 0.5, to the opposite. So standing up is more important now. I also penalised falling over with a constant value. It’s not looking particularly smart after 11 rounds, but at least it’s not falling over forwards anymore. Yet.
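
Roughly what that reweighting amounts to (a sketch, not the actual env code; the weights are the real change, while the function and variable names are made up):

FORWARD_WEIGHT = 0.5    # X-axis progress, was 1.0
UPRIGHT_WEIGHT = 1.0    # Z-axis height, was 0.5
FALL_PENALTY = 10.0     # illustrative constant penalty for falling over

def compute_reward(x_progress, z_height, fallen):
    reward = FORWARD_WEIGHT * x_progress + UPRIGHT_WEIGHT * z_height
    if fallen:
        reward -= FALL_PENALTY
    return reward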

I also changed some PPO hyperparams:

clip_param=0.4, entcoeff=0.2, timesteps_per_actorbatch=1024, 

Basically more exploration than before: a bigger clip range allows more variation in policy updates, a higher entropy coefficient can’t hurt exploration, right? And a bigger batch gives it more time to evaluate per update, since in a smaller batch maybe falling over was as good as it could hope for.
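
Passing those to Stable Baselines’ PPO1 looks roughly like this (‘RobotableEnv-v0’ is a placeholder for the actual registered env id):

import gym
import gym_robotable  # registers the custom env (placeholder name)

from stable_baselines import PPO1
from stable_baselines.common.policies import MlpPolicy

env = gym.make('RobotableEnv-v0')

model = PPO1(MlpPolicy, env,
             clip_param=0.4,                 # allow bigger policy updates per step
             entcoeff=0.2,                   # stronger entropy bonus, more exploration
             timesteps_per_actorbatch=1024,  # more experience per update
             verbose=1)

model.learn(total_timesteps=1_000_000)
model.save("ppo1_robotable")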

This is a good summary of the state of the art in hyperparam tuning. I’ll probably need to do this soon. https://medium.com/criteo-labs/hyper-parameter-optimization-algorithms-2fe447525903

Combine PPO with NES to Improve Exploration https://arxiv.org/pdf/1905.09492.pdf

PBT https://arxiv.org/abs/1711.09846

https://deepmind.com/blog/article/population-based-training-neural-networks

Policy Optimization with Model-based Explorations https://arxiv.org/pdf/1811.07350.pdf

It seems like my fiddling with hyperparams caused ‘kl’ to go to NaN.

That ‘kl’ is the Kullback-Leibler divergence between the old and new policy distributions, a measure of how much the policy changed in an update (PPO tries to keep it small). If it’s going to NaN, something in the policy’s output has gone numerically wrong.

So I need to tune hyperparams.

Stable Baselines lets you access and modify model parameters: https://stable-baselines.readthedocs.io/en/master/guide/examples.html#accessing-and-modifying-model-parameters

So kl becoming NaN could mean I’m returning a zero somewhere in the model (and then dividing by it).

“In my case, adding 1e-8 to the divisor made the trick… ” – https://github.com/modestyachts/ARS/issues/1

https://stable-baselines.readthedocs.io/en/master/guide/checking_nan.html
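
That guide suggests wrapping the env in VecCheckNan, which raises as soon as a NaN or inf shows up in the observations, rewards or actions, so at least the problem gets localised. Roughly (‘RobotableEnv-v0’ is again a placeholder):

import gym
import gym_robotable  # registers the custom env (placeholder name)

from stable_baselines import PPO1
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

env = DummyVecEnv([lambda: gym.make('RobotableEnv-v0')])
env = VecCheckNan(env, raise_exception=True)   # blow up immediately on NaN/inf

model = PPO1(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=100_000)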

Categories
Locomotion The Sentient Table

RL Toolboxes

stable baselines – https://stable-baselines.readthedocs.io/en/master/

Ray / RLLib – https://docs.ray.io/en/master/

OpenAI Spinning Up – https://spinningup.openai.com/en/latest/spinningup/keypapers.html

RLKit – https://github.com/vitchyr/rlkit

Garage – https://github.com/rlworkgroup/garage

Categories
AI/ML simulation

Diversity is all you need (DIAYN)

Unsupervised skill discovery:
https://sites.google.com/view/diayn/
https://github.com/ben-eysenbach/sac/blob/master/DIAYN.md

Soft Actor Critic – https://towardsdatascience.com/soft-actor-critic-demystified-b8427df61665

arxiv: https://arxiv.org/abs/1801.01290 (Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor)

arxiv: https://arxiv.org/abs/1812.05905 (Soft Actor-Critic Algorithms and Applications)

Categories
Linux

Freeing up space on Ubuntu

Clear the journal, keeping only the last 10 days:
sudo journalctl --vacuum-time=10d

Check the size of the apt cache, then clean it:
du -sh /var/cache/apt/archives
sudo apt-get clean

There's a program (baobab, the GNOME Disk Usage Analyzer) to visualise disk usage:
sudo apt-get install baobab

Turns out the other big things are:

  • pytorch is 1.3GB
  • python 2.7 and ansible are 500MB
  • in /var/lib, docker is 2.5GB and flatpak is 1.5GB
  • tensorflow is 450MB

I got rid of my old buckets code, and got another 1.5GB back by deleting Docker completely and reinstalling it.

https://askubuntu.com/questions/935569/how-to-completely-uninstall-docker

And how to reinstall Docker on Ubuntu:

apt-get install docker-ce docker-ce-cli containerd.io

https://docs.docker.com/engine/install/ubuntu/

Got rid of Flatpak:

flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo

flatpak uninstall --all

This uninstalled inkscape and something gtk related.

I also got rid of anything python 2 related.


sudo apt-get purge python2.7