Categories
The Sentient Table

Hyperparameters and Rewards

I got the table walking with ARS, but in pybullet, saving just the perceptron weights didn’t seem to be enough to reload progress.

So I switched to PPO, which is a bit more complicated. Stable Baselines’ PPO1 and PPO2 converged too easily, with the table opting to just fall over all the time.

So I started fiddling with the reward function weights, changing it from weighting X-axis movement by 1 and Z-axis movement by 0.5, to the opposite, so standing up is more important now. I also penalised falling over with a constant penalty. It’s not looking particularly smart after 11 rounds, but at least it’s not falling over forward anymore. Yet.
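Roughly, the change looks something like this (just a sketch of the reward shaping; only the weights come from above, the variable names and the exact penalty constant are made up):

# Rough sketch of the reward re-weighting described above.
X_WEIGHT = 0.5       # was 1.0: forward (X) movement matters less now
Z_WEIGHT = 1.0       # was 0.5: upward (Z) movement / standing matters more
FALL_PENALTY = 10.0  # hypothetical constant penalty for falling over

def step_reward(x_progress, z_progress, has_fallen):
    reward = X_WEIGHT * x_progress + Z_WEIGHT * z_progress
    if has_fallen:
        reward -= FALL_PENALTY
    return reward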

I also changed some PPO hyperparams:

clip_param=0.4, entcoeff=0.2, timesteps_per_actorbatch=1024, 

Basically more exploration than before: a bigger clip range allows larger policy changes per update, a higher entropy coefficient encourages more random actions (can’t hurt, right?), and more timesteps per batch gives it longer to evaluate each policy, since maybe falling over was the best you could hope for within a smaller batch.
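For reference, plugging those into Stable Baselines’ PPO1 looks roughly like this (a sketch, using a standard gym env as a stand-in for my pybullet table env; the defaults noted in the comments are Stable Baselines’ PPO1 defaults):

import gym
from stable_baselines import PPO1
from stable_baselines.common.policies import MlpPolicy

env = gym.make("Pendulum-v0")  # stand-in for the pybullet table env

model = PPO1(MlpPolicy, env,
             clip_param=0.4,                  # default 0.2: allow bigger policy updates
             entcoeff=0.2,                    # default 0.01: more exploration
             timesteps_per_actorbatch=1024,   # default 256: longer evaluation per batch
             verbose=1)
model.learn(total_timesteps=100000)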

This is a good summary of the state of the art in hyperparam tuning. I’ll probably need to do this soon. https://medium.com/criteo-labs/hyper-parameter-optimization-algorithms-2fe447525903

Combine PPO with NES to Improve Exploration https://arxiv.org/pdf/1905.09492.pdf

PBT https://arxiv.org/abs/1711.09846

https://deepmind.com/blog/article/population-based-training-neural-networks

Policy Optimization with Model-based Explorations https://arxiv.org/pdf/1811.07350.pdf

It seems like my fiddling with hyperparams caused ‘kl’ to go to NaN. I dunno.

In PPO, ‘kl’ is the Kullback-Leibler divergence, a measure of how far the updated policy distribution has drifted from the old one (not the Karhunen-Loève transform I found first, which is some stochastic ‘eigenvector transform’, similar to the Fourier transform for sound, apparently).

So I need to tune hyperparams.

Stable Baselines lets you access and modify model parameters: https://stable-baselines.readthedocs.io/en/master/guide/examples.html#accessing-and-modifying-model-parameters
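A minimal sketch of what that looks like, assuming a trained model like the PPO1 one above:

# Inspect the current parameters (a dict of name -> numpy array)
params = model.get_parameters()
for name, value in params.items():
    print(name, value.shape)

# Modify them (here just scaling everything, purely as an illustration)
# and load them back into the model
model.load_parameters({name: value * 0.9 for name, value in params.items()})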

So kl becoming NaN could mean I’m returning a zero somewhere from the model.

“In my case, adding 1e-8 to the divisor made the trick… ” – https://github.com/modestyachts/ARS/issues/1
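In my setup the divisor in question is presumably the observation normalisation, so the fix would look something like this (a sketch, assuming the NaN comes from dividing by a zero standard deviation):

import numpy as np

def normalise_obs(obs, obs_mean, obs_std):
    # the epsilon keeps the division finite when a dimension of the
    # observation has had zero variance so far
    return (obs - np.asarray(obs_mean)) / (np.asarray(obs_std) + 1e-8)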

https://stable-baselines.readthedocs.io/en/master/guide/checking_nan.html
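Stable Baselines also ships a VecCheckNan wrapper that raises an error as soon as a NaN or inf turns up, which should help pin down where it first appears. Something like:

import gym
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

# wrap the (placeholder) env so training stops at the first NaN/inf
# instead of silently propagating it into the policy update
env = VecCheckNan(DummyVecEnv([lambda: gym.make("Pendulum-v0")]), raise_exception=True)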

Categories
Locomotion The Sentient Table

RL Toolboxes

Stable Baselines – https://stable-baselines.readthedocs.io/en/master/

Ray / RLLib – https://docs.ray.io/en/master/

OpenAI Spinning Up – https://spinningup.openai.com/en/latest/spinningup/keypapers.html

RLKit – https://github.com/vitchyr/rlkit

Garage – https://github.com/rlworkgroup/garage

Categories
AI/ML simulation

Diversity is All You Need (DIAYN)

https://sites.google.com/view/diayn/
Unsupervised novelty
https://github.com/ben-eysenbach/sac/blob/master/DIAYN.md

Soft Actor Critic – https://towardsdatascience.com/soft-actor-critic-demystified-b8427df61665

arxiv: https://arxiv.org/abs/1801.01290 (Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor)

arxiv: https://arxiv.org/abs/1812.05905 (Soft Actor-Critic Algorithms and Applications)

Categories
Linux

Freeing up space on Ubuntu

Clear journal entries older than 10 days:
journalctl --vacuum-time=10d

Check the size of the apt cache, then clean it:
du -sh /var/cache/apt/archives
apt-get clean

There's a program to visualise disk usage:
apt-get install baobab

Turns out the other big things are:

pytorch is 1.3GB

python 2.7 and ansible are 500MB

in /var/lib, docker is 2.5GB and flatpak is 1.5GB

tensorflow is 450MB

I got rid of my old buckets code, and got another 1.5GB back by deleting docker completely and reinstalling it.

https://askubuntu.com/questions/935569/how-to-completely-uninstall-docker

and how to reinstall docker on ubuntu:

apt-get install docker-ce docker-ce-cli containerd.io

https://docs.docker.com/engine/install/ubuntu/

Got rid of flatpak:

flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo

flatpak uninstall --all

This uninstalled inkscape and something gtk related.

I also got rid of anything python 2 related.


sudo apt-get purge python2.7

Categories
Locomotion simulation

Obstacle course

I was thinking that once I get the RL or EA working on the robot for locomotion, I can just put things in its way to train it to climb over obstacles.

It seems that swapping the 2D plane for noise generated terrain is a common first step towards training a more resilient robot in simulation.
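In pybullet that would look something like generating a random heightfield and using it as the ground (a sketch based on pybullet's heightfield support; the 'noise' here is just uniform random values rather than proper Perlin-style noise):

import random
import pybullet as p

p.connect(p.DIRECT)

# build a small random heightfield to use as the ground instead of the plane
rows, cols = 64, 64
height_data = [0.05 * random.random() for _ in range(rows * cols)]

terrain_shape = p.createCollisionShape(
    shapeType=p.GEOM_HEIGHTFIELD,
    meshScale=[0.1, 0.1, 1.0],
    heightfieldData=height_data,
    numHeightfieldRows=rows,
    numHeightfieldColumns=cols)
terrain = p.createMultiBody(0, terrain_shape)
p.resetBasePositionAndOrientation(terrain, [0, 0, 0], [0, 0, 0, 1])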

Categories
AI/ML robots simulation

State-Dependent Exploration

https://arxiv.org/pdf/2005.05719.pdf

Categories
dev GANs sim2real

GAN SimToReal

https://github.com/ugurkanates/awesome-real-world-rl#simulation-to-real-with-gans

GraspGAN: https://arxiv.org/pdf/1709.07857.pdf

RL-CycleGAN https://arxiv.org/pdf/2006.09001.pdf

And this whole website is interesting: https://sim2realai.github.io/Quantifying-Transferability/

Categories
AI/ML

Continual learning

So the whole issue that made me try to get PPO working, and give up on ARS for a bit, is that I’m having trouble saving the policy to file and then loading it back up.

https://stable-baselines.readthedocs.io/en/master/guide/examples.html#continual-learning
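With Stable Baselines the save/load round trip should be roughly this (a sketch, assuming a PPO1 model and env like in the hyperparameter post above):

from stable_baselines import PPO1

# save the current policy, then later reload it and keep training
model.save("ppo1_table")

model = PPO1.load("ppo1_table")
model.set_env(env)
model.learn(total_timesteps=100000)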

The current problem with the PPO version is that it’s just falling over in the reward direction.

Categories
AI/ML dev simulation

Prioritized Experience Replay

While looking at the stable baselines docs, I came across PER: 
https://arxiv.org/pdf/1511.05952.pdf

I saw it in the parameters of DQN (https://openai.com/blog/openai-baselines-dqn/), which OpenAI released in 2017 as part of their baselines.

from stable_baselines import DQN
model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)

Categories
Math

Vaseršteĭn metric

Related to Transportation theory

https://en.wikipedia.org/wiki/Transportation_theory_(mathematics) https://en.wikipedia.org/wiki/Wasserstein_metric

Something like Factorio, where you want to optimise logistics. Found it here: https://github.com/matthieuheitz/WassersteinDictionaryLearning, which I came across while looking for ways to visualise .npy files.

In mathematics, the Wasserstein or Kantorovich–Rubinstein metric or distance is a distance function defined between probability distributions on a given metric space M.

Intuitively, if each distribution is viewed as a unit amount of “dirt” piled on M, the metric is the minimum “cost” of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved times the mean distance it has to be moved. Because of this analogy, the metric is known in computer science as the earth mover’s distance.
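For 1-D distributions scipy has a ready-made implementation, which makes the earth mover intuition easy to play with (my own toy example, not from any of the linked papers):

import numpy as np
from scipy.stats import wasserstein_distance

# two piles of "dirt": samples from two different distributions
pile_a = np.random.normal(loc=0.0, scale=1.0, size=1000)
pile_b = np.random.normal(loc=2.0, scale=1.0, size=1000)

# minimum average distance the dirt has to be moved to turn one pile into
# the other; for these two it comes out near 2.0 (the shift in the mean)
print(wasserstein_distance(pile_a, pile_b))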

https://arxiv.org/pdf/1708.01955.pdf is full of statistical optimisation jargon, related to this: https://en.wikipedia.org/wiki/Sparse_dictionary_learning, which mentions stochastic gradient descent as one of the training methods. So it’s like sampling something and generalising a function.

“Sparse coding is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. “

https://en.wikipedia.org/wiki/Duality_(optimization) In mathematical optimization theory, duality or the duality principle is the principle that optimization problems may be viewed from either of two perspectives, the primal problem or the dual problem. The solution to the dual problem provides a lower bound to the solution of the primal (minimization) problem.[1] However in general the optimal values of the primal and dual problems need not be equal. Their difference is called the duality gap. For convex optimization problems, the duality gap is zero under a constraint qualification condition.

For the .npy files I came across this: https://github.com/matthieuheitz/npy_viewer, which had a couple of cool programs.