Categories
AI/ML deep dev Locomotion

Ray

https://docs.ray.io/en/master/rllib-algorithms.html Seems like this might be the most up-to-date baselines repo.

Ray is a fast and simple framework for building and running distributed applications.

Ray is packaged with the following libraries for accelerating machine learning workloads:

  • Tune: Scalable Hyperparameter Tuning
  • RLlib: Scalable Reinforcement Learning
  • RaySGD: Distributed Training Wrappers

ARS implementation: https://github.com/ray-project/ray/blob/master/rllib/agents/ars/ars.py
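For reference, a minimal sketch of what launching RLlib's ARS through Tune might look like (assuming roughly Ray 1.x and a Gym-registered continuous-control environment; the exact config keys vary between Ray versions, so treat these as illustrative):

    import ray
    from ray import tune

    ray.init()

    # "ARS" is registered with Tune by RLlib; config values here are placeholders.
    tune.run(
        "ARS",
        stop={"timesteps_total": 1_000_000},
        config={
            "env": "Pendulum-v0",   # any Gym-registered env id
            "num_workers": 2,
            "noise_stdev": 0.02,    # std of the parameter perturbations
            "num_rollouts": 50,     # perturbations evaluated per iteration
        },
    )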

Categories
AI/ML sim2real simulation

ADR

Automatic domain randomization

arxiv: https://arxiv.org/pdf/1910.07113.pdf

Increases randomisation of parameters as training goes on: https://openai.com/blog/solving-rubiks-cube/

Builds on the work of these French sim2real people: https://hal.inria.fr/tel-01974203/file/89722_GOLEMO_2018_archivage.pdf
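A rough sketch of the ADR idea (my own illustrative code, not from the paper): each randomised parameter keeps a range, and a boundary is pushed outward once the policy performs well enough with the parameter pinned at that boundary.

    import numpy as np

    class ADRParameter:
        def __init__(self, low, high, step=0.05, threshold=0.8, buffer_size=20):
            self.low, self.high = low, high
            self.step = step                  # how far to widen the range at a time
            self.threshold = threshold        # success rate needed to widen
            self.buffer_size = buffer_size    # evaluations per boundary before deciding
            self.boundary_results = {'low': [], 'high': []}

        def sample(self):
            # Value used for an ordinary training episode.
            return np.random.uniform(self.low, self.high)

        def report(self, boundary, success):
            # Called after an evaluation episode run with the parameter fixed
            # at one of the boundaries.
            buf = self.boundary_results[boundary]
            buf.append(float(success))
            if len(buf) >= self.buffer_size:
                if np.mean(buf) >= self.threshold:
                    # The policy copes with this extreme, so push the range outwards.
                    if boundary == 'low':
                        self.low -= self.step
                    else:
                        self.high += self.step
                buf.clear()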

Categories
AI/ML institutes

DeepMind

Somehow I've just been following OpenAI and missed all the action at the other big algorithm R&D company. https://deepmind.com/research

Experience Replay: https://deepmind.com/research/open-source/Reverb
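A minimal sketch of standing up a Reverb replay server and writing/reading one transition (based on the Reverb README; the table settings are just placeholders):

    import numpy as np
    import reverb

    # One table: uniform sampling, FIFO eviction.
    server = reverb.Server(tables=[
        reverb.Table(
            name='replay_buffer',
            sampler=reverb.selectors.Uniform(),
            remover=reverb.selectors.Fifo(),
            max_size=100000,
            rate_limiter=reverb.rate_limiters.MinSize(1),
        ),
    ])

    client = reverb.Client(f'localhost:{server.port}')

    # Insert an (obs, action, reward) transition with priority 1.0.
    client.insert(
        [np.zeros(4, dtype=np.float32), np.int64(0), np.float32(1.0)],
        priorities={'replay_buffer': 1.0},
    )

    # Sample it back.
    for sample in client.sample('replay_buffer', num_samples=1):
        print(sample)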

Acme, their new framework: https://deepmind.com/research/open-source/Acme_os and https://github.com/deepmind/acme

https://github.com/deepmind/dm_control – seems they’re a Mujoco house.

Categories
AI/ML simulation

Diversity is all you need (DIAYN)

https://sites.google.com/view/diayn/
Unsupervised skill discovery (learning skills without a reward function)
https://github.com/ben-eysenbach/sac/blob/master/DIAYN.md
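The core of DIAYN is an intrinsic reward that favours states from which the current skill can be identified. A toy sketch of that reward (assuming a fixed set of discrete skills and a separately trained discriminator that outputs logits over skills for a given state):

    import numpy as np

    NUM_SKILLS = 10
    LOG_P_Z = np.log(1.0 / NUM_SKILLS)  # skills are sampled uniformly

    def diayn_reward(discriminator_logits, skill):
        # log q(z|s): log-softmax of the discriminator output for this state
        z = discriminator_logits - discriminator_logits.max()
        log_q = z - np.log(np.exp(z).sum())
        # Reward = log q(z|s) - log p(z): high when the skill is identifiable
        # from the state, which pushes different skills towards distinct states.
        return log_q[skill] - LOG_P_Z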

Soft Actor Critic – https://towardsdatascience.com/soft-actor-critic-demystified-b8427df61665

arxiv: https://arxiv.org/abs/1801.01290 (Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor)

arxiv: https://arxiv.org/abs/1812.05905 (Soft Actor-Critic Algorithms and Applications)

Categories
AI/ML robots simulation

State-Dependent Exploration

https://arxiv.org/pdf/2005.05719.pdf

Categories
AI/ML

Continual learning

So the whole issue that made me try to get PPO working, and give up on ARS for a bit, is that I’m having trouble saving the policy to file and then loading it back up.

https://stable-baselines.readthedocs.io/en/master/guide/examples.html#continual-learning
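For what it's worth, the save/load round trip in stable-baselines looks roughly like the linked example (a sketch using PPO2 on CartPole just to show the mechanics; the robot env would slot in the same way):

    import gym
    from stable_baselines import PPO2
    from stable_baselines.common.vec_env import DummyVecEnv

    # Train and save
    model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
    model.learn(total_timesteps=10000)
    model.save("ppo2_policy")
    del model

    # Later: load the saved policy, attach an env with matching spaces, keep training
    model = PPO2.load("ppo2_policy")
    model.set_env(DummyVecEnv([lambda: gym.make('CartPole-v1')]))
    model.learn(total_timesteps=10000)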

The current problem with the PPO version is that it’s just falling over in the reward direction.

Categories
AI/ML dev simulation

Prioritized Experience Replay

While looking at the stable baselines docs, I came across PER: 
https://arxiv.org/pdf/1511.05952.pdf

I saw it in the parameters of DQN (https://openai.com/blog/openai-baselines-dqn/), which OpenAI released as part of their baselines in 2017.

model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
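Spelled out as a runnable sketch (assuming stable-baselines 2.x; CartPole is just a stand-in env here):

    import gym
    from stable_baselines import DQN

    env = gym.make('CartPole-v1')

    # prioritized_replay=True switches the replay buffer to PER (Schaul et al. 2015)
    model = DQN('MlpPolicy', env, learning_rate=1e-3,
                prioritized_replay=True, verbose=1)
    model.learn(total_timesteps=25000)
    model.save("dqn_per_cartpole")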

Categories
3D Research AI/ML robots simulation

DREAM (SRL)


State Representation Learning

https://s-rl-toolbox.readthedocs.io/en/latest/index.html
I came across this toolbox because, with the robot, I’m not sure what’s going wrong with saving and loading policy files for ARS.

I asked about it here: https://pybullet.org/Bullet/phpBB3/viewtopic.php?f=24&t=13005

So I was considering using some “stable baselines” RL algorithms. They have an implementation of PPO, which is another recent algorithm.
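Back on the ARS saving issue: an ARS policy is basically just a linear map plus the running observation statistics used for normalisation, so a hand-rolled save/load should be simple. A sketch of what I mean (names are mine, assuming a NumPy-based implementation like the original paper's code):

    import numpy as np

    def save_ars_policy(path, weights, obs_mean, obs_std):
        # weights: (action_dim, obs_dim) linear policy matrix
        # obs_mean / obs_std: running statistics used to normalise observations
        np.savez(path, weights=weights, obs_mean=obs_mean, obs_std=obs_std)

    def load_ars_policy(path):
        data = np.load(path)
        return data['weights'], data['obs_mean'], data['obs_std']

    def act(obs, weights, obs_mean, obs_std):
        # ARS policy: linear in the normalised observation
        return weights @ ((obs - obs_mean) / (obs_std + 1e-8))

    # e.g. save_ars_policy("ars_policy.npz", weights, obs_mean, obs_std)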

Categories
AI/ML

Continual learning

Carrying on from training in one domain to training in another. https://www.continualai.org/ https://gist.github.com/araffin/a95dfd1accec437799f2e1e0370a1539

Wiki: https://wiki.continualai.org/
https://www.sciencedirect.com/science/article/pii/S1566253519307377
https://www.sciencedirect.com/science/article/pii/S0893608019300231
https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16410
https://arxiv.org/abs/1810.13166

Categories
AI/ML deep

Dark Knowledge

arxiv: https://arxiv.org/pdf/1503.02531.pdf

Slides: https://www.ttic.edu/dl/dark14.pdf

What is Geoffrey Hinton’s Dark Knowledge?

Distilling the knowledge in an ensemble of models into a single model.

It was based on the ‘model compression’ paper of Rich Caruana http://www.cs.cornell.edu/~caruana/
http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf

There is a distinction between hard and soft targets when you train a smaller network to get the same results as a bigger network. If you train the smaller network only on the hard targets (the final class labels or top predictions), you lose knowledge that the bigger network encodes in its ‘softer’ targets. By raising the temperature of the softmax at the end of the classification network, it’s possible to take into account how likely a class is to be mistaken for the other classes.
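A small sketch of the idea in code (my own illustrative NumPy version of the temperature softmax and the combined soft/hard loss from the Hinton et al. paper):

    import numpy as np

    def softmax(logits, T=1.0):
        # Temperature-scaled softmax: higher T gives "softer" targets that
        # reveal how likely each class is to be confused with the others.
        z = logits / T
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.5):
        # Soft-target term: cross-entropy between teacher and student at temperature T.
        soft_teacher = softmax(teacher_logits, T)
        soft_student = softmax(student_logits, T)
        soft_loss = -np.sum(soft_teacher * np.log(soft_student + 1e-12), axis=-1).mean()
        # Hard-target term: ordinary cross-entropy against the true labels.
        hard_probs = softmax(student_logits, 1.0)
        hard_loss = -np.log(hard_probs[np.arange(len(hard_labels)), hard_labels] + 1e-12).mean()
        # The soft term is scaled by T^2 so its gradients stay comparable in size.
        return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss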