Categories
AI/ML deep dev Locomotion

Ray

https://docs.ray.io/en/master/rllib-algorithms.html Seems like this might be the most up-to-date baselines repo.

Ray is a fast and simple framework for building and running distributed applications.

Ray is packaged with the following libraries for accelerating machine learning workloads:

  • Tune: Scalable Hyperparameter Tuning
  • RLlib: Scalable Reinforcement Learning
  • RaySGD: Distributed Training Wrappers

ARS implementation: https://github.com/ray-project/ray/blob/master/rllib/agents/ars/ars.py
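For reference, a minimal sketch of what launching RLlib's ARS through Tune might look like (assuming roughly Ray 1.x and a Gym-registered continuous-control environment; the exact config keys vary between Ray versions, so treat these as illustrative):

    import ray
    from ray import tune

    ray.init()

    # "ARS" is registered with Tune by RLlib; config values here are placeholders.
    tune.run(
        "ARS",
        stop={"timesteps_total": 1_000_000},
        config={
            "env": "Pendulum-v0",   # any Gym-registered env id
            "num_workers": 2,
            "noise_stdev": 0.02,    # std of the parameter perturbations
            "num_rollouts": 50,     # perturbations evaluated per iteration
        },
    )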

Categories
AI/ML sim2real simulation

ADR

Automatic domain randomization

arxiv: https://arxiv.org/pdf/1910.07113.pdf

Increases randomisation of parameters as training goes on: https://openai.com/blog/solving-rubiks-cube/

Builds on the work of these French sim2real people: https://hal.inria.fr/tel-01974203/file/89722_GOLEMO_2018_archivage.pdf
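A rough sketch of the ADR idea (my own illustrative code, not from the paper): each randomised parameter keeps a range, and a boundary is pushed outward once the policy performs well enough with the parameter pinned at that boundary.

    import numpy as np

    class ADRParameter:
        def __init__(self, low, high, step=0.05, threshold=0.8, buffer_size=20):
            self.low, self.high = low, high
            self.step = step                  # how far to widen the range at a time
            self.threshold = threshold        # success rate needed to widen
            self.buffer_size = buffer_size    # evaluations per boundary before deciding
            self.boundary_results = {'low': [], 'high': []}

        def sample(self):
            # Value used for an ordinary training episode.
            return np.random.uniform(self.low, self.high)

        def report(self, boundary, success):
            # Called after an evaluation episode run with the parameter fixed
            # at one of the boundaries.
            buf = self.boundary_results[boundary]
            buf.append(float(success))
            if len(buf) >= self.buffer_size:
                if np.mean(buf) >= self.threshold:
                    # The policy copes with this extreme, so push the range outwards.
                    if boundary == 'low':
                        self.low -= self.step
                    else:
                        self.high += self.step
                buf.clear()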

Categories
AI/ML institutes

DeepMind

Somehow I've just been following OpenAI and missed all the action at the other big algorithm R&D company. https://deepmind.com/research

Experience Replay: https://deepmind.com/research/open-source/Reverb
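A minimal sketch of standing up a Reverb replay server and writing/reading one transition (based on the Reverb README; the table settings are just placeholders):

    import numpy as np
    import reverb

    # One table: uniform sampling, FIFO eviction.
    server = reverb.Server(tables=[
        reverb.Table(
            name='replay_buffer',
            sampler=reverb.selectors.Uniform(),
            remover=reverb.selectors.Fifo(),
            max_size=100000,
            rate_limiter=reverb.rate_limiters.MinSize(1),
        ),
    ])

    client = reverb.Client(f'localhost:{server.port}')

    # Insert an (obs, action, reward) transition with priority 1.0.
    client.insert(
        [np.zeros(4, dtype=np.float32), np.int64(0), np.float32(1.0)],
        priorities={'replay_buffer': 1.0},
    )

    # Sample it back.
    for sample in client.sample('replay_buffer', num_samples=1):
        print(sample)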

Acme, their new framework: https://deepmind.com/research/open-source/Acme_os and https://github.com/deepmind/acme

https://github.com/deepmind/dm_control – seems they’re a Mujoco house.

Categories
AI/ML simulation

Diversity is all you need (DIAYN)

https://sites.google.com/view/diayn/
Unsupervised skill discovery (learning skills without a reward function)
https://github.com/ben-eysenbach/sac/blob/master/DIAYN.md
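The core of DIAYN is an intrinsic reward that favours states from which the current skill can be identified. A toy sketch of that reward (assuming a fixed set of discrete skills and a separately trained discriminator that outputs logits over skills for a given state):

    import numpy as np

    NUM_SKILLS = 10
    LOG_P_Z = np.log(1.0 / NUM_SKILLS)  # skills are sampled uniformly

    def diayn_reward(discriminator_logits, skill):
        # log q(z|s): log-softmax of the discriminator output for this state
        z = discriminator_logits - discriminator_logits.max()
        log_q = z - np.log(np.exp(z).sum())
        # Reward = log q(z|s) - log p(z): high when the skill is identifiable
        # from the state, which pushes different skills towards distinct states.
        return log_q[skill] - LOG_P_Z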

Soft Actor Critic – https://towardsdatascience.com/soft-actor-critic-demystified-b8427df61665

arxiv: https://arxiv.org/abs/1801.01290 (Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor)

arxiv: https://arxiv.org/abs/1812.05905 (Soft Actor-Critic Algorithms and Applications)

Categories
AI/ML robots simulation

State-Dependent Exploration

https://arxiv.org/pdf/2005.05719.pdf

Categories
AI/ML

Continual learning

So the whole issue that made me try to get PPO working, and give up on ARS for a bit, is that I’m having trouble saving the policy to file and then loading it back up.

https://stable-baselines.readthedocs.io/en/master/guide/examples.html#continual-learning
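For what it's worth, the save/load round trip in stable-baselines looks roughly like the linked example (a sketch using PPO2 on CartPole just to show the mechanics; the robot env would slot in the same way):

    import gym
    from stable_baselines import PPO2
    from stable_baselines.common.vec_env import DummyVecEnv

    # Train and save
    model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
    model.learn(total_timesteps=10000)
    model.save("ppo2_policy")
    del model

    # Later: load the saved policy, attach an env with matching spaces, keep training
    model = PPO2.load("ppo2_policy")
    model.set_env(DummyVecEnv([lambda: gym.make('CartPole-v1')]))
    model.learn(total_timesteps=10000)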

The current problem with the PPO version is that it’s just falling over in the reward direction.

Categories
AI/ML dev simulation

Prioritized Experience Replay

While looking at the stable baselines docs, I came across PER: 
https://arxiv.org/pdf/1511.05952.pdf

I saw it in the parameters of DQN (https://openai.com/blog/openai-baselines-dqn/), which OpenAI released as part of their baselines in 2017.

model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
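Spelled out as a runnable sketch (assuming stable-baselines 2.x; CartPole is just a stand-in env here):

    import gym
    from stable_baselines import DQN

    env = gym.make('CartPole-v1')

    # prioritized_replay=True switches the replay buffer to PER (Schaul et al. 2015)
    model = DQN('MlpPolicy', env, learning_rate=1e-3,
                prioritized_replay=True, verbose=1)
    model.learn(total_timesteps=25000)
    model.save("dqn_per_cartpole")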

Categories
3D Research AI/ML robots simulation

DREAM (SRL)


State Representation Learning

https://s-rl-toolbox.readthedocs.io/en/latest/index.html
I came across this toolbox because, with the robot, I’m not sure what’s going wrong with saving and loading policy files for ARS.

I asked about it here: https://pybullet.org/Bullet/phpBB3/viewtopic.php?f=24&t=13005

So I was considering using some “stable baselines” RL algorithms. They have an implementation of PPO, which is another recent algorithm.
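Back on the ARS saving issue: an ARS policy is basically just a linear map plus the running observation statistics used for normalisation, so a hand-rolled save/load should be simple. A sketch of what I mean (names are mine, assuming a NumPy-based implementation like the original paper's code):

    import numpy as np

    def save_ars_policy(path, weights, obs_mean, obs_std):
        # weights: (action_dim, obs_dim) linear policy matrix
        # obs_mean / obs_std: running statistics used to normalise observations
        np.savez(path, weights=weights, obs_mean=obs_mean, obs_std=obs_std)

    def load_ars_policy(path):
        data = np.load(path)
        return data['weights'], data['obs_mean'], data['obs_std']

    def act(obs, weights, obs_mean, obs_std):
        # ARS policy: linear in the normalised observation
        return weights @ ((obs - obs_mean) / (obs_std + 1e-8))

    # e.g. save_ars_policy("ars_policy.npz", weights, obs_mean, obs_std)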

Categories
AI/ML

Continual learning

Carrying on from training in one domain to training in another. https://www.continualai.org/ https://gist.github.com/araffin/a95dfd1accec437799f2e1e0370a1539

Wiki: https://wiki.continualai.org/
https://www.sciencedirect.com/science/article/pii/S1566253519307377
https://www.sciencedirect.com/science/article/pii/S0893608019300231
https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16410
https://arxiv.org/abs/1810.13166

Categories
AI/ML deep

Dark Knowledge

arxiv: https://arxiv.org/pdf/1503.02531.pdf

Slides: https://www.ttic.edu/dl/dark14.pdf

What is Geoffrey Hinton’s Dark Knowledge?

Distilling the knowledge in an ensemble of models into a single model.

It was based on the ‘model compression’ paper of Rich Caruana http://www.cs.cornell.edu/~caruana/
http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf

There is a distinction between hard and soft targets when you train a smaller network to get the same results as a bigger network. If you train the smaller network only on the hard targets (the final class labels or top predictions), you lose knowledge that the bigger network encodes in its ‘softer’ targets. By raising the temperature of the softmax at the end of the classification network, it’s possible to take into account how likely a class is to be mistaken for the other classes.
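A small sketch of the idea in code (my own illustrative NumPy version of the temperature softmax and the combined soft/hard loss from the Hinton et al. paper):

    import numpy as np

    def softmax(logits, T=1.0):
        # Temperature-scaled softmax: higher T gives "softer" targets that
        # reveal how likely each class is to be confused with the others.
        z = logits / T
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.5):
        # Soft-target term: cross-entropy between teacher and student at temperature T.
        soft_teacher = softmax(teacher_logits, T)
        soft_student = softmax(student_logits, T)
        soft_loss = -np.sum(soft_teacher * np.log(soft_student + 1e-12), axis=-1).mean()
        # Hard-target term: ordinary cross-entropy against the true labels.
        hard_probs = softmax(student_logits, 1.0)
        hard_loss = -np.log(hard_probs[np.arange(len(hard_labels)), hard_labels] + 1e-12).mean()
        # The soft term is scaled by T^2 so its gradients stay comparable in size.
        return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss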