Categories
AI/ML sim2real simulation

ADR

Automatic domain randomization

arxiv: https://arxiv.org/pdf/1910.07113.pdf

Gradually increases the randomisation ranges of simulation parameters as training goes on: https://openai.com/blog/solving-rubiks-cube/

Builds on the work of these French sim2real researchers: https://hal.inria.fr/tel-01974203/file/89722_GOLEMO_2018_archivage.pdf
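
A minimal sketch of the ADR idea (my reading of the paper, not OpenAI's code): each simulator parameter keeps a range, the range is widened when the agent does well with the parameter pinned at a boundary, and narrowed when it struggles. The class name ADRParam and the threshold values are made up for illustration, and episode returns are assumed to be normalised to [0, 1].

import random

class ADRParam:
    def __init__(self, low, high, step, perf_high=0.8, perf_low=0.2, buffer_size=20):
        self.low, self.high, self.step = low, high, step
        self.perf_high, self.perf_low = perf_high, perf_low
        self.buffer_size = buffer_size
        self.boundary_returns = {"low": [], "high": []}

    def sample(self, boundary_prob=0.5):
        # Occasionally pin the parameter to a boundary so we can measure
        # how well the current policy copes with that extreme value.
        if random.random() < boundary_prob:
            side = random.choice(["low", "high"])
            return (self.low if side == "low" else self.high), side
        return random.uniform(self.low, self.high), None

    def update(self, side, episode_return):
        if side is None:
            return
        buf = self.boundary_returns[side]
        buf.append(episode_return)
        if len(buf) < self.buffer_size:
            return
        avg = sum(buf) / len(buf)
        buf.clear()
        if avg >= self.perf_high:
            # The boundary is easy now: push it further out.
            if side == "low":
                self.low -= self.step
            else:
                self.high += self.step
        elif avg <= self.perf_low:
            # Still too hard: pull the boundary back in.
            if side == "low":
                self.low += self.step
            else:
                self.high -= self.step

# Hypothetical usage for one randomised parameter, e.g. ground friction:
friction = ADRParam(low=0.9, high=1.1, step=0.05)
value, side = friction.sample()
# ...run an episode with this friction value, then:
friction.update(side, episode_return=0.7)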

Categories
AI/ML simulation

Diversity is all you need (DIAYN)

https://sites.google.com/view/diayn/
Unsupervised skill discovery (learning diverse behaviours without a reward function)
https://github.com/ben-eysenbach/sac/blob/master/DIAYN.md

Soft Actor Critic – https://towardsdatascience.com/soft-actor-critic-demystified-b8427df61665

arxiv: https://arxiv.org/abs/1801.01290 (Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor)

arxiv: https://arxiv.org/abs/1812.05905 (Soft Actor-Critic Algorithms and Applications)
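
A minimal sketch of DIAYN's intrinsic reward as I read it from the paper: a discriminator tries to infer which skill z produced the current state, and the agent is rewarded when that inference is easy, which pushes different skills towards visiting different states. Here discriminator_logits is a hypothetical stand-in for the learned discriminator network, and the skill prior is assumed uniform.

import numpy as np

def diayn_intrinsic_reward(state, skill_id, num_skills, discriminator_logits):
    # log q(z|s): log-softmax of the discriminator's score for the active skill
    logits = discriminator_logits(state)                # shape (num_skills,)
    log_q_z_given_s = logits[skill_id] - np.log(np.sum(np.exp(logits)))
    # log p(z): uniform prior over skills
    log_p_z = -np.log(num_skills)
    # Reward the agent for states where the active skill is easy to identify.
    return log_q_z_given_s - log_p_z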

Categories
Locomotion simulation

Obstacle course

I was thinking that once I get the RL or EA working on the robot for locomotion, I can just put things in its way to train it to climb over obstacles.

It seems that swapping the flat 2D plane for noise-generated terrain is a common first step towards training a more resilient robot in simulation.
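
As a minimal sketch (not from any particular project), this is roughly how the flat plane could be swapped for random terrain in PyBullet using a heightfield; uniform noise stands in here for smoother Perlin-style noise, and the scales are arbitrary.

import random
import pybullet as p

p.connect(p.DIRECT)                           # use p.GUI to watch
rows, cols = 64, 64
heights = [random.uniform(0.0, 0.1) for _ in range(rows * cols)]

terrain_shape = p.createCollisionShape(
    shapeType=p.GEOM_HEIGHTFIELD,
    meshScale=[0.1, 0.1, 1.0],                # x/y spacing and height scale
    heightfieldData=heights,
    numHeightfieldRows=rows,
    numHeightfieldColumns=cols)
terrain = p.createMultiBody(baseMass=0, baseCollisionShapeIndex=terrain_shape)
p.resetBasePositionAndOrientation(terrain, [0, 0, 0], [0, 0, 0, 1])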

Categories
AI/ML robots simulation

State-Dependent Exploration

https://arxiv.org/pdf/2005.05719.pdf

Categories
AI/ML dev simulation

Prioritized Experience Replay

While looking at the Stable Baselines docs, I came across PER:
https://arxiv.org/pdf/1511.05952.pdf

I saw it in the parameters of DQN (https://openai.com/blog/openai-baselines-dqn/), the implementation OpenAI released in 2017 along with the rest of their baselines.

model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
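
For reference, here is a minimal sketch of what proportional PER does under the hood, simplified from the paper (a plain list instead of a sum-tree, so sampling is O(N)); the hyperparameter names follow the paper's alpha and beta.

import random

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.priorities = [], []

    def add(self, transition):
        # New transitions get the current max priority so they are replayed at least once.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority**alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct for the non-uniform sampling.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priorities become |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps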

Categories
3D Research AI/ML robots simulation

DREAM (SRL)

State Representation Learning

https://s-rl-toolbox.readthedocs.io/en/latest/index.html
I came across this toolbox because, with the robot, I am not sure what's going wrong when saving and loading policy files for ARS.

I asked about it here: https://pybullet.org/Bullet/phpBB3/viewtopic.php?f=24&t=13005

So I was considering using some Stable Baselines RL algorithms. They have an implementation of PPO, which is another recent algorithm.
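
As a minimal sketch, assuming the Stable Baselines 2.x API and using a Gym CartPole environment as a stand-in for the robot, saving and loading a PPO policy looks roughly like this:

import gym
from stable_baselines import PPO2

env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
model.save("ppo2_policy")                     # writes ppo2_policy.zip

loaded = PPO2.load("ppo2_policy", env=env)    # reload later, reattach the env
obs = env.reset()
action, _states = loaded.predict(obs)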

Categories
AI/ML Locomotion robots sim2real simulation

Imitation Learning

This is the real ticket. Basically motion capture to speed up training. But when a robot can do this, we don’t need human workers anymore. (Except to provide examples of the actions to perform, and to build the first robot-building machine, or robot-building-building machines, etc.)

videos: https://sites.google.com/view/nips2017-one-shot-imitation/home

arxiv: https://arxiv.org/pdf/1703.07326.pdf

abstract: https://arxiv.org/abs/1703.07326

Learning Agile Robotic Locomotion Skills by Imitating Animals: https://xbpeng.github.io/projects/Robotic_Imitation/2020_Robotic_Imitation.pdf

Imitation is the ability to recognize and reproduce others’ actions – By extension, imitation learning is a means of learning and developing new skills from observing these skills performed by another agent. Imitation learning (IL) as applied to robots is a technique to reduce the complexity of search spaces for learning. When observing either good or bad examples, one can reduce the search for a possible solution, by either starting the search from the observed good solution (local optima), or conversely, by eliminating from the search space what is known as a bad solution. Imitation learning offers an implicit means of training a machine, such that explicit and tedious programming of a task by a human user can be minimized or eliminated. Imitation learning is thus a “natural” means of training a machine, meant to be accessible to lay people. – (https://link.springer.com/referenceworkentry/10.1007%2F978-1-4419-1428-6_758)
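
As a minimal sketch of the simplest flavour of imitation learning, behavioural cloning (not the one-shot method from the papers above): fit a policy by supervised regression on (observation, action) pairs recorded from demonstrations, e.g. motion capture. The arrays and dimensions below are placeholders.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder demonstration data: N observations and the actions a demonstrator took.
demo_obs = np.random.randn(1000, 12)          # (N, obs_dim)
demo_act = np.random.randn(1000, 8)           # (N, act_dim)

# Behavioural cloning = supervised regression from observations onto actions.
policy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
policy.fit(demo_obs, demo_act)

obs = np.random.randn(1, 12)                  # a new observation at run time
action = policy.predict(obs)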

OpenAI’s https://openai.com/blog/robots-that-learn/

“We’ve created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.”

Categories
control robots sim2real simulation

Sim2Real links

Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping: https://arxiv.org/pdf/1805.07831.pdf

Optimizing Simulations with Noise-Tolerant Structured Exploration: https://arxiv.org/pdf/1805.07831.pdf

Categories
AI/ML simulation

Hindsight Experience Replay (HER)

(Instead of getting a crap result and saying, wow what a crap result, you say, ah well if that’s what I had wanted to do, then that would have worked.)

It’s a clever idea, OpenAI.

One ability humans have, unlike the current generation of model-free RL algorithms, is to learn almost as much from achieving an undesired outcome as from the desired one.

https://arxiv.org/abs/1707.01495

“So why not just pretend that we wanted to achieve this goal to begin with, instead of the one that we set out to achieve originally?”

The HER algorithm achieves this by using what are called “sparse and binary” rewards, which only tell the agent whether it has failed or succeeded. In contrast, the “dense,” “shaped” rewards used in conventional reinforcement learning tip agents off as to whether they are getting “close,” “closer,” “much closer,” or “very close” to hitting their goal. Such dense rewards can speed up the learning process, but they can be difficult to design and implement for real-world applications, whereas sparse rewards are simpler to specify even though they give the agent less signal to learn from.

https://thenewstack.io/openai-algorithm-allows-ai-to-learn-from-its-mistakes/
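
A minimal sketch of HER's relabelling trick, assuming goals are simple scalars and episodes are lists of dicts (both assumptions are mine, not the paper's data structures): each transition is stored twice, once with the intended goal and once pretending the goal was whatever the episode actually achieved, with the sparse reward recomputed for the substituted goal.

def sparse_reward(achieved_goal, goal, tol=0.05):
    # "Sparse and binary": 0 on success, -1 otherwise.
    return 0.0 if abs(achieved_goal - goal) < tol else -1.0

def her_relabel(episode, replay_buffer):
    # "final" strategy: use the goal achieved at the end of the episode.
    hindsight_goal = episode[-1]["achieved_goal"]
    for t in episode:
        # Original transition, rewarded against the goal we set out to achieve.
        replay_buffer.append((t["obs"], t["action"],
                              sparse_reward(t["achieved_goal"], t["goal"]), t["goal"]))
        # Hindsight transition: pretend we wanted the achieved outcome all along.
        replay_buffer.append((t["obs"], t["action"],
                              sparse_reward(t["achieved_goal"], hindsight_goal), hindsight_goal))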

G-HER

Hindsight experience replay (HER), based on universal value functions, shows promising results in such multi-goal settings by substituting achieved goals for the original goal, frequently giving the agent rewards. However, the achieved goals are limited to the current policy level and lack guidance for learning. We propose a novel guided goal-generation model for multi-goal RL named G-HER. Our method uses a conditional generative recurrent neural network (RNN) to explicitly model the relationship between policy level and goals, enabling the generation of various goals conditioned on different policy levels.

https://sites.google.com/view/gher-algorithm

Categories
dev robots simulation

Catkin

ROS build tool. These are the patterns of use:

In order to help automate the merged build process, Catkin was distributed with a command-line tool called catkin_make. This command automated the standard CMake workflow while setting some variables according to standard conventions. These defaults would result in the execution of the following commands:

$ mkdir build
$ cd build
$ cmake ../src -DCATKIN_DEVEL_SPACE=../devel -DCMAKE_INSTALL_PREFIX=../install
$ make -j<number of cores> -l<number of cores> [optional target, e.g. install]

To get DSO (Direct Sparse Odometry) working, I followed these instructions: https://github.com/JakobEngel/dso_ros/issues/32

I made /opt/catkin_ws

git clone --single-branch --branch cmake https://github.com/NikolausDemmel/dso.git
git clone --single-branch --branch catkin https://github.com/NikolausDemmel/dso_ros.git

catkin init

catkin config -DCMAKE_BUILD_TYPE=Release

catkin build