Hindsight Experience Replay
(Instead of getting a crap result and saying, wow what a crap result, you say, ah well if that’s what I had wanted to do, then that would have worked.)
It’s a clever idea, OpenAI.
One ability humans have, unlike the current generation of model-free RL algorithms, is to learn almost as much from achieving an undesired outcome as from the desired one.
https://arxiv.org/abs/1707.01495
“So why not just pretend that we wanted to achieve this goal to begin with, instead of the one that we set out to achieve originally?”
The HER algorithm achieves this by using what are called “sparse and binary” rewards, which only indicate to the agent whether it has failed or succeeded. In contrast, the “dense,” “shaped” rewards used in conventional reinforcement learning tip agents off as to whether they are getting “close,” “closer,” “much closer,” or “very close” to hitting their goal. Such dense rewards can speed up the learning process, but they are difficult to design and implement for real-world applications; sparse rewards are far simpler to specify, though on their own they give the agent little learning signal, which is exactly the gap HER’s goal relabeling fills.
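A rough Python sketch of the relabeling trick, assuming a sparse binary reward and the “future” replay strategy from the HER paper (the function names, the plain-list `replay_buffer`, and the transition tuple layout are illustrative, not OpenAI’s code):

```python
import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    # Sparse, binary reward: 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < tol else -1.0


def her_relabel(episode, replay_buffer, k=4):
    """Store each transition with its original goal, then again with
    'hindsight' goals -- states the agent actually reached later on.

    Each element of `episode` is assumed to be
    (state, action, next_state, achieved_goal, goal), where
    `achieved_goal` is the goal position reached at `next_state`.
    """
    T = len(episode)
    for t, (state, action, next_state, achieved, goal) in enumerate(episode):
        # Original transition (reward is usually -1 if the real goal was missed).
        replay_buffer.append(
            (state, goal, action, sparse_reward(achieved, goal), next_state)
        )
        # "future" strategy: sample k goals achieved later in the same episode
        # and pretend each one was the goal all along.
        future_idx = np.random.randint(t, T, size=min(k, T - t))
        for idx in future_idx:
            new_goal = episode[idx][3]  # achieved goal at a later timestep
            replay_buffer.append(
                (state, new_goal, action,
                 sparse_reward(achieved, new_goal), next_state)
            )
```

The relabeled copies turn failed episodes into successful examples for the goals that were actually reached, so an off-policy learner (DDPG in the paper) still gets frequent non-trivial rewards even when the true goal is rarely hit.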
https://thenewstack.io/openai-algorithm-allows-ai-to-learn-from-its-mistakes/
G-HER
Hindsight experience replay (HER), based on universal value functions, shows promising results in multi-goal settings by substituting achieved goals for the original goal, giving the agent frequent rewards. However, the achieved goals are limited to the current policy level and lack guidance for learning. We propose a novel guided goal-generation model for multi-goal RL named G-HER. Our method uses a conditional generative recurrent neural network (RNN) to explicitly model the relationship between policy level and goals, enabling the generation of various goals conditioned on different policy levels.
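The actual G-HER architecture and training procedure are on the project page below; purely as a hedged sketch of the idea described in the abstract (a generative RNN that emits goals conditioned on a “policy level” such as the agent’s recent success rate), something like the following PyTorch module, where the class name, the coordinate-by-coordinate decoding, and the use of success rate as the condition are all my own assumptions:

```python
import torch
import torch.nn as nn


class ConditionalGoalRNN(nn.Module):
    """Illustrative conditional generative RNN: given a scalar 'policy level'
    (e.g. recent success rate), autoregressively emit a goal vector one
    coordinate at a time."""

    def __init__(self, goal_dim, hidden_dim=64):
        super().__init__()
        self.goal_dim = goal_dim
        # Input at each step: previous goal coordinate + policy-level condition.
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, policy_level, batch_size=1):
        device = policy_level.device
        cond = policy_level.view(1, 1, 1).expand(batch_size, 1, 1)
        prev = torch.zeros(batch_size, 1, 1, device=device)
        h = None
        coords = []
        for _ in range(self.goal_dim):
            x = torch.cat([prev, cond], dim=-1)       # (B, 1, 2)
            out, h = self.rnn(x, h)
            prev = self.head(out)                     # next goal coordinate
            coords.append(prev)
        return torch.cat(coords, dim=1).squeeze(-1)   # (B, goal_dim)


# Usage sketch: sample goals matched to the current policy level and use them
# alongside (or instead of) HER's achieved-goal substitution.
model = ConditionalGoalRNN(goal_dim=3)
goals = model(policy_level=torch.tensor(0.4), batch_size=8)
```

The point of conditioning on policy level is that the generated goals can track the agent’s competence, rather than being limited to whatever the current policy happens to achieve.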
https://sites.google.com/view/gher-algorithm