Categories
AI/ML meta

Meta-learning & MAML

Useful for having fall-back plans when things go wrong, or for phasing between policies so you don’t “drop the ball”.

https://arxiv.org/abs/1703.03400

Reminds me of Map-Elites, in that it collects behaviours.

“We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.”
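
The “trained to be easy to fine-tune” idea is small enough to sketch. Below is a minimal MAML loop for few-shot sine-wave regression (the toy task from the paper): an inner gradient step adapts the parameters to one task, and the outer step trains the initial parameters so that this adaptation works well. The network size, learning rates and task sampler are my own illustrative choices, not the reference implementation.

import torch

def net(params, x):                        # tiny MLP with explicit parameters
    w1, b1, w2, b2 = params
    return torch.tanh(x @ w1 + b1) @ w2 + b2

def sample_task(n=10):                     # a random sine wave: support + query set
    amp, phase = 0.1 + 4.9 * torch.rand(1), 3.14 * torch.rand(1)
    x = 10 * torch.rand(2 * n, 1) - 5
    y = amp * torch.sin(x + phase)
    return (x[:n], y[:n]), (x[n:], y[n:])

params = [p.requires_grad_() for p in (0.1 * torch.randn(1, 40), torch.zeros(40),
                                       0.1 * torch.randn(40, 1), torch.zeros(1))]
opt = torch.optim.Adam(params, lr=1e-3)

for step in range(20000):
    (xs, ys), (xq, yq) = sample_task()
    # Inner loop: one SGD step on the support set, keeping the graph
    # so the outer update can differentiate through the adaptation.
    loss_s = ((net(params, xs) - ys) ** 2).mean()
    grads = torch.autograd.grad(loss_s, params, create_graph=True)
    adapted = [p - 0.01 * g for p, g in zip(params, grads)]
    # Outer loop: the query-set loss of the *adapted* parameters updates
    # the meta-parameters, i.e. the model is trained to be easy to fine-tune.
    loss_q = ((net(adapted, xq) - yq) ** 2).mean()
    opt.zero_grad(); loss_q.backward(); opt.step()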

Mostly based around the ES algorithm: they got a robot to walk straight again soon after hobbling it. https://ai.googleblog.com/2020/04/exploring-evolutionary-meta-learning-in.html

https://arxiv.org/pdf/2003.01239.pdf

“we present an evolutionary meta-learning algorithm that enables locomotion policies to quickly adapt in noisy real world scenarios. The core idea is to develop an efficient and noise-tolerant adaptation operator, and integrate it into meta-learning frameworks. We have shown that this Batch Hill-Climbing operator works better in handling noise than simply averaging rewards over multiple runs. Our algorithm has achieved greater adaptation performance than the state-of-the-art MAML algorithms that are based on policy gradient. Finally, we validate our method on a real quadruped robot. Trained in simulation, the locomotion policies can successfully adapt to two real-world robot environments, whose dynamics have been drastically changed.

In the future, we plan to extend our method in several ways. First, we believe that we can replace the Gaussian perturbations in the evolutionary algorithm with non-isotropic samples to further improve the sample efficiency during adaptation. With less robot data required for adaptation, we plan to develop a lifelong learning system, in which the robot can continuously collect data and quickly adjust its policy to learn new skills and to operate optimally in new environments.”
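
Reading between the lines, the Batch Hill-Climbing operator is roughly a hill climber that proposes a whole batch of Gaussian perturbations per iteration and only moves when the best of the batch improves, which is what makes it tolerant to noisy rollouts. A rough sketch under those assumptions (evaluate stands for one noisy rollout returning episode reward; this is not the authors’ code):

import numpy as np

def batch_hill_climb(theta, evaluate, iters=20, batch=8, sigma=0.02):
    best_r = evaluate(theta)
    for _ in range(iters):
        candidates = [theta + sigma * np.random.randn(*theta.shape)
                      for _ in range(batch)]
        rewards = [evaluate(c) for c in candidates]
        i = int(np.argmax(rewards))
        if rewards[i] > best_r:            # only climb if the batch improved
            theta, best_r = candidates[i], rewards[i]
    return theta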

Categories
AI/ML Locomotion robots sim2real simulation

Imitation Learning

This is the real ticket. Basically motion capture to speed up training. But when a robot can do this, we don’t need human workers anymore. (Except to provide examples of the actions to perform, and to build the first robot-building machine, or robot-building-building machines, etc.)

videos: https://sites.google.com/view/nips2017-one-shot-imitation/home

arxiv: https://arxiv.org/pdf/1703.07326.pdf

abstract: https://arxiv.org/abs/1703.07326

Learning Agile Robotic Locomotion Skills by
Imitating Animals: https://xbpeng.github.io/projects/Robotic_Imitation/2020_Robotic_Imitation.pdf

Imitation is the ability to recognize and reproduce others’ actions – By extension, imitation learning is a means of learning and developing new skills from observing these skills performed by another agent. Imitation learning (IL) as applied to robots is a technique to reduce the complexity of search spaces for learning. When observing either good or bad examples, one can reduce the search for a possible solution, by either starting the search from the observed good solution (local optima), or conversely, by eliminating from the search space what is known as a bad solution. Imitation learning offers an implicit means of training a machine, such that explicit and tedious programming of a task by a human user can be minimized or eliminated. Imitation learning is thus a “natural” means of training a machine, meant to be accessible to lay people. – (https://link.springer.com/referenceworkentry/10.1007%2F978-1-4419-1428-6_758)
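
The simplest instance of this is behavioural cloning: treat the demonstrations as a supervised dataset of (state, action) pairs and fit the policy to them. A minimal PyTorch sketch; the shapes and the random stand-in data are assumptions:

import torch
import torch.nn as nn

states = torch.randn(1000, 24)             # stand-in for recorded observations
actions = torch.randn(1000, 6)             # stand-in for demonstrated actions

policy = nn.Sequential(nn.Linear(24, 64), nn.Tanh(), nn.Linear(64, 6))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(200):
    loss = nn.functional.mse_loss(policy(states), actions)  # match the expert
    opt.zero_grad(); loss.backward(); opt.step()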

OpenAI’s https://openai.com/blog/robots-that-learn/

“We’ve created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.”

Categories
AI/ML arxiv GANs

GANs in Keras

Came across this guy’s project

https://github.com/germain-hug/GANs-Keras

Mentioned some papers on GANs. Interesting for overview of related algorithms.

https://arxiv.org/abs/1511.06434 – Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

https://arxiv.org/abs/1701.07875 – Wasserstein GAN

https://arxiv.org/abs/1411.1784 – Conditional Generative Adversarial Nets

https://arxiv.org/abs/1606.03657 – InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
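
For reference, the basic Keras GAN training pattern (which that repo elaborates on) alternates a discriminator update on real and generated batches with a generator update through a frozen discriminator. A minimal sketch on MNIST; the architectures and hyperparameters are illustrative, not the repo’s code:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

Z = 64
G = keras.Sequential([layers.Dense(128, activation="relu", input_dim=Z),
                      layers.Dense(784, activation="tanh")])
D = keras.Sequential([layers.Dense(128, activation="relu", input_dim=784),
                      layers.Dense(1, activation="sigmoid")])
D.compile(optimizer="adam", loss="binary_crossentropy")
D.trainable = False                        # freeze D inside the combined model
gan = keras.Sequential([G, D])
gan.compile(optimizer="adam", loss="binary_crossentropy")

(x, _), _ = keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype("float32") / 127.5 - 1.0

for step in range(10000):
    real = x[np.random.randint(0, len(x), 32)]
    fake = G.predict(np.random.randn(32, Z), verbose=0)
    D.train_on_batch(real, np.ones((32, 1)))   # discriminator: real -> 1
    D.train_on_batch(fake, np.zeros((32, 1)))  # discriminator: fake -> 0
    gan.train_on_batch(np.random.randn(32, Z), np.ones((32, 1)))  # G tries to fool D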

Categories
AI/ML evolution robots

resibots

Another PhD collaboration, on a European Research Council grant (2015-2020): https://www.resibots.eu/videos.html. Nice. They’re the ones who developed MAP-Elites: https://arxiv.org/abs/1504.04909

They had a paper published in Nature (https://members.loria.fr/JBMouret/nature_press.html), for their robots that fix themselves.

MAP-Elites is interesting. It categorises behaviours and keeps the best local optimum along some user-chosen dimensions of variation. Haven’t read the paper yet. It is windy.

“It creates a map of high-performing solutions at each point in a space defined by dimensions of variation that a user gets to choose. This Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) algorithm illuminates search spaces, allowing researchers to understand how interesting attributes of solutions combine to affect performance, either positively or, equally of interest, negatively.”
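
The algorithm itself is tiny: discretise the user-chosen behaviour dimensions into cells, keep only the best solution seen per cell, and generate new candidates by mutating random elites. A toy sketch with made-up fitness and behaviour functions:

import numpy as np

def fitness(x):                            # performance measure (made up)
    return -np.sum(x ** 2)

def behaviour(x):                          # user-chosen dimensions of variation
    return x[:2]

archive = {}                               # cell -> (fitness, solution)
for _ in range(10000):
    if archive and np.random.rand() < 0.9:
        key = list(archive)[np.random.randint(len(archive))]
        x = archive[key][1] + 0.1 * np.random.randn(5)   # mutate a random elite
    else:
        x = np.random.uniform(-2, 2, 5)    # occasional random candidate
    cell = tuple(np.clip(((behaviour(x) + 2) * 5).astype(int), 0, 19))
    f = fitness(x)
    if cell not in archive or f > archive[cell][0]:      # keep per-cell elites
        archive[cell] = (f, x)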

Categories
AI/ML deep highly_speculative

Cat Papers

Someone collected all the cat-related AI papers: https://github.com/junyanz/CatPapers http://people.csail.mit.edu/junyanz/cat/cat_papers.html

Categories
AI/ML CNNs deep dev institutes

wandb

https://app.wandb.ai/gabesmed/examples-tf-estimator-mnist/runs/98nmh0vy/tensorboard?workspace=user-

Hope that works. It’s that guy on YouTube who says ‘Dear Fellow Scholars’ and ‘what a time to be alive’.

The advertising was: Lambda GPU cloud, $20 for ImageNet training, no setup required. Good to know.

looks like a nice UI for stuff : https://www.wandb.com/articles

Categories
AI/ML Math

Autograd

The ‘magic’ underlying PyTorch https://towardsdatascience.com/pytorch-autograd-understanding-the-heart-of-pytorchs-magic-2686cd94ec95

“That is true. As I wrote earlier, PyTorch is a jacobian-vector product engine. In the process it never explicitly constructs the whole Jacobian. It’s usually simpler and more efficient to compute the JVP directly.”

Source: https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf

Jacobian-vector products?
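
Answering my own question: for f: R^n -> R^m with Jacobian J, reverse-mode autograd computes v^T J (strictly a vector-Jacobian product) in one backward pass, without ever materialising J. A small PyTorch check against the explicit Jacobian:

import torch

def f(x):                                  # f: R^3 -> R^2
    return torch.stack([x[0] * x[1], x[2] ** 2])

x = torch.randn(3, requires_grad=True)
v = torch.tensor([1.0, -1.0])
vjp, = torch.autograd.grad(f(x), x, grad_outputs=v)  # v^T J, one backward pass

J = torch.autograd.functional.jacobian(f, x)         # explicit 2x3 Jacobian
assert torch.allclose(vjp, v @ J)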

Categories
AI/ML simulation

Hindsight Experience Replay (HER)

Hindsight Experience Replay

(Instead of getting a crap result and saying, wow what a crap result, you say, ah well if that’s what I had wanted to do, then that would have worked.)

It’s a clever idea, OpenAI.

One ability humans have, unlike the current generation of model-free RL algorithms, is to learn almost as much from achieving an undesired outcome as from the desired one.

https://arxiv.org/abs/1707.01495

“So why not just pretend that we wanted to achieve this goal to begin with, instead of the one that we set out to achieve originally?”

The HER algorithm works with “sparse and binary” rewards, which only tell the agent whether it has failed or succeeded. In contrast, the “dense”, “shaped” rewards used in conventional reinforcement learning tip agents off as to whether they are getting “close”, “closer”, “much closer”, or “very close” to hitting their goal. Dense rewards can speed up the learning process, but they can be difficult to design and implement for real-world applications; sparse rewards are much easier to specify, but normally give the agent very little to learn from, which is exactly the problem HER’s goal relabelling solves.

https://thenewstack.io/openai-algorithm-allows-ai-to-learn-from-its-mistakes/
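
The mechanism is just relabelling: when storing a transition, also store copies in which the goal is replaced by a goal the agent actually achieved later in the episode, so the sparse reward fires often enough to learn from. A minimal sketch of the “future” strategy; the transition layout and reward_fn are my assumptions:

import random

def her_relabel(episode, reward_fn, k=4):
    # episode: list of (obs, action, goal, achieved_goal, next_obs) tuples
    out = []
    for t, (obs, act, goal, achieved, nxt) in enumerate(episode):
        out.append((obs, act, goal, reward_fn(achieved, goal), nxt))
        for _ in range(k):                 # pretend a later outcome was the goal
            future = random.choice(episode[t:])[3]
            out.append((obs, act, future, reward_fn(achieved, future), nxt))
    return out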

G-HER

Hindsight experience replay (HER) based on universal value functions shows promising results in such multi-goal settings by substituting achieved goals for the original goal, frequently giving the agent rewards. However, the achieved goals are limited to the current policy level and lack guidance for learning. We propose a novel guided goal-generation model for multi-goal RL named G-HER. Our method uses a conditional generative recurrent neural network (RNN) to explicitly model the relationship between policy level and goals, enabling the generation of various goals conditioned on the different policy levels.

https://sites.google.com/view/gher-algorithm


Categories
AI/ML control Locomotion simulation The Sentient Table

ARS and PPO

https://pybullet.org/Bullet/phpBB3/viewtopic.php?t=12553

A couple of advanced reinforcement learning algorithms. I implemented both for the walking table. ARS is great. We hear a lot about deep learning; this one is shallow learning, and does very well on simpler tasks. Just inputs and outputs, no hidden layers.

It’s similar to the Evolution Strategies algorithm. Generally trying some random stuff out, and slowly changing the model based on what gets you closer to the goal.

ARS: https://arxiv.org/pdf/1803.07055.pdf

Good lecture slides http://eddiesagra.com/wp-content/uploads/2019/03/Introduction-to-Machine-Learning-v1.2-Mar-11-2019.pdf

ARS – Augmented Random Search

https://github.com/colinskow/move37/blob/master/ars/ars.py

https://towardsdatascience.com/introduction-to-augmented-random-search-d8d7b55309bd
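
The whole ARS (V1) update fits in a few lines: perturb the linear policy’s weight matrix in antithetic pairs, roll out each perturbed policy, and step along the reward differences, scaled by their standard deviation. A stripped-down sketch (no state normalisation or top-direction filtering; rollout is assumed to run one episode and return its total reward):

import numpy as np

def ars_step(M, rollout, n_dirs=16, nu=0.03, lr=0.02):
    deltas = [np.random.randn(*M.shape) for _ in range(n_dirs)]
    r_pos = np.array([rollout(M + nu * d) for d in deltas])
    r_neg = np.array([rollout(M - nu * d) for d in deltas])
    sigma = np.concatenate([r_pos, r_neg]).std() + 1e-8
    step = sum((rp - rn) * d for rp, rn, d in zip(r_pos, r_neg, deltas))
    return M + lr / (n_dirs * sigma) * step    # move towards better rewards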

PPO – Proximal Policy Optimization

https://github.com/bulletphysics/bullet3/blob/master/examples/pybullet/gym/pybullet_envs/agents/ppo/algorithm.py
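
The heart of PPO is its clipped surrogate objective: limit how far the new policy’s probability ratio can move from the old one, and take the pessimistic minimum. A minimal PyTorch version of that loss (inputs assumed precomputed from rollouts):

import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)         # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    # minimise the negative of the clipped surrogate objective
    return -torch.min(ratio * advantages, clipped * advantages).mean()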

Categories
AI/ML arxiv Vision

Instance Segmentation

https://arxiv.org/pdf/2003.10152.pdf – SOLOv2

https://arxiv.org/pdf/2003.06148.pdf – PointINS: Point-based Instance Segmentation

Cool site: Papers with Code.

https://paperswithcode.com/task/instance-segmentation?page=4