Categories
AI/ML meta

Meta-learning & MAML

Useful for having fall-back plans when things go wrong, or for phasing between policies so you don’t “drop the ball”.

https://arxiv.org/abs/1703.03400

Reminds me of MAP-Elites, in that it collects behaviours.

“We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.”
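The “trains the model to be easy to fine-tune” part is the whole trick: the meta-gradient is taken through the inner adaptation step. A minimal sketch of that double loop in JAX (my own toy sine-regression illustration, not the authors’ code; the model size, learning rates, and task distribution are made up):

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # tiny MLP: one hidden tanh layer
    h = jnp.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

def inner_update(params, x_s, y_s, alpha=0.01):
    # one gradient step on the task's support set ("fast adaptation")
    grads = jax.grad(loss)(params, x_s, y_s)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)

def maml_loss(params, x_s, y_s, x_q, y_q):
    # query-set loss *after* adapting on the support set; differentiating this
    # w.r.t. the meta-parameters gives the (second-order) MAML meta-gradient
    return loss(inner_update(params, x_s, y_s), x_q, y_q)

def sample_task(key):
    # sine tasks with random amplitude and phase
    k1, k2 = jax.random.split(key)
    amp = jax.random.uniform(k1, (), minval=0.1, maxval=5.0)
    phase = jax.random.uniform(k2, (), minval=0.0, maxval=jnp.pi)
    return amp, phase

def sample_batch(key, amp, phase, n=10):
    x = jax.random.uniform(key, (n, 1), minval=-5.0, maxval=5.0)
    return x, amp * jnp.sin(x + phase)

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {
    "W1": 0.1 * jax.random.normal(k1, (1, 40)), "b1": jnp.zeros(40),
    "W2": 0.1 * jax.random.normal(k2, (40, 1)), "b2": jnp.zeros(1),
}
meta_grad = jax.jit(jax.grad(maml_loss))
meta_lr = 1e-3
for step in range(2000):
    key, kt, ks, kq = jax.random.split(key, 4)
    amp, phase = sample_task(kt)
    x_s, y_s = sample_batch(ks, amp, phase)   # support: data used for adaptation
    x_q, y_q = sample_batch(kq, amp, phase)   # query: data used for the meta-update
    grads = meta_grad(params, x_s, y_s, x_q, y_q)
    params = jax.tree_util.tree_map(lambda p, g: p - meta_lr * g, params, grads)
```

After meta-training, adapting to a brand-new sine task is just one more `inner_update` call on a handful of points.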

Their method is based mostly around the ES algorithm; they got a robot to walk straight again soon after hobbling it. https://ai.googleblog.com/2020/04/exploring-evolutionary-meta-learning-in.html
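For reference, the ES outer loop estimates a gradient from the rewards of Gaussian-perturbed policy parameters. A minimal sketch of the standard antithetic ES estimator (my own illustration; batch size and sigma are arbitrary), in the same JAX style as above:

```python
import jax
import jax.numpy as jnp

def es_gradient(key, reward_fn, theta, sigma=0.1, n_pairs=16):
    # antithetic ES: evaluate +/- Gaussian perturbations of the flat parameter
    # vector theta and combine the reward differences into a gradient estimate
    eps = jax.random.normal(key, (n_pairs, theta.size))
    r_plus = jnp.array([reward_fn(theta + sigma * e) for e in eps])
    r_minus = jnp.array([reward_fn(theta - sigma * e) for e in eps])
    return jnp.mean((r_plus - r_minus)[:, None] * eps, axis=0) / (2.0 * sigma)

# toy usage: climb a quadratic "reward" toward its peak at theta = [1, -2]
key = jax.random.PRNGKey(0)
target = jnp.array([1.0, -2.0])
reward = lambda th: -jnp.sum((th - target) ** 2)
theta = jnp.zeros(2)
for _ in range(200):
    key, sub = jax.random.split(key)
    theta = theta + 0.05 * es_gradient(sub, reward, theta)
print(theta)  # ends up close to [1, -2]
```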

https://arxiv.org/pdf/2003.01239.pdf

“we present an evolutionary meta-learning algorithm that enables locomotion policies to quickly adapt in noisy real world scenarios. The core idea is to develop an efficient and noise-tolerant adaptation operator, and integrate it into meta-learning frameworks. We have shown that this Batch Hill-Climbing operator works better in handling noise than simply averaging rewards over multiple runs. Our algorithm has achieved greater adaptation performance than the state-of-the-art MAML algorithms that are based on policy gradient. Finally, we validate our method on a real quadruped robot. Trained in simulation, the locomotion policies can successfully adapt to two real-world robot environments, whose dynamics have been drastically changed.

In the future, we plan to extend our method in several ways. First, we believe that we can replace the Gaussian perturbations in the evolutionary algorithm with non-isotropic samples to further improve the sample efficiency during adaptation. With less robot data required for adaptation, we plan to develop a lifelong learning system, in which the robot can continuously collect data and quickly adjust its policy to learn new skills and to operate optimally in new environments.”
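Going just off the abstract, I’d guess the Batch Hill-Climbing adaptation operator looks roughly like this: perturb the current policy parameters in a batch, roll each candidate out once, and only move to the batch’s best if it beats the incumbent. A rough sketch (batch size, perturbation scale, and incumbent handling are my guesses, not the paper’s exact operator):

```python
import jax
import jax.numpy as jnp

def batch_hill_climb(key, theta, rollout_reward, iters=10, batch=8, sigma=0.05):
    """Adapt flat policy parameters `theta` using a few batches of rollouts.

    rollout_reward(theta) -> float   # one (noisy) episode with that policy
    """
    best_theta, best_r = theta, rollout_reward(theta)
    for _ in range(iters):
        key, sub = jax.random.split(key)
        # a whole batch of Gaussian perturbations per step: taking the max over
        # a batch is more robust to one unlucky noisy rollout than averaging
        # repeated rollouts of a single candidate
        cands = best_theta + sigma * jax.random.normal(sub, (batch, theta.size))
        rewards = jnp.array([rollout_reward(c) for c in cands])
        i = int(jnp.argmax(rewards))
        if rewards[i] > best_r:        # hill-climb: only accept improvements
            best_theta, best_r = cands[i], rewards[i]
    return best_theta, best_r

# toy usage: a stand-in "rollout" reward peaked at theta = [1, -2]
key = jax.random.PRNGKey(0)
reward = lambda th: float(-jnp.sum((th - jnp.array([1.0, -2.0])) ** 2))
theta_adapted, r = batch_hill_climb(key, jnp.zeros(2), reward)
```

The conclusion’s point about non-isotropic samples would presumably amount to replacing the isotropic `sigma * normal(...)` perturbations with samples from a learned covariance.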