Categories
3D Research AI/ML robots simulation

DREAM (SRL)


State Representation Learning

https://s-rl-toolbox.readthedocs.io/en/latest/index.html
I came across this toolbox because, with the robot, I'm not sure what's wrong with saving and loading policy files for ARS.

I asked about it here: https://pybullet.org/Bullet/phpBB3/viewtopic.php?f=24&t=13005

So I was considering using some “Stable Baselines” RL algorithms. They have an implementation of PPO, another recent algorithm.

Categories
dev Locomotion The Sentient Table

Stable baselines

Need something to compare results to.

To install,

https://stable-baselines.readthedocs.io/en/master/guide/install.html

pip install git+https://github.com/hill-a/stable-baselines

Successfully installed absl-py-0.9.0 astor-0.8.1 gast-0.2.2 google-pasta-0.2.0 grpcio-1.30.0 h5py-2.10.0 keras-applications-1.0.8 keras-preprocessing-1.1.2 opt-einsum-3.2.1 stable-baselines-2.10.1a1 tensorboard-1.15.0 tensorflow-1.15.3 tensorflow-estimator-1.15.1 termcolor-1.1.0 wrapt-1.12.1

I'd originally done

pip install stable-baselines[mpi]

but the GitHub install pulls in the dependencies too.

OK, so PyBullet comes with an ‘enjoy’ program, which lives in

~/.local/lib/python3.6/site-packages/pybullet_envs/stable_baselines

You can run it using:

python3 -m pybullet_envs.stable_baselines.enjoy --algo td3 --env HalfCheetahBulletEnv-v0

OK, I set up PPO2 and tried to run python3 ppo2.py:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 106, in spec
    importlib.import_module(mod_name)
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'gym-robotable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ppo2.py", line 29, in <module>
    env = gym.make(hp.env_name)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 142, in make
    return registry.make(id, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 86, in make
    spec = self.spec(path)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 109, in spec
    raise error.Error('A module ({}) was specified for the environment but was not found, make sure the package is installed with `pip install` before calling `gym.make()`'.format(mod_name))
gym.error.Error: A module (gym-robotable) was specified for the environment but was not found, make sure the package is installed with `pip install` before calling `gym.make()`

Registration… hmm ARS.py doesn’t complain. We had this problem before.

pip3 install -e .
python3 setup.py install

nope… https://stackoverflow.com/questions/14295680/unable-to-import-a-module-that-is-definitely-installed it’s presumably here somewhere…

root@chrx:/opt/gym-robotable# pip show gym-robotable
Name: gym-robotable
Version: 0.0.1
Summary: UNKNOWN
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Location: /opt/gym-robotable
Requires: gym
Required-by: 

https://github.com/openai/gym/issues/1818 says: you need to either import <name of your package> or do gym.make("<name of your package>:tic_tac_toe-v1"); see the creating-environments guide for more information: https://github.com/openai/gym/blob/master/docs/creating-environments.md

Is it some fuckin gym-robotable vs gym_robotable thing?

Yes. Yes it is.


self.env_name = 'gym_robotable:RobotableEnv-v0'
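
For reference, the registration itself presumably lives in the package's __init__.py and looks something like this; a sketch based on the creating-environments guide above, with the entry_point path guessed from my directory layout:

from gym.envs.registration import register

register(
    id='RobotableEnv-v0',
    entry_point='gym_robotable.envs:RobotableEnv',
)

The 'gym_robotable:RobotableEnv-v0' form tells gym.make() to import the gym_robotable module first (underscores, because it's a Python module name) and then look up the registered id.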

OK, so now it's almost working. But it falls down sometimes and then the algorithm stops. Ah, I needed to define ‘is_fallen’ correctly…

  def is_fallen(self):
    orientation = self.robotable.GetBaseOrientation()
    rot_mat = self._pybullet_client.getMatrixFromQuaternion(orientation)
    # getMatrixFromQuaternion returns a 9-element row-major matrix;
    # elements 6:9 are the body's local up (z) axis in world coordinates
    local_up = rot_mat[6:]
    pos = self.robotable.GetBasePosition()
    # return (np.dot(np.asarray([0, 0, 1]), np.asarray(local_up)) < 0.85 or pos[2] < -0.25)
    #print("POS", pos)
    #print("DOT", np.dot(np.asarray([0, 0, 1]), np.asarray(local_up)))

    # changing the fallen definition for now, to the height of the table
    return (pos[2] < -0.28)
    #return False

  def _termination(self):
    position = self.robotable.GetBasePosition()
    # horizontal distance of the base from the origin
    distance = math.sqrt(position[0]**2 + position[1]**2)
    return self.is_fallen() or distance > self._distance_limit

OK, so now the training script:


import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

if __name__ == "__main__":

  hp = Hp()  # hyperparameter holder defined elsewhere; hp.env_name is 'gym_robotable:RobotableEnv-v0'
  env = gym.make(hp.env_name)

  model = PPO2(MlpPolicy, env, verbose=1)
  model.learn(total_timesteps=10000)

  for episode in range(100):
      obs = env.reset()
      for i in range(1000):
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        #env.render()
        if dones:
            print("Episode finished after {} timesteps".format(i + 1))
            break
        env.render(mode="human")

Now to continue training… https://github.com/hill-a/stable-baselines/issues/599
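
From the docs and that issue, the gist is save, load, and call learn() again; a minimal sketch (the file name is arbitrary, and note that load() needs the env passed back in):

model.save("ppo2_robotable")

# later, or in a new process:
model = PPO2.load("ppo2_robotable", env=env)
model.learn(total_timesteps=10000)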

Categories
AI/ML

Continual learning

Carrying on from training in one domain to training in another. https://www.continualai.org/ https://gist.github.com/araffin/a95dfd1accec437799f2e1e0370a1539

Wiki: https://wiki.continualai.org/
https://www.sciencedirect.com/science/article/pii/S1566253519307377
https://www.sciencedirect.com/science/article/pii/S0893608019300231
https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16410
https://arxiv.org/abs/1810.13166

Categories
AI/ML deep

Dark Knowledge

arxiv: https://arxiv.org/pdf/1503.02531.pdf

Slides: https://www.ttic.edu/dl/dark14.pdf

What is Geoffrey Hinton's Dark Knowledge?

Distilling the knowledge in an ensemble of models into a single model.

It was based on the ‘model compression’ paper of Rich Caruana http://www.cs.cornell.edu/~caruana/
http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf

There is a distinction between hard and soft targets when you train a smaller network to get the same results as a bigger network. If you train the smaller network on a cost function minimizing the difference from the larger network's hard classifications, you lose some knowledge that was encoded in the ‘softer targets’. By raising the temperature of the softmax function at the end of the classification network, it's possible to take into account how likely a class is to be mistaken for the other classes.
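
As a toy illustration (my own numpy sketch, not the paper's code): at T=1 the teacher's output is nearly one-hot, while at a higher temperature T the relative probabilities of the wrong classes, the ‘dark knowledge’, become visible for the student to match.

import numpy as np

def softmax_with_temperature(logits, T=1.0):
    z = logits / T
    z = z - np.max(z)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([8.0, 2.0, 0.5])
print(softmax_with_temperature(teacher_logits, T=1.0))  # ~[0.997, 0.002, 0.001]
print(softmax_with_temperature(teacher_logits, T=5.0))  # ~[0.66, 0.20, 0.15]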

Categories
AI/ML meta

Meta-learning & MAML

For, like, having fall-back plans when things go wrong. Or phasing between policies, so you don't “drop the ball”.

https://arxiv.org/abs/1703.03400

Reminds me of Map-Elites, in that it collects behaviours.

“We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.”

It's mostly based around the ES algorithm; they got a robot to walk straight again soon after hobbling it. https://ai.googleblog.com/2020/04/exploring-evolutionary-meta-learning-in.html

https://arxiv.org/pdf/2003.01239.pdf

“we present an evolutionary meta-learning algorithm that enables locomotion policies to quickly adapt in noisy real world scenarios. The core idea is to develop an efficient and noise-tolerant adaptation operator, and integrate it into meta-learning frameworks. We have shown that this Batch Hill-Climbing operator works better in handling noise than simply averaging rewards over multiple runs. Our algorithm has achieved greater adaptation performance than the state-of-the-art MAML algorithms that are based on policy gradient. Finally, we validate our method on a real quadruped robot. Trained in simulation, the locomotion policies can successfully adapt to two real-world robot environments, whose dynamics have been drastically changed.

In the future, we plan to extend our method in several ways. First, we believe that we can replace the Gaussian perturbations in the evolutionary algorithm with non-isotropic samples to further improve the sample efficiency during adaptation. With less robot data required for adaptation, we plan to develop a lifelong learning system, in which the robot can continuously collect data and quickly adjust its policy to learn new skills and to operate optimally in new environments.”
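
My loose reading of that adaptation operator, as a toy numpy sketch (all names and constants are mine): rather than averaging noisy rewards over repeated rollouts of a single candidate, evaluate a batch of perturbed policies and hill-climb to the best one.

import numpy as np

def batch_hill_climb(theta, rollout_reward, iters=10, pop=8, sigma=0.02):
    # theta: 1-D policy parameter vector; rollout_reward: one noisy rollout
    best_theta, best_reward = theta, rollout_reward(theta)
    for _ in range(iters):
        # a batch of Gaussian perturbations around the current best
        candidates = best_theta + sigma * np.random.randn(pop, theta.size)
        rewards = np.array([rollout_reward(c) for c in candidates])
        if rewards.max() > best_reward:  # move only if a candidate beats the incumbent
            best_reward = rewards.max()
            best_theta = candidates[rewards.argmax()]
    return best_theta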

Categories
AI/ML Locomotion robots sim2real simulation

Imitation Learning

This is the real ticket. Basically motion capture to speed up training. But when a robot can do this, we don't need human workers anymore. (Except to provide examples of the actions to perform, and to build the first robot-building machine, or robot-building-building machines, etc.)

videos: https://sites.google.com/view/nips2017-one-shot-imitation/home

arxiv: https://arxiv.org/pdf/1703.07326.pdf

abstract: https://arxiv.org/abs/1703.07326

Learning Agile Robotic Locomotion Skills by Imitating Animals: https://xbpeng.github.io/projects/Robotic_Imitation/2020_Robotic_Imitation.pdf

Imitation is the ability to recognize and reproduce others’ actions – By extension, imitation learning is a means of learning and developing new skills from observing these skills performed by another agent. Imitation learning (IL) as applied to robots is a technique to reduce the complexity of search spaces for learning. When observing either good or bad examples, one can reduce the search for a possible solution, by either starting the search from the observed good solution (local optima), or conversely, by eliminating from the search space what is known as a bad solution. Imitation learning offers an implicit means of training a machine, such that explicit and tedious programming of a task by a human user can be minimized or eliminated. Imitation learning is thus a “natural” means of training a machine, meant to be accessible to lay people. – (https://link.springer.com/referenceworkentry/10.1007%2F978-1-4419-1428-6_758)
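
As a toy version of “start the search from the observed good solution” (everything here is invented for the example): fit a linear policy to demonstration (state, action) pairs by least squares, then use it to seed the RL search instead of a random initialization.

import numpy as np

# stand-in demonstration data: 200 four-dimensional states and the expert's actions
demo_states = np.random.randn(200, 4)
demo_actions = demo_states @ np.array([[0.5], [-0.2], [0.1], [0.3]])

# behavioural cloning by least squares: W minimizes ||S W - A||^2
W_init, *_ = np.linalg.lstsq(demo_states, demo_actions, rcond=None)

# W_init now initializes the policy search, shrinking the space to explore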

OpenAI’s https://openai.com/blog/robots-that-learn/

“We’ve created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.”

Categories
MFRU

INSTALLATION PROTOTYPE
MFRU WTF

I had a thought: it could also be less slick, like random older screens and stuff, things that need VGA cables. Feels more prototypey / working-spacey. Just a thought.

Categories
AI/ML arxiv GANs

GANs in Keras

Came across this guy’s project

https://github.com/germain-hug/GANs-Keras

He mentions some papers on GANs; interesting as an overview of related algorithms.

https://arxiv.org/abs/1511.06434 – Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

https://arxiv.org/abs/1701.07875 – Wasserstein GAN

https://arxiv.org/abs/1411.1784 – Conditional Generative Adversarial Nets

https://arxiv.org/abs/1606.03657 – InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Categories
dev Locomotion The Sentient Table

ConvertFromLegModel

This is a confusing bit of code:


  def ConvertFromLegModel(self, actions):
    """Convert the actions that use leg model to the real motor actions.
    Args:
      actions: The theta, phi of the leg model.
    Returns:
      The eight desired motor angles that can be used in ApplyActions().
    """
  

 COPY THE ACTIONS.

    motor_angle = copy.deepcopy(actions)

DEFINE SOME THINGS

    scale_for_singularity = 1
    offset_for_singularity = 1.5
    half_num_motors = int(self.num_motors / 2)
    quarter_pi = math.pi / 4

FOR EVERY MOTOR

    for i in range(self.num_motors):

THE ACTION INDEX IS THE FLOOR OF i OVER 2: 0,0,1,1,2,2,3,3.
      action_idx = int(i // 2)

WELL, SO, THE FORWARD-BACKWARD COMPONENT IS: minus the singularity scale, times 45 degrees, times (the action at action_idx + half_num_motors, plus the singularity offset).

      forward_backward_component = (
          -scale_for_singularity * quarter_pi *
          (actions[action_idx + half_num_motors] + offset_for_singularity))

AND SO THE EXTENSION COMPONENT IS either + or - 45 degrees times the action.

      extension_component = (-1)**i * quarter_pi * actions[action_idx]

IF 4,5,6,7 MAKE THAT THING NEGATIVE.

      if i >= half_num_motors:
        extension_component = -extension_component

THE MOTOR ANGLE IS… PI + THE FORWARD-BACKWARD COMPONENT + THE EXTENSION COMPONENT.

      motor_angle[i] = (math.pi + forward_backward_component + extension_component)



    return motor_angle
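
To sanity-check that walkthrough, here's a standalone re-typing of the same mapping (same constants, my function name) and what it gives for all-zero actions on 8 motors:

import copy
import math

def convert_from_leg_model(actions, num_motors=8):
    scale_for_singularity = 1
    offset_for_singularity = 1.5
    half_num_motors = num_motors // 2
    quarter_pi = math.pi / 4
    motor_angle = copy.deepcopy(actions)
    for i in range(num_motors):
        action_idx = i // 2
        forward_backward = (-scale_for_singularity * quarter_pi *
                            (actions[action_idx + half_num_motors] + offset_for_singularity))
        extension = (-1)**i * quarter_pi * actions[action_idx]
        if i >= half_num_motors:
            extension = -extension
        motor_angle[i] = math.pi + forward_backward + extension
    return motor_angle

print(convert_from_leg_model([0.0] * 8))
# all zeros -> every motor at pi - 3*pi/8, about 1.96 rad: the singularity
# offset alone shifts the resting angle away from pi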

Ok my error is,

  File "/opt/gym-robotable/gym_robotable/envs/robotable_gym_env.py", line 350, in step
    action = self._transform_action_to_motor_command(action)
  File "/opt/gym-robotable/gym_robotable/envs/robotable_gym_env.py", line 313, in _transform_action_to_motor_command
    action = self.robotable.ConvertFromLegModel(action)
AttributeError: 'Robotable' object has no attribute 'ConvertFromLegModel'

OK, anyway, I debugged for an hour and now it's doing something. It's saving numpy files now.

policy_RobotableEnv-v0_20200516-192435.npy contains:

\x93NUMPY\x01\x00v\x00{'descr': '<f8', 'fortran_order': False, 'shape': (4, 16), }

Cool.

But yeah, I had to comment out a lot of stuff. It seems like the actions it's generating are mostly 0.

Since I simplified the robot to a table, it turns out I don't need any of that ConvertFromLegModel code.


OK anyway, I started over with minitaur, lol. Why are there two tables? Changing the motorDirections gave me this. Good progress.

Categories
Vision

Early ConvNet visualisations

https://link.springer.com/article/10.1186/s40648-019-0141-2

https://imgur.com/a/Hqolp

AxCell: Automatic Extraction of Results from Machine Learning Papers

https://arxiv.org/abs/2004.14356