Categories
Locomotion simulation

Obstacle course

I was thinking that once I get the RL or EA working on the robot for locomotion, I can just put things in its way, to train it to climb over obstacles.

It seems that swapping the 2D plane for noise-generated terrain is a common first step towards training a more resilient robot in simulation.
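PyBullet supports this directly with heightfields. Here's a minimal sketch (my own illustration, based on pybullet's heightfield example, not on any of the linked projects), replacing the flat plane with random terrain:

import random
import pybullet as p

p.connect(p.GUI)

# Random heightfield in place of plane.urdf; tune meshScale and the
# height range to make the terrain gentler or rougher.
rows, cols = 64, 64
height_data = [random.uniform(0, 0.05) for _ in range(rows * cols)]

terrain_shape = p.createCollisionShape(
    shapeType=p.GEOM_HEIGHTFIELD,
    meshScale=[0.1, 0.1, 1.0],
    heightfieldData=height_data,
    numHeightfieldRows=rows,
    numHeightfieldColumns=cols)
terrain = p.createMultiBody(0, terrain_shape)
p.resetBasePositionAndOrientation(terrain, [0, 0, 0], [0, 0, 0, 1])

Swapping in Perlin-style noise instead of uniform random values would give smoother, more natural hills.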

Categories
AI/ML robots simulation

State-Dependent Exploration

https://arxiv.org/pdf/2005.05719.pdf

Categories
dev GANs sim2real

GAN SimToReal

https://github.com/ugurkanates/awesome-real-world-rl#simulation-to-real-with-gans

GraspGAN: https://arxiv.org/pdf/1709.07857.pdf

RL-CycleGAN https://arxiv.org/pdf/2006.09001.pdf

And this whole website is interesting: https://sim2realai.github.io/Quantifying-Transferability/

Categories
AI/ML

Continual learning

So the whole issue that made me try to get PPO working, and give up on ARS for a bit, is that I’m having trouble saving the policy to file and then loading it back up.

https://stable-baselines.readthedocs.io/en/master/guide/examples.html#continual-learning
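For reference, the save/load round trip from those docs looks roughly like this (a sketch, using the env name from the Stable Baselines post below):

import gym
from stable_baselines import PPO2

model = PPO2('MlpPolicy', 'gym_robotable:RobotableEnv-v0', verbose=1)
model.learn(total_timesteps=10000)
model.save("ppo2_robotable")

del model  # simulate starting from a fresh process

model = PPO2.load("ppo2_robotable")
# load() does not restore the environment, so re-attach one:
model.set_env(gym.make('gym_robotable:RobotableEnv-v0'))
model.learn(total_timesteps=10000)  # continue training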

The current problem with the PPO version is that it’s just falling over in the reward direction.

Categories
AI/ML dev simulation

Prioritized Experience Replay

While looking at the stable baselines docs, I came across PER: 
https://arxiv.org/pdf/1511.05952.pdf

I saw it in the parameters of DQN (https://openai.com/blog/openai-baselines-dqn/), the implementation OpenAI released in 2017 when they launched their baselines.

model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
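Made self-contained (my sketch, with CartPole-v1 as a stand-in environment):

import gym
from stable_baselines import DQN

env = gym.make('CartPole-v1')

# prioritized_replay=True samples transitions with large TD error more
# often than uniformly, as in the PER paper above.
model = DQN('MlpPolicy', env, learning_rate=1e-3,
            prioritized_replay=True, verbose=1)
model.learn(total_timesteps=10000)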

Categories
Math

Vaseršteĭn metric

Related to Transportation theory

https://en.wikipedia.org/wiki/Transportation_theory_(mathematics) https://en.wikipedia.org/wiki/Wasserstein_metric

It’s something like Factorio, where you want to optimise logistics. I found it here: https://github.com/matthieuheitz/WassersteinDictionaryLearning, which I came across while looking for ways to visualise npy files.

In mathematics, the Wasserstein or Kantorovich–Rubinstein metric or distance is a distance function defined between probability distributions on a given metric space M.

Intuitively, if each distribution is viewed as a unit amount of “dirt” piled on M, the metric is the minimum “cost” of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved times the mean distance it has to be moved. Because of this analogy, the metric is known in computer science as the earth mover’s distance.
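The 1-D case is easy to play with (my own illustration, using SciPy):

from scipy.stats import wasserstein_distance

# Two empirical distributions; the metric is the minimum total
# "dirt moved times distance" to turn one into the other.
u = [0.0, 1.0, 3.0]
v = [5.0, 6.0, 8.0]
print(wasserstein_distance(u, v))  # 5.0: every sample shifts 5 to the right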

https://arxiv.org/pdf/1708.01955.pdf is full of statistical optimisation jargon. It’s related to https://en.wikipedia.org/wiki/Sparse_dictionary_learning, which mentions stochastic gradient descent as one way to train it. So it’s like sampling something and generalising a function.

“Sparse coding is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary.”
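As a toy example (my sketch using scikit-learn, not the linked paper’s method), you can learn a small dictionary and sparse codes for random signals:

import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.rand(100, 8)  # 100 signals with 8 features each

# Learn 4 atoms; each signal gets approximated as a sparse
# linear combination of them.
dl = DictionaryLearning(n_components=4, transform_algorithm='lasso_lars')
codes = dl.fit_transform(X)

print(dl.components_.shape)  # (4, 8): the dictionary atoms
print(codes.shape)           # (100, 4): mostly-zero coefficients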

https://en.wikipedia.org/wiki/Duality_(optimization) In mathematical optimization theory, duality or the duality principle is the principle that optimization problems may be viewed from either of two perspectives, the primal problem or the dual problem. The solution to the dual problem provides a lower bound to the solution of the primal (minimization) problem.[1] However in general the optimal values of the primal and dual problems need not be equal. Their difference is called the duality gap. For convex optimization problems, the duality gap is zero under a constraint qualification condition.
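For the Wasserstein-1 metric this is concrete (the standard Kantorovich–Rubinstein duality): the primal “move the dirt” problem over transport plans has a dual over 1-Lipschitz functions, and the duality gap is zero:

W_1(\mu,\nu) = \inf_{\gamma \in \Gamma(\mu,\nu)} \int d(x,y)\,\mathrm{d}\gamma(x,y) = \sup_{\|f\|_{\mathrm{Lip}} \le 1} \left( \mathbb{E}_{x\sim\mu}[f(x)] - \mathbb{E}_{y\sim\nu}[f(y)] \right)

where \Gamma(\mu,\nu) is the set of joint distributions with marginals \mu and \nu.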

For the npy files, I came across this: https://github.com/matthieuheitz/npy_viewer, which had a couple of cool programs.

Categories
3D Research AI/ML robots simulation

DREAM (SRL)

State Representation Learning

https://s-rl-toolbox.readthedocs.io/en/latest/index.html
I came across this toolbox because I’m not sure what’s wrong with saving and loading policy files for ARS on the robot.

I asked about it here: https://pybullet.org/Bullet/phpBB3/viewtopic.php?f=24&t=13005

So I was considering using some Stable Baselines RL algorithms. They have an implementation of PPO, which is another recent algorithm.

Categories
dev Locomotion The Sentient Table

Stable baselines

Need something to compare results to.

To install,

https://stable-baselines.readthedocs.io/en/master/guide/install.html

pip install git+https://github.com/hill-a/stable-baselines

Successfully installed absl-py-0.9.0 astor-0.8.1 gast-0.2.2 google-pasta-0.2.0 grpcio-1.30.0 h5py-2.10.0 keras-applications-1.0.8 keras-preprocessing-1.1.2 opt-einsum-3.2.1 stable-baselines-2.10.1a1 tensorboard-1.15.0 tensorflow-1.15.3 tensorflow-estimator-1.15.1 termcolor-1.1.0 wrapt-1.12.1

I’d originally done

pip install stable-baselines[mpi]

but installing from the GitHub repo pulls in the dependencies too.

OK, so pybullet comes with an ‘enjoy’ program, which lives in:

~/.local/lib/python3.6/site-packages/pybullet_envs/stable_baselines

You can run it using:

python3 -m pybullet_envs.stable_baselines.enjoy --algo td3 --env HalfCheetahBulletEnv-v0

OK, I set up ppo2 and tried to run python3 ppo2.py:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 106, in spec
    importlib.import_module(mod_name)
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'gym-robotable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ppo2.py", line 29, in <module>
    env = gym.make(hp.env_name)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 142, in make
    return registry.make(id, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 86, in make
    spec = self.spec(path)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 109, in spec
    raise error.Error('A module ({}) was specified for the environment but was not found, make sure the package is installed with `pip install` before calling `gym.make()`'.format(mod_name))
gym.error.Error: A module (gym-robotable) was specified for the environment but was not found, make sure the package is installed with `pip install` before calling `gym.make()`

Registration… hmm ARS.py doesn’t complain. We had this problem before.

pip3 install -e .
python3 setup.py install

nope… https://stackoverflow.com/questions/14295680/unable-to-import-a-module-that-is-definitely-installed it’s presumably here somewhere…

root@chrx:/opt/gym-robotable# pip show gym-robotable
Name: gym-robotable
Version: 0.0.1
Summary: UNKNOWN
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Location: /opt/gym-robotable
Requires: gym
Required-by: 

https://github.com/openai/gym/issues/1818 says you need to either import <name of your package> or do gym.make("<name of your package>:tic_tac_toe-v1"). See the creating-environments guide for more information: https://github.com/openai/gym/blob/master/docs/creating-environments.md

Is it some fuckin gym-robotable vs gym_robotable thing?

Yes. Yes it is.


self.env_name = 'gym_robotable:RobotableEnv-v0'

OK, so now it’s almost working, but the robot falls down sometimes and then the algorithm stops. Ah, I needed to define ‘is_fallen’ correctly…

  def is_fallen(self):
    # Originally: fallen if the up vector tilts too far from world z
    # (dot product < 0.85) or the base drops too low.
    orientation = self.robotable.GetBaseOrientation()
    rot_mat = self._pybullet_client.getMatrixFromQuaternion(orientation)
    local_up = rot_mat[6:]  # last row of the rotation matrix: the body's up vector
    pos = self.robotable.GetBasePosition()
    # return (np.dot(np.asarray([0, 0, 1]), np.asarray(local_up)) < 0.85 or pos[2] < -0.25)

    return pos[2] < -0.28  # changed fallen definition for now, to height of table

  def _termination(self):
    # End the episode when the robot falls or wanders past the distance limit.
    position = self.robotable.GetBasePosition()
    distance = math.sqrt(position[0]**2 + position[1]**2)
    return self.is_fallen() or distance > self._distance_limit

OK, so now:


import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

if __name__ == "__main__":

  hp = Hp()  # the hyperparameter class, defined elsewhere in ppo2.py
  env = gym.make(hp.env_name)

  model = PPO2(MlpPolicy, env, verbose=1)
  model.learn(total_timesteps=10000)

  for episode in range(100):
      obs = env.reset()
      for i in range(1000):
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(i + 1))
            break
        env.render(mode="human")

Now to continue training… https://github.com/hill-a/stable-baselines/issues/599

Categories
AI/ML

Continual learning

Carrying on from training in one domain to training in another. https://www.continualai.org/ https://gist.github.com/araffin/a95dfd1accec437799f2e1e0370a1539

Wiki: https://wiki.continualai.org/
https://www.sciencedirect.com/science/article/pii/S1566253519307377
https://www.sciencedirect.com/science/article/pii/S0893608019300231
https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16410
https://arxiv.org/abs/1810.13166

Categories
AI/ML deep

Dark Knowledge

arxiv: https://arxiv.org/pdf/1503.02531.pdf

Slides: https://www.ttic.edu/dl/dark14.pdf

What is Geoffrey Hinton’s Dark Knowledge?

Distilling the knowledge in an ensemble of models into a single model.

It was based on the ‘model compression’ paper of Rich Caruana http://www.cs.cornell.edu/~caruana/
http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf

There is a distinction between hard and soft targets when you train a smaller network to get the same results as a bigger network… If you train the smaller network only on the hard classifications the larger network outputs, you lose knowledge that was encoded in the softer targets. By raising the temperature of the softmax function at the end of the classification network, it’s possible to take into account how likely a class is to be mistaken for the other classes.
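A minimal sketch of the temperature trick (my own illustration, with made-up logits):

import numpy as np

def softmax(logits, T=1.0):
    # Higher temperature T softens the distribution, exposing the
    # small "dark knowledge" probabilities of the wrong classes.
    z = logits / T
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([9.0, 5.0, 1.0])  # a teacher network's logits for 3 classes
print(softmax(logits, T=1.0))  # ~[0.98, 0.018, 0.0003]: nearly a hard target
print(softmax(logits, T=4.0))  # ~[0.66, 0.24, 0.09]: relative errors visible

The student network is then trained to match these softened outputs at the same high temperature, not just the one-hot labels.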