I was thinking that when I get the RL or EA working on the robot for locomotion, I can just put things in its way to train it to climb over obstacles.
It seems that swapping the flat 2D plane for noise-generated terrain is a common first step towards training a more resilient robot in simulation.
So the whole issue that made me try to get PPO working, and give up on ARS for a bit, is that I’m having trouble saving the policy to file, and then loading it back up.
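For reference, this is roughly how saving and loading is supposed to work in stable-baselines; the env id and filename below are placeholders, not my actual values:

# Minimal save/load round trip with stable-baselines 2 (PPO2).
# "RobotableEnv-v0" and the filename are placeholders for illustration.
import gym
from stable_baselines import PPO2

env = gym.make("RobotableEnv-v0")
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
model.save("ppo2_robotable")        # writes a .zip next to the script

del model                           # pretend we restarted the process
model = PPO2.load("ppo2_robotable", env=env)
obs = env.reset()
action, _states = model.predict(obs)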
While looking at the stable baselines docs, I came across PER (Prioritized Experience Replay):
https://arxiv.org/pdf/1511.05952.pdf
I saw it in the parameters of DQN (https://openai.com/blog/openai-baselines-dqn/), which OpenAI released in 2017 as part of their Baselines.
model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
Intuitively, if each distribution is viewed as a unit amount of “dirt” piled on M, the metric is the minimum “cost” of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved times the mean distance it has to be moved. Because of this analogy, the metric is known in computer science as the earth mover’s distance.
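To make that concrete, here’s a tiny sketch using SciPy’s 1-D Wasserstein distance; the sample values are made up just to show the call:

# Earth mover's / Wasserstein-1 distance between two 1-D empirical distributions.
from scipy.stats import wasserstein_distance

pile_a = [0.0, 1.0, 3.0]
pile_b = [5.0, 6.0, 8.0]
# Every unit of "dirt" has to move a distance of 5.0, so this prints 5.0.
print(wasserstein_distance(pile_a, pile_b))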
“Sparse coding is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. “
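A quick sketch of that idea with scikit-learn’s DictionaryLearning; the data, number of atoms and transform settings are arbitrary choices for illustration:

# Learn a small dictionary of atoms and a sparse code for each sample.
import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.randn(100, 20)                  # 100 samples, 20 features (made up)
dico = DictionaryLearning(n_components=10, transform_algorithm='lasso_lars',
                          transform_alpha=0.1, random_state=0)
codes = dico.fit_transform(X)                 # sparse codes, shape (100, 10)
atoms = dico.components_                      # the dictionary of atoms, shape (10, 20)
print(codes.shape, atoms.shape)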
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 106, in spec
    importlib.import_module(mod_name)
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'gym-robotable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ppo2.py", line 29, in <module>
    env = gym.make(hp.env_name)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 142, in make
    return registry.make(id, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 86, in make
    spec = self.spec(path)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 109, in spec
    raise error.Error('A module ({}) was specified for the environment but was not found, make sure the package is installed with `pip install` before calling `gym.make()`'.format(mod_name))
gym.error.Error: A module (gym-robotable) was specified for the environment but was not found, make sure the package is installed with `pip install` before calling `gym.make()`
Registration… hmm ARS.py doesn’t complain. We had this problem before.
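For the record, registering a custom env usually looks something like this; the id and entry_point below are guesses at what mine should be, not the actual values. The point is that the importable module path uses underscores (gym_robotable), not the hyphenated package name, and the package needs to be installed (e.g. pip install -e .) first:

# Typical custom gym env registration (a sketch, with placeholder names).
from gym.envs.registration import register

register(
    id='RobotableEnv-v0',
    entry_point='gym_robotable.envs:RobotableEnv',
)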
# Imports added for completeness; Hp (the hyperparameters class holding env_name)
# is assumed to be defined earlier in ppo2.py.
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

if __name__ == "__main__":
    hp = Hp()
    env = gym.make(hp.env_name)
    model = PPO2(MlpPolicy, env, verbose=1)
    model.learn(total_timesteps=10000)
    for episode in range(100):
        obs = env.reset()
        for i in range(1000):
            action, _states = model.predict(obs)
            obs, rewards, dones, info = env.step(action)
            # env.render()
            if dones:
                print("Episode finished after {} timesteps".format(i + 1))
                break
            env.render(mode="human")
There is a distinction between hard and soft targets when you train a smaller network to get the same results as a bigger network… If you train the smaller network on a cost function that only matches the larger network’s final (hard) outputs, you lose some knowledge that was encoded in the ‘softer targets’. By changing the softmax function at the end of the classification network (raising its temperature), it’s possible to take into account how likely a class is to be mistaken for the other classes.
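A tiny sketch of that softened softmax; the logits and temperatures are arbitrary numbers:

# Softmax with a temperature: higher T spreads probability onto the "wrong"
# classes, exposing which ones the network thinks are plausible confusions.
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()               # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [8.0, 2.0, 1.0]          # made-up example logits
print(softmax_with_temperature(logits, T=1.0))   # hard-ish: nearly all mass on class 0
print(softmax_with_temperature(logits, T=5.0))   # softer: other classes get visible mass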