Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 106, in spec
    importlib.import_module(mod_name)
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'gym-robotable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ppo2.py", line 29, in <module>
    env = gym.make(hp.env_name)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 142, in make
    return registry.make(id, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 86, in make
    spec = self.spec(path)
  File "/usr/local/lib/python3.6/dist-packages/gym/envs/registration.py", line 109, in spec
    raise error.Error('A module ({}) was specified for the environment but was not found, make sure the package is installed with `pip install` before calling `gym.make()`'.format(mod_name))
gym.error.Error: A module (gym-robotable) was specified for the environment but was not found, make sure the package is installed with `pip install` before calling `gym.make()`
Registration… hmm, ARS.py doesn’t complain. We had this problem before.
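For the record, the usual pattern looks roughly like this (a sketch; the module and class names below are my guesses, not necessarily how gym-robotable is actually laid out). Gym imports whatever comes before the colon in the env id, and "gym-robotable" with a dash can never be imported as a module, which is exactly what the first traceback is complaining about:

import gym
from gym.envs.registration import register

# in gym_robotable/__init__.py (assumed layout):
register(
    id='RobotableEnv-v0',
    entry_point='gym_robotable.envs:RobotableEnv',
)

# then in ppo2.py, the part before the colon has to be an importable
# module name, so underscores rather than dashes:
env = gym.make('gym_robotable:RobotableEnv-v0')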
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

if __name__ == "__main__":
    # Hp holds the hyperparameters (defined elsewhere in ppo2.py)
    hp = Hp()
    env = gym.make(hp.env_name)

    # train a PPO2 policy on the robotable environment
    model = PPO2(MlpPolicy, env, verbose=1)
    model.learn(total_timesteps=10000)

    # then run the trained policy for a few episodes
    for episode in range(100):
        obs = env.reset()
        for i in range(1000):
            action, _states = model.predict(obs)
            obs, rewards, dones, info = env.step(action)
            #env.render()
            if dones:
                print("Episode finished after {} timesteps".format(i + 1))
                break
        env.render(mode="human")
There is a distinction between hard and soft targets when you train a smaller network to get the same results as a bigger network… If you train the smaller network on a cost function that only matches the larger network’s final hard predictions, you lose knowledge that was encoded in the ‘softer targets’. By raising the temperature of the softmax at the end of the classification network, it’s possible to take into account how likely each class is to be mistaken for the other classes.
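Roughly, in code (my sketch of the idea, PyTorch-style; the function name, T and alpha are arbitrary): the soft targets come from the teacher’s softmax at a raised temperature, and the student trains on a blend of that and the usual hard-label loss.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: the teacher's class probabilities at temperature T keep the
    # "how likely is this class to be confused with that one" information.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                         soft_targets, reduction='batchmean') * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss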
Reminds me of MAP-Elites, in that it collects behaviours.
“We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.”
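The inner/outer loop looks roughly like this (my first-order paraphrase in PyTorch, not the paper’s exact second-order algorithm; `tasks` and the `sample_support()`/`sample_query()` calls are placeholders for “a small amount of training data from a new task”):

import copy
import torch
import torch.nn.functional as F

def fomaml_step(model, tasks, meta_opt, inner_lr=0.01, inner_steps=1):
    # Adapt a copy of the model on each task's small support set, then treat
    # the adapted model's query-set gradients as the meta-gradient
    # (first-order approximation: second-order terms dropped).
    meta_opt.zero_grad()
    for task in tasks:
        adapted = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            x, y = task.sample_support()      # "small number of training samples"
            loss = F.mse_loss(adapted(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        xq, yq = task.sample_query()          # held-out data from the same task
        grads = torch.autograd.grad(F.mse_loss(adapted(xq), yq), adapted.parameters())
        for p, g in zip(model.parameters(), grads):
            p.grad = g / len(tasks) if p.grad is None else p.grad + g / len(tasks)
    meta_opt.step()                           # update the meta-parameters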
“we present an evolutionary meta-learning algorithm that enables locomotion policies to quickly adapt in noisy real world scenarios. The core idea is to develop an efficient and noise-tolerant adaptation operator, and integrate it into meta-learning frameworks. We have shown that this Batch Hill-Climbing operator works better in handling noise than simply averaging rewards over multiple runs. Our algorithm has achieved greater adaptation performance than the state-of-the-art MAML algorithms that are based on policy gradient. Finally, we validate our method on a real quadruped robot. Trained in simulation, the locomotion policies can successfully adapt to two real-world robot environments, whose dynamics have been drastically changed. In the future, we plan to extend our method in several ways. First, we believe that we can replace the Gaussian perturbations in the evolutionary algorithm with non-isotropic samples to further improve the sample efficiency during adaptation. With less robot data required for adaptation, we plan to develop a lifelong learning system, in which the robot can continuously collect data and quickly adjust its policy to learn new skills and to operate optimally in new environments.”
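My rough reading of that batch hill-climbing adaptation operator, as a sketch (not the paper’s actual code; `evaluate` is a placeholder for rolling the perturbed policy out and returning its reward):

import numpy as np

def batch_hill_climb(theta, evaluate, sigma=0.02, batch_size=8, steps=10):
    # Sample a batch of Gaussian perturbations of the current policy
    # parameters, evaluate each, and only move if the best of the batch
    # beats the incumbent. Picking the best of a batch is what makes it
    # more noise-tolerant than a single perturbation per step.
    best_reward = evaluate(theta)
    for _ in range(steps):
        candidates = theta + sigma * np.random.randn(batch_size, theta.size)
        rewards = np.array([evaluate(c) for c in candidates])
        if rewards.max() > best_reward:
            best_reward = rewards.max()
            theta = candidates[rewards.argmax()]
    return theta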
This is the real ticket. Basically motion capture to speed up training. But when a robot can do this, we don’t need human workers anymore. (Except to provide examples of the actions to perform, and to build the first robot-building machine, or robot-building-building machines, etc.)
Imitation is the ability to recognize and reproduce others’ actions. By extension, imitation learning is a means of learning and developing new skills from observing these skills performed by another agent. Imitation learning (IL) as applied to robots is a technique to reduce the complexity of search spaces for learning. When observing either good or bad examples, one can reduce the search for a possible solution, by either starting the search from the observed good solution (local optima), or conversely, by eliminating from the search space what is known as a bad solution. Imitation learning offers an implicit means of training a machine, such that explicit and tedious programming of a task by a human user can be minimized or eliminated. Imitation learning is thus a “natural” means of training a machine, meant to be accessible to lay people. (https://link.springer.com/referenceworkentry/10.1007%2F978-1-4419-1428-6_758)
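In its simplest form that’s just behaviour cloning, which is only supervised learning on recorded demonstrations. A minimal sketch (obs_dim/act_dim are made-up example sizes, and `demo_loader` stands in for a dataset of (observation, expert_action) pairs from the demonstrator):

import torch
import torch.nn as nn

obs_dim, act_dim = 12, 8
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for obs, expert_action in demo_loader:
    # regress the policy's action towards what the demonstrator did
    loss = nn.functional.mse_loss(policy(obs), expert_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()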
“We’ve created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.”
I had a thought that it could also be less slick: random older screens and stuff, things that need VGA cables. Feels more prototype-y, more working-space-y. Just a thought.
def ConvertFromLegModel(self, actions):
    """Convert the actions that use leg model to the real motor actions.

    Args:
      actions: The theta, phi of the leg model.
    Returns:
      The eight desired motor angles that can be used in ApplyActions().
    """
    # COPY THE ACTIONS.
    motor_angle = copy.deepcopy(actions)

    # DEFINE SOME THINGS
    scale_for_singularity = 1
    offset_for_singularity = 1.5
    half_num_motors = int(self.num_motors / 2)
    quarter_pi = math.pi / 4

    # FOR EVERY MOTOR
    for i in range(self.num_motors):
        # THE ACTION INDEX IS THE FLOOR OF HALF. 00112233
        action_idx = int(i // 2)

        # WELL, SO, THE FORWARD/BACKWARD COMPONENT is the negative scale thingy
        # times 45 degrees times (the action at the index plus half the motors... plus the offset thingy).
        forward_backward_component = (
            -scale_for_singularity * quarter_pi *
            (actions[action_idx + half_num_motors] + offset_for_singularity))

        # AND SO THE EXTENSION COMPONENT IS either + or - 45 degrees times the action.
        extension_component = (-1)**i * quarter_pi * actions[action_idx]

        # IF 4,5,6,7 MAKE THAT THING NEGATIVE.
        if i >= half_num_motors:
            extension_component = -extension_component

        # THE ANGLE IS... PI + thingy 1 + thingy 2.
        motor_angle[i] = (math.pi + forward_backward_component + extension_component)

    return motor_angle
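A quick sanity check of what that mapping produces (assuming num_motors is 8, so half_num_motors is 4, and `robotable` is an instance of whatever class this method lives on):

import numpy as np

# an all-zero leg-model action: no extension, no swing
neutral = robotable.ConvertFromLegModel(np.zeros(8))
# for every motor: forward_backward_component = -(pi/4) * 1.5 ≈ -1.178,
# extension_component = 0, so each angle comes out as pi - 1.178 ≈ 1.963 radians
print(neutral)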
Ok, my error is:
File "/opt/gym-robotable/gym_robotable/envs/robotable_gym_env.py", line 350, in step
action = self._transform_action_to_motor_command(action)
File "/opt/gym-robotable/gym_robotable/envs/robotable_gym_env.py", line 313, in _transform_action_to_motor_command
action = self.robotable.ConvertFromLegModel(action)
AttributeError: 'Robotable' object has no attribute 'ConvertFromLegModel'
Ok, anyway, I debugged for an hour and now it’s doing something. It’s saving numpy files now.