Categories
dev

Ray/RLLib PBT & ARS

I’ve posted an issue to try to get ARS working. https://github.com/ray-project/ray/issues/9573

  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/ars/ars_tf_policy.py", line 59, in compute_actions
    observation = self.observation_filter(observation[None], update=update)
TypeError: list indices must be integers or slices, not NoneType

No idea yet, but another reported bug mentioned something about numpy arrays vs. plain (good old) Python lists. Anyhow, it would be great to get ARS working on Ray/RLLib, because I get the sense that PPO is too dumb for this. It’s never managed to get past falling over, even with quite a bit of hyperparameter tweaking.
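That traceback does smell like the env handing back a plain Python list where the ARS filter expects a numpy array: observation[None] is numpy’s way of adding a batch dimension, and it blows up with exactly that error on a list. A minimal repro of that guess, plus the obvious fix:

import numpy as np

observation = [0.1, 0.2, 0.3]        # plain Python list, like the env seems to return
# observation[None]                  # TypeError: list indices must be integers or slices, not NoneType

observation = np.asarray(observation, dtype=np.float32)
batched = observation[None]          # works: adds a batch dimension, shape (1, 3)
print(batched.shape)

If that’s the cause, returning np.asarray(...) from the env’s observation code should be enough.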

At least ARS has evolved a walking table, so far. Once it works in Ray, perhaps we will have policy save and load, and I can move on to replaying experiences, continuing training from a checkpoint, etc.

Huh, great. Well, I solved my problems, and it’s running something now.

But rollouts are not ending now. OK, it looks like I need to enforce the time limit inside the environment itself, rather than passing it in as a hyperparameter like in pyBullet’s implementation.
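A minimal sketch of one way to do that, as a wrapper around the env (gym also ships gym.wrappers.TimeLimit, which does essentially the same thing):

import gym

class TimeLimitedEnv(gym.Wrapper):
    """Ends the rollout from inside the env after a fixed number of steps,
    so RLlib sees done=True without a separate horizon hyperparameter."""

    def __init__(self, env, max_episode_steps=1000):
        super().__init__(env)
        self._max_episode_steps = max_episode_steps
        self._steps = 0

    def reset(self, **kwargs):
        self._steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._steps += 1
        if self._steps >= self._max_episode_steps:
            done = True
        return obs, reward, done, info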

Well then, onto the next issue.

Categories
deep dev

Ray / RLLib PBT & PPO

Got Population Based Training (PBT) PPO running in RLLib. It seems to have maxed out its reward, asymptotically approaching 0.74.

PPO isn’t great for this. But let’s see if we can replay with GUI after this.

I asked for these hyperparameter mutations:

hyperparam_mutations={
    "lambda": lambda: random.uniform(0.9, 1.0),
    "clip_param": lambda: random.uniform(0.01, 0.5),
    "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    # "num_sgd_iter": lambda: random.randint(1, 30),
    # "sgd_minibatch_size": lambda: random.randint(128, 16384),
    # "train_batch_size": lambda: random.randint(2000, 160000),
})
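That dict plugs into Tune’s PBT scheduler roughly like this (a sketch following the standard Ray Tune PBT example, not my exact script; the config values are copied from the logged config in pbt_global.txt below, while perturbation_interval is a placeholder):

import random

import ray
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=5,           # placeholder; perturb every 5 iterations
    hyperparam_mutations={
        "lambda": lambda: random.uniform(0.9, 1.0),
        "clip_param": lambda: random.uniform(0.01, 0.5),
        "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    })

ray.init()
tune.run(
    "PPO",
    name="PBT_ROBOTABLE",
    scheduler=pbt,
    num_samples=8,                     # 8 trials, as in the status table below
    config={
        "env": "RobotableEnv-v0",
        "kl_coeff": 1.0,
        "num_workers": 2,
        "num_gpus": 0,
        "model": {"free_log_std": True},
        # initial values for the mutated hyperparams
        "lambda": 0.95,
        "clip_param": 0.2,
        "lr": 1e-4,
        "num_sgd_iter": 20,
        "sgd_minibatch_size": 500,
        "train_batch_size": 10000,
    })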

cat pbt_global.txt
["5", "7", 17, 18, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.95, "clip_param": 0.2, "lr": 0.0001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.76, "clip_param": 0.16000000000000003, "lr": 5e-05, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}]

["3", "1", 35, 32, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.95, "clip_param": 0.2, "lr": 0.0001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 1.14, "clip_param": 0.1096797541550122, "lr": 5e-05, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}]

["3", "7", 35, 36, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.95, "clip_param": 0.2, "lr": 0.0001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.76, "clip_param": 0.24, "lr": 0.001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}]

["5", "6", 37, 35, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.95, "clip_param": 0.2, "lr": 0.0001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 1.14, "clip_param": 0.16000000000000003, "lr": 5e-05, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}]




== Status ==
Memory usage on this node: 2.7/3.8 GiB
PopulationBasedTraining: 28 checkpoints, 3 perturbs
Resources requested: 3/4 CPUs, 0/0 GPUs, 0.0/0.93 GiB heap, 0.0/0.29 GiB objects
Result logdir: /root/ray_results/PBT_ROBOTABLE
Number of trials: 8 (7 PAUSED, 1 RUNNING)
+---------------------------------+----------+-----------------------+--------+------------------+--------+----------+
| Trial name                      | status   | loc                   |   iter |   total time (s) |     ts |   reward |
|---------------------------------+----------+-----------------------+--------+------------------+--------+----------|
| PPO_RobotableEnv-v0_c67a8_00000 | PAUSED   |                       |     36 |         1069.1   | 360000 | 0.735323 |
| PPO_RobotableEnv-v0_c67a8_00001 | PAUSED   |                       |     36 |         1096.3   | 360000 | 0.736305 |
| PPO_RobotableEnv-v0_c67a8_00002 | PAUSED   |                       |     33 |          987.687 | 330000 | 0.735262 |
| PPO_RobotableEnv-v0_c67a8_00003 | PAUSED   |                       |     36 |         1096.22  | 360000 | 0.731993 |
| PPO_RobotableEnv-v0_c67a8_00004 | PAUSED   |                       |     37 |         1103.48  | 370000 | 0.739188 |
| PPO_RobotableEnv-v0_c67a8_00005 | RUNNING  | 192.168.101.127:14690 |     37 |         1101.5   | 370000 | 0.727506 |
| PPO_RobotableEnv-v0_c67a8_00006 | PAUSED   |                       |     35 |         1067.26  | 350000 | 0.739985 |
| PPO_RobotableEnv-v0_c67a8_00007 | PAUSED   |                       |     36 |         1085.05  | 360000 | 0.739295 |
+---------------------------------+----------+-----------------------+--------+------------------+--------+----------+


2020-07-19 17:27:53,966	INFO pbt.py:78 -- [explore] perturbed config from {'env': 'RobotableEnv-v0', 'kl_coeff': 1.0, 'num_workers': 2, 'num_gpus': 0, 'model': {'free_log_std': True}, 'lambda': 0.95, 'clip_param': 0.2, 'lr': 0.0001, 'num_sgd_iter': 20, 'sgd_minibatch_size': 500, 'train_batch_size': 10000} -> {'env': 'RobotableEnv-v0', 'kl_coeff': 1.0, 'num_workers': 2, 'num_gpus': 0, 'model': {'free_log_std': True}, 'lambda': 1.14, 'clip_param': 0.16000000000000003, 'lr': 5e-05, 'num_sgd_iter': 20, 'sgd_minibatch_size': 500, 'train_batch_size': 10000}
2020-07-19 17:27:53,966	INFO pbt.py:316 -- [exploit] transferring weights from trial PPO_RobotableEnv-v0_c67a8_00006 (score 0.7399848299949074) -> PPO_RobotableEnv-v0_c67a8_00005 (score 0.7241841897925536)
Result for PPO_RobotableEnv-v0_c67a8_00005:
  custom_metrics: {}
  date: 2020-07-19_17-27-53
  done: false
  episode_len_mean: 114.58
  episode_reward_max: 0.7808001167724908
  episode_reward_mean: 0.7241841897925536
  episode_reward_min: 0.6627154081217708
  episodes_this_iter: 88
  episodes_total: 2500
  experiment_id: e3408f32ed2a433d8c7edb87d33609ba
  experiment_tag: 5@perturbed[clip_param=0.16,lambda=1.14,lr=5e-05]
  hostname: chrx
  info:
    learner:
      default_policy:
        cur_kl_coeff: 0.0625
        cur_lr: 4.999999873689376e-05
        entropy: 5.101933479309082
        entropy_coeff: 0.0
        kl: 0.004210006445646286
        model: {}
        policy_loss: -0.0077978381887078285
        total_loss: -0.007088268641382456
        vf_explained_var: 0.9757658243179321
        vf_loss: 0.0004464423400349915
    num_steps_sampled: 380000
    num_steps_trained: 380000
  iterations_since_restore: 5
  node_ip: 192.168.101.127
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 66.7095238095238
    ram_util_percent: 72.5452380952381
  pid: 14690
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_env_wait_ms: 1.5935033550679747
    mean_inference_ms: 1.8385610163959398
    mean_processing_ms: 1.195529456155168
  time_since_restore: 147.82027745246887
  time_this_iter_s: 29.546902656555176
  time_total_s: 1131.04909491539
  timers:
    learn_throughput: 1880.23
    learn_time_ms: 5318.497
    load_throughput: 350730.091
    load_time_ms: 28.512
    sample_throughput: 414.501
    sample_time_ms: 24125.418
    update_time_ms: 4.191
  timestamp: 1595179673
  timesteps_since_restore: 0
  timesteps_total: 380000
  training_iteration: 38
  trial_id: c67a8_00005
  
2020-07-19 17:27:54,989	WARNING util.py:137 -- The `experiment_checkpoint` operation took 0.8819785118103027 seconds to complete, which may be a performance bottleneck.
== Status ==
Memory usage on this node: 2.6/3.8 GiB
PopulationBasedTraining: 28 checkpoints, 4 perturbs
Resources requested: 0/4 CPUs, 0/0 GPUs, 0.0/0.93 GiB heap, 0.0/0.29 GiB objects
Result logdir: /root/ray_results/PBT_ROBOTABLE
Number of trials: 8 (8 PAUSED)
+---------------------------------+----------+-------+--------+------------------+--------+----------+
| Trial name                      | status   | loc   |   iter |   total time (s) |     ts |   reward |
|---------------------------------+----------+-------+--------+------------------+--------+----------|
| PPO_RobotableEnv-v0_c67a8_00000 | PAUSED   |       |     36 |         1069.1   | 360000 | 0.735323 |
| PPO_RobotableEnv-v0_c67a8_00001 | PAUSED   |       |     36 |         1096.3   | 360000 | 0.736305 |
| PPO_RobotableEnv-v0_c67a8_00002 | PAUSED   |       |     33 |          987.687 | 330000 | 0.735262 |
| PPO_RobotableEnv-v0_c67a8_00003 | PAUSED   |       |     36 |         1096.22  | 360000 | 0.731993 |
| PPO_RobotableEnv-v0_c67a8_00004 | PAUSED   |       |     37 |         1103.48  | 370000 | 0.739188 |
| PPO_RobotableEnv-v0_c67a8_00005 | PAUSED   |       |     38 |         1131.05  | 380000 | 0.724184 |
| PPO_RobotableEnv-v0_c67a8_00006 | PAUSED   |       |     35 |         1067.26  | 350000 | 0.739985 |
| PPO_RobotableEnv-v0_c67a8_00007 | PAUSED   |       |     36 |         1085.05  | 360000 | 0.739295 |
+---------------------------------+----------+-------+--------+------------------+--------+----------+


(pid=14800) 2020-07-19 17:27:58,611	INFO trainer.py:585 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=14800) 2020-07-19 17:27:58,611	INFO trainer.py:612 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=14800) pybullet build time: Mar 17 2020 17:46:41
(pid=14800) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=14800)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=14913) pybullet build time: Mar 17 2020 17:46:41
(pid=14913) 2020-07-19 17:28:00,118	INFO trainer.py:585 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=14913) 2020-07-19 17:28:00,118	INFO trainer.py:612 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=14913) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=14913)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=14992) pybullet build time: Mar 17 2020 17:46:41
(pid=14993) pybullet build time: Mar 17 2020 17:46:41
(pid=14992) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=14992)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=14800) 2020-07-19 17:28:10,106	INFO trainable.py:181 -- _setup took 11.510 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=14993) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=14993)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=14800) 2020-07-19 17:28:10,126	WARNING util.py:37 -- Install gputil for GPU system monitoring.
(pid=14800) 2020-07-19 17:28:10,717	INFO trainable.py:423 -- Restored on 192.168.101.127 from checkpoint: /root/ray_results/PBT_ROBOTABLE/PPO_RobotableEnv-v0_5_2020-07-19_15-00-03bbqeih3t/tmpf1h5txefrestore_from_object/checkpoint-35
(pid=14800) 2020-07-19 17:28:10,717	INFO trainable.py:430 -- Current state after restoring: {'_iteration': 35, '_timesteps_total': None, '_time_total': 1067.2641203403473, '_episodes_total': 2289}
(pid=14913) 2020-07-19 17:28:12,388	INFO trainable.py:181 -- _setup took 12.284 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=14913) 2020-07-19 17:28:12,388	WARNING util.py:37 -- Install gputil for GPU system monitoring.
(pid=14913) 2020-07-19 17:28:12,760	INFO trainable.py:423 -- Restored on 192.168.101.127 from checkpoint: /root/ray_results/PBT_ROBOTABLE/PPO_RobotableEnv-v0_2_2020-07-19_14-52-33cutk2k27/tmplqac6svyrestore_from_object/checkpoint-33
(pid=14913) 2020-07-19 17:28:12,760	INFO trainable.py:430 -- Current state after restoring: {'_iteration': 33, '_timesteps_total': None, '_time_total': 987.687007188797, '_episodes_total': 2059}
(pid=15001) pybullet build time: Mar 17 2020 17:46:41
(pid=15001) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=15001)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=15088) pybullet build time: Mar 17 2020 17:46:41
(pid=15088) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=15088)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Result for PPO_RobotableEnv-v0_c67a8_00002:
  custom_metrics: {}
  date: 2020-07-19_17-28-54
  done: false
  episode_len_mean: 110.78888888888889
  episode_reward_max: 0.8009732276880979
  episode_reward_mean: 0.7387077080695522
  episode_reward_min: 0.6640543988817607
  episodes_this_iter: 90
  episodes_total: 2149
  experiment_id: edcd859a3ae34d668bb9be1899dde41a
  experiment_tag: '2'
  hostname: chrx
  info:
    learner:
      default_policy:
        cur_kl_coeff: 1.0
        cur_lr: 9.999999747378752e-05
        entropy: 5.111008644104004
        entropy_coeff: 0.0
        kl: 0.0031687873415648937
        model: {}
        policy_loss: -0.012367220595479012
        total_loss: -0.008663905784487724
        vf_explained_var: 0.9726411700248718
        vf_loss: 0.0005345290992408991
    num_steps_sampled: 340000
    num_steps_trained: 340000
  iterations_since_restore: 1
  node_ip: 192.168.101.127
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 68.11833333333333
    ram_util_percent: 71.13666666666667
  pid: 14913
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_env_wait_ms: 1.6718764134441182
    mean_inference_ms: 1.9752634594235934
    mean_processing_ms: 1.2958259778937158
  time_since_restore: 41.650487661361694
  time_this_iter_s: 41.650487661361694
  time_total_s: 1029.3374948501587
  timers:
    learn_throughput: 1680.106
    learn_time_ms: 5952.007
    load_throughput: 74973.795
    load_time_ms: 133.38
    sample_throughput: 285.094
    sample_time_ms: 35076.171
    update_time_ms: 4.517
  timestamp: 1595179734
  timesteps_since_restore: 0
  timesteps_total: 340000
  training_iteration: 34
  trial_id: c67a8_00002
  
2020-07-19 17:28:55,042	WARNING util.py:137 -- The `experiment_checkpoint` operation took 0.5836038589477539 seconds to complete, which may be a performance bottleneck.
== Status ==
Memory usage on this node: 2.7/3.8 GiB
PopulationBasedTraining: 28 checkpoints, 4 perturbs
Resources requested: 3/4 CPUs, 0/0 GPUs, 0.0/0.93 GiB heap, 0.0/0.29 GiB objects
Result logdir: /root/ray_results/PBT_ROBOTABLE
Number of trials: 8 (7 PAUSED, 1 RUNNING)
+---------------------------------+----------+-----------------------+--------+------------------+--------+----------+
| Trial name                      | status   | loc                   |   iter |   total time (s) |     ts |   reward |
|---------------------------------+----------+-----------------------+--------+------------------+--------+----------|
| PPO_RobotableEnv-v0_c67a8_00000 | PAUSED   |                       |     36 |          1069.1  | 360000 | 0.735323 |
| PPO_RobotableEnv-v0_c67a8_00001 | PAUSED   |                       |     36 |          1096.3  | 360000 | 0.736305 |
| PPO_RobotableEnv-v0_c67a8_00002 | RUNNING  | 192.168.101.127:14913 |     34 |          1029.34 | 340000 | 0.738708 |
| PPO_RobotableEnv-v0_c67a8_00003 | PAUSED   |                       |     36 |          1096.22 | 360000 | 0.731993 |
| PPO_RobotableEnv-v0_c67a8_00004 | PAUSED   |                       |     37 |          1103.48 | 370000 | 0.739188 |
| PPO_RobotableEnv-v0_c67a8_00005 | PAUSED   |                       |     38 |          1131.05 | 380000 | 0.724184 |
| PPO_RobotableEnv-v0_c67a8_00006 | PAUSED   |                       |     35 |          1067.26 | 350000 | 0.739985 |
| PPO_RobotableEnv-v0_c67a8_00007 | PAUSED   |                       |     36 |          1085.05 | 360000 | 0.739295 |
+---------------------------------+----------+-----------------------+--------+------------------+--------+----------+

Categories
AI/ML control deep envs

Flow

https://flow-project.github.io/index.html

Seems to be a framework providing traffic-control environments for reinforcement learning.

Categories
Math

Hessian Matrix

https://en.wikipedia.org/wiki/Hessian_matrix some sort of crazy math thing I’ve come across while researching NNs. My own opinion is that neurons in the brain are not doing any calculus, and to my knowledge, there’s still no proof of back-propagation in the brain, so whenever the math gets too complicated, there’s probably a simpler way to do it.

In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term “functional determinants”.

https://stackoverflow.com/questions/23297090/how-calculating-hessian-works-for-neural-network-learning

https://activecalculus.org/multi/S-10-3-Second-Order-Partial-Derivatives.html

Second-order Partial Derivatives

A function $f$ of two independent variables $x$ and $y$ has two first-order partial derivatives, $f_x$ and $f_y$. Each of these first-order partial derivatives has two partial derivatives, giving a total of four second-order partial derivatives:

  • $f_{xx} = (f_x)_x = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial x^2}$
  • $f_{yy} = (f_y)_y = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial y^2}$
  • $f_{xy} = (f_x)_y = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y\,\partial x}$
  • $f_{yx} = (f_y)_x = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x\,\partial y}$

The first two are called unmixed second-order partial derivatives while the last two are called the mixed second-order partial derivatives.
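A quick worked example to make that concrete: take $f(x,y) = x^2 y$. Then

$$f_x = 2xy, \qquad f_y = x^2,$$
$$f_{xx} = 2y, \qquad f_{yy} = 0, \qquad f_{xy} = f_{yx} = 2x,$$

so the Hessian is

$$H_f = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} = \begin{pmatrix} 2y & 2x \\ 2x & 0 \end{pmatrix}.$$

The mixed partials come out equal, which holds for any smooth function (Clairaut’s theorem), so the Hessian is symmetric.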

Categories
dev institutes

Anyscale Academy

https://github.com/anyscale/academy

They have a relevant tutorial on RLLib (Ray).

Categories
AI/ML dev Locomotion simulation

Replay

After seeing the ‘Replay Buffer’ in The TF-Agents SAC minitaur https://www.tensorflow.org/agents/tutorials/7_SAC_minitaur_tutorial tutorial, I’m starting to think replay is going to be a thing for the robot, one way or another.

I’m sticking with the Google protobuf code that the minitaur uses, and will just need to save the best episodes and work out how to replay them. The comments ask “use recordio?”

https://stackoverflow.com/questions/53219720/tfrecord-vs-recordio

import os
import inspect

# Make the parent package importable when this script is run directly.
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(os.path.dirname(currentdir))
os.sys.path.insert(0, parentdir)

import argparse
from gym_robotable.envs import logging

if __name__ == "__main__":

    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--log_file', help='path to protobuf file', default='/opt/gym-robotable/logs/robotable_log_2020-07-12-204613')
    args = parser.parse_args()

    # Use a separate name so the imported logging module isn't shadowed.
    logger = logging.RobotableLogging()
    episode = logger.restore_episode(args.log_file)

    print(dir(episode))
    print("episode=", episode)

    # Inspect the protobuf structure to see what's actually stored per step.
    fields = episode.ListFields()

So that’s printing out the episode.

Next step is saving only the best episodes.

Then the next step is stepping the simulation with the stored actions.

But now I’m not as sure. It might be better to switch to RLLib (and Ray).

Would rather hide the details of serialization if I can.
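Still, as a rough sketch of the replay step: feed the logged actions back into the env one at a time. Field names like state_action and action are assumptions about the minitaur-style logging proto (check what ListFields() printed above), and run the GUI-enabled env if you want to watch it.

import gym
import gym_robotable  # registers RobotableEnv-v0 (assumed)

def replay_episode(episode, env_id="RobotableEnv-v0", render=True):
    env = gym.make(env_id)
    env.reset()
    total_reward = 0.0
    for sa in episode.state_action:      # one entry per simulation step (assumed field)
        action = list(sa.action)          # repeated float field -> list (assumed field)
        _, reward, done, _ = env.step(action)
        if render:
            env.render()
        total_reward += reward
        if done:
            break
    env.close()
    return total_reward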

Categories
dev evolution Locomotion

DEAP

Distributed Evolutionary Algorithms in Python

deap.readthedocs.org/

An oldie but a goodie. I was thinking of just implementing the locomotion using good old genetic programming.

You could probably generate a walking robot using a genetic algorithm that repeats the evolved tree’s actions a number of times. I bet it would be faster than the whole neural-network policy / training / replay-buffer spiel. (Rough sketch after the link below.)

https://deap.readthedocs.io/en/master/tutorials/advanced/gp.html
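A rough DEAP GP sketch of that idea. The fitness here is just a stand-in that rewards matching a sine gait over one cycle; in practice it would be a pybullet rollout returning distance walked, and names like phase are my own.

import math
import operator
import random

from deap import algorithms, base, creator, gp, tools

# One input to the evolved program: the gait phase in [0, 1).
pset = gp.PrimitiveSet("MAIN", 1)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(math.sin, 1)
pset.addEphemeralConstant("rand_const", lambda: random.uniform(-1, 1))
pset.renameArguments(ARG0="phase")

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def eval_gait(individual):
    controller = toolbox.compile(expr=individual)
    # Placeholder fitness: how closely the evolved function matches a target sine gait.
    error = sum((controller(i / 20.0) - math.sin(2 * math.pi * i / 20.0)) ** 2
                for i in range(20))
    return (-error,)

toolbox.register("evaluate", eval_gait)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

pop = toolbox.population(n=50)
algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=20, verbose=False)
print(tools.selBest(pop, 1)[0])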

Categories
AI/ML deep dev Locomotion

Ray

https://docs.ray.io/en/master/rllib-algorithms.html Seems like this might be the most up-to-date baselines repo.

Ray is a fast and simple framework for building and running distributed applications.

Ray is packaged with the following libraries for accelerating machine learning workloads:

  • Tune: Scalable Hyperparameter Tuning
  • RLlib: Scalable Reinforcement Learning
  • RaySGD: Distributed Training Wrappers

ARS implementation: https://github.com/ray-project/ray/blob/master/rllib/agents/ars/ars.py
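Once the ARS bug from earlier is sorted, I’d expect kicking it off through Tune to look roughly like this (a sketch: "ARS" is the registered RLlib trainable name, the stopping condition is a placeholder, and algorithm-specific config is left at defaults rather than guessed at):

import ray
from ray import tune

ray.init()

tune.run(
    "ARS",
    stop={"episode_reward_mean": 0.9},   # placeholder stopping condition
    config={
        "env": "RobotableEnv-v0",
        "num_workers": 2,
    },
    checkpoint_freq=10,                   # keep checkpoints so the policy can be saved/restored
)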

Categories
AI/ML sim2real simulation

ADR

Automatic domain randomization

arxiv: https://arxiv.org/pdf/1910.07113.pdf

Increases randomisation of parameters as training goes on: https://openai.com/blog/solving-rubiks-cube/

It builds on the work by these French sim2real people: https://hal.inria.fr/tel-01974203/file/89722_GOLEMO_2018_archivage.pdf
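The core loop is simple enough to sketch in a few lines (a toy version of the idea only; the parameter names, thresholds, and expansion step are all made up, not taken from the paper):

import random

# Toy ADR: each randomised parameter has a range that widens as the agent succeeds.
ranges = {"friction": [0.9, 1.1], "mass_scale": [0.95, 1.05]}   # initial, narrow ranges
EXPAND_STEP = 0.05          # how much to widen a boundary
SUCCESS_THRESHOLD = 0.7     # mean reward needed before widening

def sample_env_params():
    return {name: random.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

def maybe_expand(name, mean_reward_at_boundary):
    # When the agent does well while a parameter sits at the edge of its range,
    # push that edge outwards so training sees harder randomisation next time.
    if mean_reward_at_boundary > SUCCESS_THRESHOLD:
        lo, hi = ranges[name]
        ranges[name] = [lo - EXPAND_STEP, hi + EXPAND_STEP]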

Categories
AI/ML institutes

DeepMind

Somehow I’ve just been following OpenAI and missed all the action at the other big algorithm R&D company. https://deepmind.com/research

Experience Replay: https://deepmind.com/research/open-source/Reverb

Their new framework, Acme: https://deepmind.com/research/open-source/Acme_os and https://github.com/deepmind/acme

https://github.com/deepmind/dm_control – seems they’re a MuJoCo house.