Categories
dev Locomotion The Sentient Table

Spinning Up

OpenAI Spinning Up https://spinningup.openai.com/en/latest/

So I've got OpenAI's PPO working. https://github.com/openai/spinningup/issues/142 – it needs a workaround to run your own envs.

But I can’t work out how to increase the exploration “factor”.

It's some sort of Gaussian noise applied to the actions, I think, which is the simple idea.

It looks like clip_ratio is maybe what I need.

Hmm, but PPO doesn't seem to want it.

parser.add_argument('--env', type=str, default='HalfCheetah-v2')
parser.add_argument('--hid', type=int, default=64)
parser.add_argument('--l', type=int, default=2)
parser.add_argument('--gamma', type=float, default=0.99)
parser.add_argument('--seed', '-s', type=int, default=0)
parser.add_argument('--cpu', type=int, default=4)
parser.add_argument('--steps', type=int, default=4000)
parser.add_argument('--epochs', type=int, default=50)
parser.add_argument('--exp_name', type=str, default='ppo')


def ppo(env_fn, actor_critic=core.mlp_actor_critic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, clip_ratio=0.2, pi_lr=3e-4,
        vf_lr=1e-3, train_pi_iters=80, train_v_iters=80, lam=0.97, max_ep_len=1000,
        target_kl=0.01, logger_kwargs=dict(), save_freq=10):
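As far as I can tell, clip_ratio isn't an exploration knob at all – it's the PPO objective's clipping threshold, limiting how far the policy ratio can move per update. The exploration noise comes from the diagonal Gaussian policy itself: the network outputs a mean action and a learned log standard deviation, and actions are sampled from that distribution. A minimal sketch of that sampling idea (mine, not Spinning Up's actual code):

import numpy as np

# Minimal sketch of diagonal Gaussian exploration (not Spinning Up's code).
# mu would come from the policy network; log_std is a learned parameter.
mu = np.array([0.1, -0.3])            # made-up mean action
log_std = np.array([-0.5, -0.5])      # learned log standard deviations
action = mu + np.exp(log_std) * np.random.randn(*mu.shape)  # bigger std => more exploration
print(action)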

https://github.com/openai/spinningup/issues/12

OK, so I tried the SAC algo too, and the issue I have now is:

AttributeError: 'list' object has no attribute 'reshape'

So the thing is the dimensionality of the observations:

“FetchReach environment has Dict observation space (because it packages not only arm position, but also the target location into the observation), and spinning up does not implement support for Dict observation spaces yet. One thing you can do is add a FlattenDictWrapper from gym (for example usage see, for instance,

env = FlattenDictWrapper(env, ['observation', 'desired_goal'])

Spinning Up implementations currently only support envs with Box observation spaces (where observations are real-valued vectors). These environments have Dict observation spaces, so each obs is a dict of (key, vector) pairs. If you want to test things out in these envs, I recommend doing it as a hacking project! 🙂 “
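For reference, a quick sketch of how that wrapper would be used (gym's API around 2020; FlattenDictWrapper was later replaced by other wrappers, so this may not work on newer gym versions):

import gym
from gym.wrappers import FlattenDictWrapper  # renamed/removed in later gym versions

env = gym.make("FetchReach-v1")               # Dict observation space
env = FlattenDictWrapper(env, ["observation", "desired_goal"])
print(env.observation_space)                  # now a flat Box that Spinning Up can handle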

Categories
CNNs dev

CNN Training

Which image resolution should I use for training a deep neural network?

CIFAR dataset is 32px*32px,

MIT 128px*128px,

Stanford 96px*96px.

Following the advice here https://towardsdatascience.com/boost-your-cnn-image-classifier-performance-with-progressive-resizing-in-keras-a7d96da06e20

“small-image models are much faster to train.”

“Here is a smoothed kernel-density plot of image sizes in our “Open Fruits” dataset:”


“We see here that the images peak at around 128x128 in size. So for our initial input size we will choose 1/3 of that: 48x48.

Now it’s time to experiment! What kind of model you end up building in this phase of the project is entirely up to you.” (https://towardsdatascience.com/boost-your-cnn-image-classifier-performance-with-progressive-resizing-in-keras-a7d96da06e20)

I'll have a look at the chicken images and see how to scale them down. Maybe ffmpeg or ImageMagick's convert is better for pre-processing. But we'll get there soon enough.
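Or, instead of ffmpeg/imagemagick, a quick Pillow sketch (paths and target size made up for illustration):

from pathlib import Path
from PIL import Image

SRC, DST = Path("chickens/raw"), Path("chickens/128px")   # made-up directories
DST.mkdir(parents=True, exist_ok=True)
for p in SRC.glob("*.jpg"):
    img = Image.open(p).convert("RGB")
    img.thumbnail((128, 128))          # downscale in place, preserving aspect ratio
    img.save(DST / p.name, quality=90)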

Categories
dev Linux

Low Linux memory

Spent enough time on this to warrant a note.

For some reason, pip install torch, which is what I was trying to do, kept dying. It's a ~700MB download, and top showed it running out of memory.

Ultimately the fix for that was:

pip install torch --no-cache-dir

(something was wrong with the cache I guess)

I also ended up deleting the contents of ~/.cache/pip, which was 2.2GB. The new pip cache purge only clears the wheel cache.

Anyway, trying to do development on a chromebook with 23GB of storage, running GalliumOS, gets tough.

I spend a lot of time moving things around. I got myself a 512GB NVMe SSD to alleviate the situation.

The most common trick for checking disk space is df -h to see overall usage, and du -h --max-depth=1 to see how big the directories below your current dir are.

So, first thing first, the SSD doesn’t want to show up. Ah, the USB-C wasn’t pushed in all the way. Derp.

Second, to clear up some space: Linux keeps journal logs.

https://unix.stackexchange.com/questions/139513/how-to-clear-journalctl :

set a max amount of logs to retain (by time/space):
journalctl --vacuum-time=2d
journalctl --vacuum-size=500M

The third thing is to make some more swap space, just in case.

touch /media/chrx/0FEC49A4317DA4DA/swapfile
cd /media/chrx/0FEC49A4317DA4DA/
sudo dd if=/dev/zero of=swapfile bs=2048 count=1048576
mkswap swapfile
swapon swapfile

swapon

NAME                                  TYPE      SIZE   USED PRIO
/dev/zram0                            partition 5.6G 452.9M   -2
/media/chrx/0FEC49A4317DA4DA/swapfile file        2G     0B   -3

Ok, probably didn't need more swap space. /dev/zram0 is compressed swap that lives in RAM; maybe I can free up more of it, and up the priority of the SSD swapfile?

Anyway, torch is installed now, so nevermind, until I need more memory.

Some more tricks:

Remove thumbnails:

du -sh ~/.cache/thumbnails

rm -rf ~/.cache/thumbnails/*

Clean apt cache:

sudo apt-get clean

Categories
AI/ML CNNs dev institutes OpenCV Vision

Detectron2

Ran through the nice working jupyter notebook https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5#scrollTo=OpLg_MAQGPUT and produced this video

It is the Mask R-CNN algorithm, familiar from the matterport repo, reimplemented by Facebook's AI lab and better maintained. It was forked and fixed up for tourists.

We can train it on the robot's eye-view camera, and maybe train it on Google Images of copyleft chickens and eggs.

I think this looks great, for endowing the robot with a basic “recognition” of the features of classes it’s been exposed to.

https://github.com/facebookresearch/detectron2/tree/master/projects

https://detectron2.readthedocs.io/tutorials/extend.html

Seems I was oblivious to Facebook AI, but of course they hire very smart people. I'd sell my soul for $240k/yr too. It is super nice to get a working Jupyter Notebook. Thank you. https://ai.facebook.com/blog/-detectron2-a-pytorch-based-modular-object-detection-library-/

Here are the other FB projects using detectron2, copy-pasted:

Projects by Facebook

Note that these are research projects, and therefore may not have the same level of support or stability as detectron2.

External Projects

External projects in the community that use detectron2:

Also, more generally, https://ai.facebook.com/research/#recent-projects

Errors encountered while attempting the install instructions at https://detectron2.readthedocs.io/tutorials/getting_started.html

File "demo.py", line 8, in
import tqdm
ImportError: No module named tqdm

pip3 uninstall tqdm
pip3 install tqdm

Ok so…

python3 -m pip install -e .

python3 demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --webcam --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

Requires pyyaml>=5.1

ok

pip install pyyaml==5.1
 Successfully built pyyaml
Installing collected packages: pyyaml
Attempting uninstall: pyyaml
Found existing installation: PyYAML 3.12
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

pip3 install --ignore-installed PyYAML
Successfully installed PyYAML-5.1

Next error...

ModuleNotFoundError: No module named 'torchvision'

pip install torchvision

Next error...

AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx


ok

python3 demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --webcam --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl MODEL.DEVICE cpu


[08/17 20:53:11 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml', input=None, opts=['MODEL.WEIGHTS', 'detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl', 'MODEL.DEVICE', 'cpu'], output=None, video_input=None, webcam=True)
[08/17 20:53:12 fvcore.common.checkpoint]: Loading checkpoint from detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
[08/17 20:53:12 fvcore.common.file_io]: Downloading https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl …
[08/17 20:53:12 fvcore.common.download]: Downloading from https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl …
model_final_f10217.pkl: 178MB [01:26, 2.05MB/s]
[08/17 20:54:39 fvcore.common.download]: Successfully downloaded /root/.torch/fvcore_cache/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl. 177841981 bytes.
[08/17 20:54:39 fvcore.common.file_io]: URL https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl cached in /root/.torch/fvcore_cache/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
[08/17 20:54:39 fvcore.common.checkpoint]: Reading a file from 'Detectron2 Model Zoo'
0it [00:00, ?it/s]/opt/detectron2/detectron2/layers/wrappers.py:226: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
return x.nonzero().unbind(1)
0it [00:06, ?it/s]
Traceback (most recent call last):
File "demo.py", line 118, in
cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
cv2.error: OpenCV(4.3.0) /io/opencv/modules/highgui/src/window.cpp:634: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvNamedWindow'


Ok...

pip install opencv-python

Requirement already satisfied: opencv-python in /usr/local/lib/python3.6/dist-packages (4.2.0.34)

Looks like a 4.3.0 vs 4.2.0.34 kind of thing.


sudo apt-get install libopencv-*


nope...

/opt/detectron2/detectron2/layers/wrappers.py:226: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
return x.nonzero().unbind(1)


def nonzero_tuple(x):
    """
    A 'as_tuple=True' version of torch.nonzero to support torchscript.
    because of https://github.com/pytorch/pytorch/issues/38718
    """
    if x.dim() == 0:
        return x.unsqueeze(0).nonzero().unbind(1)
    return x.nonzero(as_tuple=True).unbind(1)

AttributeError: 'tuple' object has no attribute 'unbind'


https://github.com/pytorch/pytorch/issues/38718

FFS. Why does nothing ever fucking work?
pytorch 1.6:
"putting 1.6.0 milestone for now; this isn't the worst, but it's a pretty bad user experience."

Yeah no shit.

let's try...

return x.nonzero(as_tuple=False).unbind(1)

Ok, next error is the same thing, now at:

/opt/detectron2/detectron2/modeling/roi_heads/fast_rcnn.py:111


Ok... back to this error (after adding as_tuple=False twice)


 File "demo.py", line 118, in
cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
cv2.error: OpenCV(4.3.0) /io/opencv/modules/highgui/src/window.cpp:634: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvNamedWindow'

Decided to check if maybe this is a conda vs pip thing. Like maybe I just need to install the conda version instead?

But it looks like GTK+ 2.x support isn't there. Seems I installed OpenCV using pip, i.e. pip install opencv-contrib-python, and that build isn't compiled with GTK+ 2.x. I could also use Qt as the graphical interface.
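An easy way to check which GUI backend the installed cv2 build actually has (standard OpenCV API):

import cv2
print(cv2.getBuildInformation())   # look for the "GUI" section: GTK / QT / none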

“GTK supposedly uses more memory because GTK provides more functionality. Qt does less and uses less memory. If that is your logic, then you should also look at Aura and the many other user interface libraries providing less functionality.” (link)

https://stackoverflow.com/questions/14655969/opencv-error-the-function-is-not-implemented

https://askubuntu.com/questions/913241/error-in-executing-opencv-in-ubuntu

So let's make a whole new Chapter, because we're installing OpenCV again! (Why? Because I want to try to run the detectron2 demo.py file.)

pip3 uninstall opencv-python
pip3 uninstall opencv-contrib-python 

(or sudo apt-get remove ___)

and afterwards build the OpenCV package from the source code on GitHub:

git clone https://github.com/opencv/opencv.git

cd ~/opencv

mkdir release

cd release

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=ON -D WITH_QT=ON -D WITH_GTK=ON -D WITH_OPENGL=ON ..

make

sudo make install

ok… pls…

python3 demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --webcam --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl MODEL.DEVICE cpu

sweet jaysus finally.
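For the record, roughly the same thing demo.py does can be sketched in a few lines with detectron2's DefaultPredictor (standard detectron2 API as of 2020; the image path is made up):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"                  # no NVIDIA driver on this chromebook
predictor = DefaultPredictor(cfg)

im = cv2.imread("some_image.jpg")         # made-up path
outputs = predictor(im)
print(outputs["instances"].pred_classes)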

Here's an image of the network from a Medium article on R-CNN: https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd

Categories
AI/ML dev Math

grad_norm and GradNorm

I was wondering what this 'grad_norm' value is. I got sidetracked by the paper with this name, which is maybe even unrelated. I haven't gone to the Tune website yet to check if it's the same thing. The grad_norm I was looking for is in the ARS code.

https://github.com/ray-project/ray/blob/master/rllib/agents/ars/ars.py

grad_norm gets smaller as episode_reward_mean gets bigger. So, gradient normalisation… gradient normalisation… still not sure.
info = {
    "weights_norm": np.square(theta).sum(),
    "weights_std": np.std(theta),
    "grad_norm": np.square(g).sum(),
    "update_ratio": update_ratio,
    "episodes_this_iter": noisy_lengths.size,
    "episodes_so_far": self.episodes_so_far,
}

So, it's the sum of the squares of g… and g is the 'batched weighted sum' of the ARS perturbation results, averaged and scaled, i.e. the estimated update direction. It's not theta, which is the policy itself. Right, so theta is the current agent/model brains, g is the update computed from the rollout tests, and grad_norm, as a sum of squares, is the squared L2 norm of that update: a measure of how much the weights are about to change, I think.
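In other words (my reading of it), grad_norm is just the squared Euclidean norm of the update direction:

import numpy as np

g = np.random.randn(100).astype(np.float32)   # stand-in for the ARS update direction
grad_norm = np.square(g).sum()                # what ARS logs as grad_norm
assert np.isclose(grad_norm, np.linalg.norm(g) ** 2)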

https://en.wikipedia.org/wiki/Partition_of_sums_of_squares – relevant?

        # Compute and take a step.
        g, count = utils.batched_weighted_sum(
            noisy_returns[:, 0] - noisy_returns[:, 1],
            (self.noise.get(index, self.policy.num_params)
             for index in noise_idx),
            batch_size=min(500, noisy_returns[:, 0].size))
        g /= noise_idx.size
        # scale the returns by their standard deviation
        if not np.isclose(np.std(noisy_returns), 0.0):
            g /= np.std(noisy_returns)
        assert (g.shape == (self.policy.num_params, )
                and g.dtype == np.float32)
        # Compute the new weights theta.
        theta, update_ratio = self.optimizer.update(-g)
        # Set the new weights in the local copy of the policy.
        self.policy.set_flat_weights(theta)
        # update the reward list
        if len(all_eval_returns) > 0:
        if len(all_eval_returns) > 0:
            self.reward_list.append(eval_returns.mean())

def batched_weighted_sum(weights, vecs, batch_size):
    total = 0
    num_items_summed = 0
    for batch_weights, batch_vecs in zip(
            itergroups(weights, batch_size), itergroups(vecs, batch_size)):
        assert len(batch_weights) == len(batch_vecs) <= batch_size
        total += np.dot(
            np.asarray(batch_weights, dtype=np.float32),
            np.asarray(batch_vecs, dtype=np.float32))
        num_items_summed += len(batch_weights)
    return total, num_items_summed

Well anyway. I found another GradNorm.

GradNorm

https://arxiv.org/pdf/1711.02257.pdf – “We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes.”

Conclusions

We introduced GradNorm, an efficient algorithm for tuning loss weights in a multi-task learning setting based on balancing the training rates of different tasks. We demonstrated on both synthetic and real datasets that GradNorm improves multitask test-time performance in a variety of scenarios, and can accommodate various levels of asymmetry amongst the different tasks through the hyperparameter α. Our empirical results indicate that GradNorm offers superior performance over state-of-the-art multitask adaptive weighting methods and can match or surpass the performance of exhaustive grid search while being significantly less time-intensive.

Looking ahead, algorithms such as GradNorm may have applications beyond multitask learning. We hope to extend the GradNorm approach to work with class-balancing and sequence-to-sequence models, all situations where problems with conflicting gradient signals can degrade model performance. We thus believe that our work not only provides a robust new algorithm for multitask learning, but also reinforces the powerful idea that gradient tuning is fundamental for training large, effective models on complex tasks.

The paper derived the formulation of the multitask loss based on the maximization of the Gaussian likelihood with homoscedastic* uncertainty. I will not go into the details here, but the simplified forms are strikingly simple.

[Missing image: the modified loss weights based on task uncertainty]

(^ https://towardsdatascience.com/self-paced-multitask-learning-76c26e9532d0)
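Since the image didn't survive, here's my reconstruction of that simplified form (Kendall et al.'s homoscedastic-uncertainty weighting, if I'm remembering it right: each task loss gets scaled by 1/(2σ²), plus a log σ regulariser so the uncertainties can't just grow forever):

import math

# My reconstruction, not the article's code. log_sigmas are learnable per-task log std-devs.
def uncertainty_weighted_loss(task_losses, log_sigmas):
    return sum(L / (2 * math.exp(2 * s)) + s
               for L, s in zip(task_losses, log_sigmas))

print(uncertainty_weighted_loss([0.5, 1.2], [0.0, 0.3]))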

* In statistics, a sequence (or a vector) of random variables is homoscedastic /ˌhoʊmoʊskəˈdæstɪk/ if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity.

Categories
dev Linux

Experiment Analysis and Wheels

The Ray framework has this Analysis class (and an ExperimentAnalysis class), and I noticed the code was kind of buggy, because it should have handled episode_reward_mean being NaN better: https://github.com/ray-project/ray/issues/9826. (episode_reward_mean is an averaged value, so it appears as NaN (Not a Number) for the first few rollouts.) It was fixed a mere 18 days ago, so I can download the nightly release instead.

# pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.9.0.dev0-cp37-cp37m-manylinux1_x86_64.whl

or, since I still have 3.6 installed,

# pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl

Successfully installed aioredis-1.3.1 blessings-1.7 cachetools-4.1.1 colorful-0.5.4 contextvars-2.4 google-api-core-1.22.0 google-auth-1.20.0 googleapis-common-protos-1.52.0 gpustat-0.6.0 hiredis-1.1.0 immutables-0.14 nvidia-ml-py3-7.352.0 opencensus-0.7.10 opencensus-context-0.1.1 pyasn1-0.4.8 pyasn1-modules-0.2.8 ray-0.9.0.dev0 rsa-4.6

https://github.com/openai/gym/issues/1153

Phew. Wheels. Better than eggs, old greybeard probably said. https://pythonwheels.com/

What are wheels?

Wheels are the new standard of Python distribution and are intended to replace eggs.

Categories
dev

Ray/RLLib PBT & ARS

I’ve posted an issue to try get ARS working. https://github.com/ray-project/ray/issues/9573

  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/ars/ars_tf_policy.py", line 59, in compute_actions
    observation = self.observation_filter(observation[None], update=update)
TypeError: list indices must be integers or slices, not NoneType

No idea yet, but some other bug mentioned something about numpy arrays vs. (good old) plain lists. But anyhow, it would be great if I can get ARS working on Ray/RLLib, because I just get the sense that PPO is too dumb. It's never managed to get past falling over, despite quite a bit of hyperparam tweaking.
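My guess is that this is the numpy-array vs. plain-list thing in miniature (not a verified fix, just the failure mode):

import numpy as np

obs = [0.1, 0.2, 0.3]
# obs[None]  ->  TypeError: list indices must be integers or slices, not NoneType
batched = np.asarray(obs)[None]   # works: adds a batch dimension, shape (1, 3)
print(batched.shape)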

At least ARS has evolved a walking table so far. When it works in Ray, perhaps we will have policy save and load, and I can move on to replaying experiences, or continuing training from a checkpoint, etc.

Huh, great. Well, I solved my problems, and it's running something now.

But rollouts are not ending now. Ok, it looks like I need to put a time limit into the environment itself, rather than it being a hyperparameter like in pyBullet's implementation.
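One way to do that with the standard gym machinery (assuming RobotableEnv-v0 is registered by gym_robotable) is a TimeLimit wrapper, or equivalently a max_episode_steps value in the env registration:

import gym
import gym_robotable                      # registers RobotableEnv-v0 (assumed)
from gym.wrappers import TimeLimit

env = TimeLimit(gym.make("RobotableEnv-v0"), max_episode_steps=1000)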

Well then, onto the next issue.

Categories
deep dev

Ray / RLLib PBT & PPO

Got Population Based Training PPO running in RLLib. It seems to have maxed out rewards. (Asymptotically approaching 0.74).

PPO isn’t great for this. But let’s see if we can replay with GUI after this.

I asked for these hyperparameter mutations:

hyperparam_mutations={
    "lambda": lambda: random.uniform(0.9, 1.0),
    "clip_param": lambda: random.uniform(0.01, 0.5),
    "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5]
    # ,
    # "num_sgd_iter": lambda: random.randint(1, 30),
    # "sgd_minibatch_size": lambda: random.randint(128, 16384),
    # "train_batch_size": lambda: random.randint(2000, 160000),
})
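For context, a sketch of how those mutations plug into Tune's PopulationBasedTraining scheduler (assumed setup along the lines of the standard RLlib PBT example, not my exact script; the config values match the pbt_global.txt log below):

import random
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=10,           # iterations between exploit/explore steps
    hyperparam_mutations={
        "lambda": lambda: random.uniform(0.9, 1.0),
        "clip_param": lambda: random.uniform(0.01, 0.5),
        "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    })

tune.run(
    "PPO",
    name="PBT_ROBOTABLE",
    scheduler=pbt,
    num_samples=8,                      # the 8 trials in the status table below
    config={
        "env": "RobotableEnv-v0",
        "kl_coeff": 1.0,
        "num_workers": 2,
        "num_gpus": 0,
        "model": {"free_log_std": True},
        "lambda": 0.95,
        "clip_param": 0.2,
        "lr": 1e-4,
        "num_sgd_iter": 20,
        "sgd_minibatch_size": 500,
        "train_batch_size": 10000,
    })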

cat pbt_global.txt
["5", "7", 17, 18, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.95, "clip_param": 0.2, "lr": 0.0001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.76, "clip_param": 0.16000000000000003, "lr": 5e-05, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}]

["3", "1", 35, 32, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.95, "clip_param": 0.2, "lr": 0.0001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 1.14, "clip_param": 0.1096797541550122, "lr": 5e-05, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}]

["3", "7", 35, 36, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.95, "clip_param": 0.2, "lr": 0.0001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.76, "clip_param": 0.24, "lr": 0.001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}]

["5", "6", 37, 35, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 0.95, "clip_param": 0.2, "lr": 0.0001, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}, {"env": "RobotableEnv-v0", "kl_coeff": 1.0, "num_workers": 2, "num_gpus": 0, "model": {"free_log_std": true}, "lambda": 1.14, "clip_param": 0.16000000000000003, "lr": 5e-05, "num_sgd_iter": 20, "sgd_minibatch_size": 500, "train_batch_size": 10000}]




== Status ==
Memory usage on this node: 2.7/3.8 GiB
PopulationBasedTraining: 28 checkpoints, 3 perturbs
Resources requested: 3/4 CPUs, 0/0 GPUs, 0.0/0.93 GiB heap, 0.0/0.29 GiB objects
Result logdir: /root/ray_results/PBT_ROBOTABLE
Number of trials: 8 (7 PAUSED, 1 RUNNING)
+---------------------------------+----------+-----------------------+--------+------------------+--------+----------+
| Trial name                      | status   | loc                   |   iter |   total time (s) |     ts |   reward |
|---------------------------------+----------+-----------------------+--------+------------------+--------+----------|
| PPO_RobotableEnv-v0_c67a8_00000 | PAUSED   |                       |     36 |         1069.1   | 360000 | 0.735323 |
| PPO_RobotableEnv-v0_c67a8_00001 | PAUSED   |                       |     36 |         1096.3   | 360000 | 0.736305 |
| PPO_RobotableEnv-v0_c67a8_00002 | PAUSED   |                       |     33 |          987.687 | 330000 | 0.735262 |
| PPO_RobotableEnv-v0_c67a8_00003 | PAUSED   |                       |     36 |         1096.22  | 360000 | 0.731993 |
| PPO_RobotableEnv-v0_c67a8_00004 | PAUSED   |                       |     37 |         1103.48  | 370000 | 0.739188 |
| PPO_RobotableEnv-v0_c67a8_00005 | RUNNING  | 192.168.101.127:14690 |     37 |         1101.5   | 370000 | 0.727506 |
| PPO_RobotableEnv-v0_c67a8_00006 | PAUSED   |                       |     35 |         1067.26  | 350000 | 0.739985 |
| PPO_RobotableEnv-v0_c67a8_00007 | PAUSED   |                       |     36 |         1085.05  | 360000 | 0.739295 |
+---------------------------------+----------+-----------------------+--------+------------------+--------+----------+


2020-07-19 17:27:53,966	INFO pbt.py:78 -- [explore] perturbed config from {'env': 'RobotableEnv-v0', 'kl_coeff': 1.0, 'num_workers': 2, 'num_gpus': 0, 'model': {'free_log_std': True}, 'lambda': 0.95, 'clip_param': 0.2, 'lr': 0.0001, 'num_sgd_iter': 20, 'sgd_minibatch_size': 500, 'train_batch_size': 10000} -> {'env': 'RobotableEnv-v0', 'kl_coeff': 1.0, 'num_workers': 2, 'num_gpus': 0, 'model': {'free_log_std': True}, 'lambda': 1.14, 'clip_param': 0.16000000000000003, 'lr': 5e-05, 'num_sgd_iter': 20, 'sgd_minibatch_size': 500, 'train_batch_size': 10000}
2020-07-19 17:27:53,966	INFO pbt.py:316 -- [exploit] transferring weights from trial PPO_RobotableEnv-v0_c67a8_00006 (score 0.7399848299949074) -> PPO_RobotableEnv-v0_c67a8_00005 (score 0.7241841897925536)
Result for PPO_RobotableEnv-v0_c67a8_00005:
  custom_metrics: {}
  date: 2020-07-19_17-27-53
  done: false
  episode_len_mean: 114.58
  episode_reward_max: 0.7808001167724908
  episode_reward_mean: 0.7241841897925536
  episode_reward_min: 0.6627154081217708
  episodes_this_iter: 88
  episodes_total: 2500
  experiment_id: e3408f32ed2a433d8c7edb87d33609ba
  experiment_tag: 5@perturbed[clip_param=0.16,lambda=1.14,lr=5e-05]
  hostname: chrx
  info:
    learner:
      default_policy:
        cur_kl_coeff: 0.0625
        cur_lr: 4.999999873689376e-05
        entropy: 5.101933479309082
        entropy_coeff: 0.0
        kl: 0.004210006445646286
        model: {}
        policy_loss: -0.0077978381887078285
        total_loss: -0.007088268641382456
        vf_explained_var: 0.9757658243179321
        vf_loss: 0.0004464423400349915
    num_steps_sampled: 380000
    num_steps_trained: 380000
  iterations_since_restore: 5
  node_ip: 192.168.101.127
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 66.7095238095238
    ram_util_percent: 72.5452380952381
  pid: 14690
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_env_wait_ms: 1.5935033550679747
    mean_inference_ms: 1.8385610163959398
    mean_processing_ms: 1.195529456155168
  time_since_restore: 147.82027745246887
  time_this_iter_s: 29.546902656555176
  time_total_s: 1131.04909491539
  timers:
    learn_throughput: 1880.23
    learn_time_ms: 5318.497
    load_throughput: 350730.091
    load_time_ms: 28.512
    sample_throughput: 414.501
    sample_time_ms: 24125.418
    update_time_ms: 4.191
  timestamp: 1595179673
  timesteps_since_restore: 0
  timesteps_total: 380000
  training_iteration: 38
  trial_id: c67a8_00005
  
2020-07-19 17:27:54,989	WARNING util.py:137 -- The `experiment_checkpoint` operation took 0.8819785118103027 seconds to complete, which may be a performance bottleneck.
== Status ==
Memory usage on this node: 2.6/3.8 GiB
PopulationBasedTraining: 28 checkpoints, 4 perturbs
Resources requested: 0/4 CPUs, 0/0 GPUs, 0.0/0.93 GiB heap, 0.0/0.29 GiB objects
Result logdir: /root/ray_results/PBT_ROBOTABLE
Number of trials: 8 (8 PAUSED)
+---------------------------------+----------+-------+--------+------------------+--------+----------+
| Trial name                      | status   | loc   |   iter |   total time (s) |     ts |   reward |
|---------------------------------+----------+-------+--------+------------------+--------+----------|
| PPO_RobotableEnv-v0_c67a8_00000 | PAUSED   |       |     36 |         1069.1   | 360000 | 0.735323 |
| PPO_RobotableEnv-v0_c67a8_00001 | PAUSED   |       |     36 |         1096.3   | 360000 | 0.736305 |
| PPO_RobotableEnv-v0_c67a8_00002 | PAUSED   |       |     33 |          987.687 | 330000 | 0.735262 |
| PPO_RobotableEnv-v0_c67a8_00003 | PAUSED   |       |     36 |         1096.22  | 360000 | 0.731993 |
| PPO_RobotableEnv-v0_c67a8_00004 | PAUSED   |       |     37 |         1103.48  | 370000 | 0.739188 |
| PPO_RobotableEnv-v0_c67a8_00005 | PAUSED   |       |     38 |         1131.05  | 380000 | 0.724184 |
| PPO_RobotableEnv-v0_c67a8_00006 | PAUSED   |       |     35 |         1067.26  | 350000 | 0.739985 |
| PPO_RobotableEnv-v0_c67a8_00007 | PAUSED   |       |     36 |         1085.05  | 360000 | 0.739295 |
+---------------------------------+----------+-------+--------+------------------+--------+----------+


(pid=14800) 2020-07-19 17:27:58,611	INFO trainer.py:585 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=14800) 2020-07-19 17:27:58,611	INFO trainer.py:612 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=14800) pybullet build time: Mar 17 2020 17:46:41
(pid=14800) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=14800)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=14913) pybullet build time: Mar 17 2020 17:46:41
(pid=14913) 2020-07-19 17:28:00,118	INFO trainer.py:585 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=14913) 2020-07-19 17:28:00,118	INFO trainer.py:612 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=14913) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=14913)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=14992) pybullet build time: Mar 17 2020 17:46:41
(pid=14993) pybullet build time: Mar 17 2020 17:46:41
(pid=14992) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=14992)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=14800) 2020-07-19 17:28:10,106	INFO trainable.py:181 -- _setup took 11.510 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=14993) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=14993)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=14800) 2020-07-19 17:28:10,126	WARNING util.py:37 -- Install gputil for GPU system monitoring.
(pid=14800) 2020-07-19 17:28:10,717	INFO trainable.py:423 -- Restored on 192.168.101.127 from checkpoint: /root/ray_results/PBT_ROBOTABLE/PPO_RobotableEnv-v0_5_2020-07-19_15-00-03bbqeih3t/tmpf1h5txefrestore_from_object/checkpoint-35
(pid=14800) 2020-07-19 17:28:10,717	INFO trainable.py:430 -- Current state after restoring: {'_iteration': 35, '_timesteps_total': None, '_time_total': 1067.2641203403473, '_episodes_total': 2289}
(pid=14913) 2020-07-19 17:28:12,388	INFO trainable.py:181 -- _setup took 12.284 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=14913) 2020-07-19 17:28:12,388	WARNING util.py:37 -- Install gputil for GPU system monitoring.
(pid=14913) 2020-07-19 17:28:12,760	INFO trainable.py:423 -- Restored on 192.168.101.127 from checkpoint: /root/ray_results/PBT_ROBOTABLE/PPO_RobotableEnv-v0_2_2020-07-19_14-52-33cutk2k27/tmplqac6svyrestore_from_object/checkpoint-33
(pid=14913) 2020-07-19 17:28:12,760	INFO trainable.py:430 -- Current state after restoring: {'_iteration': 33, '_timesteps_total': None, '_time_total': 987.687007188797, '_episodes_total': 2059}
(pid=15001) pybullet build time: Mar 17 2020 17:46:41
(pid=15001) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=15001)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=15088) pybullet build time: Mar 17 2020 17:46:41
(pid=15088) /usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
(pid=15088)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Result for PPO_RobotableEnv-v0_c67a8_00002:
  custom_metrics: {}
  date: 2020-07-19_17-28-54
  done: false
  episode_len_mean: 110.78888888888889
  episode_reward_max: 0.8009732276880979
  episode_reward_mean: 0.7387077080695522
  episode_reward_min: 0.6640543988817607
  episodes_this_iter: 90
  episodes_total: 2149
  experiment_id: edcd859a3ae34d668bb9be1899dde41a
  experiment_tag: '2'
  hostname: chrx
  info:
    learner:
      default_policy:
        cur_kl_coeff: 1.0
        cur_lr: 9.999999747378752e-05
        entropy: 5.111008644104004
        entropy_coeff: 0.0
        kl: 0.0031687873415648937
        model: {}
        policy_loss: -0.012367220595479012
        total_loss: -0.008663905784487724
        vf_explained_var: 0.9726411700248718
        vf_loss: 0.0005345290992408991
    num_steps_sampled: 340000
    num_steps_trained: 340000
  iterations_since_restore: 1
  node_ip: 192.168.101.127
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 68.11833333333333
    ram_util_percent: 71.13666666666667
  pid: 14913
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_env_wait_ms: 1.6718764134441182
    mean_inference_ms: 1.9752634594235934
    mean_processing_ms: 1.2958259778937158
  time_since_restore: 41.650487661361694
  time_this_iter_s: 41.650487661361694
  time_total_s: 1029.3374948501587
  timers:
    learn_throughput: 1680.106
    learn_time_ms: 5952.007
    load_throughput: 74973.795
    load_time_ms: 133.38
    sample_throughput: 285.094
    sample_time_ms: 35076.171
    update_time_ms: 4.517
  timestamp: 1595179734
  timesteps_since_restore: 0
  timesteps_total: 340000
  training_iteration: 34
  trial_id: c67a8_00002
  
2020-07-19 17:28:55,042	WARNING util.py:137 -- The `experiment_checkpoint` operation took 0.5836038589477539 seconds to complete, which may be a performance bottleneck.
== Status ==
Memory usage on this node: 2.7/3.8 GiB
PopulationBasedTraining: 28 checkpoints, 4 perturbs
Resources requested: 3/4 CPUs, 0/0 GPUs, 0.0/0.93 GiB heap, 0.0/0.29 GiB objects
Result logdir: /root/ray_results/PBT_ROBOTABLE
Number of trials: 8 (7 PAUSED, 1 RUNNING)
+---------------------------------+----------+-----------------------+--------+------------------+--------+----------+
| Trial name                      | status   | loc                   |   iter |   total time (s) |     ts |   reward |
|---------------------------------+----------+-----------------------+--------+------------------+--------+----------|
| PPO_RobotableEnv-v0_c67a8_00000 | PAUSED   |                       |     36 |          1069.1  | 360000 | 0.735323 |
| PPO_RobotableEnv-v0_c67a8_00001 | PAUSED   |                       |     36 |          1096.3  | 360000 | 0.736305 |
| PPO_RobotableEnv-v0_c67a8_00002 | RUNNING  | 192.168.101.127:14913 |     34 |          1029.34 | 340000 | 0.738708 |
| PPO_RobotableEnv-v0_c67a8_00003 | PAUSED   |                       |     36 |          1096.22 | 360000 | 0.731993 |
| PPO_RobotableEnv-v0_c67a8_00004 | PAUSED   |                       |     37 |          1103.48 | 370000 | 0.739188 |
| PPO_RobotableEnv-v0_c67a8_00005 | PAUSED   |                       |     38 |          1131.05 | 380000 | 0.724184 |
| PPO_RobotableEnv-v0_c67a8_00006 | PAUSED   |                       |     35 |          1067.26 | 350000 | 0.739985 |
| PPO_RobotableEnv-v0_c67a8_00007 | PAUSED   |                       |     36 |          1085.05 | 360000 | 0.739295 |
+---------------------------------+----------+-----------------------+--------+------------------+--------+----------+

Categories
dev institutes

Anyscale Academy

https://github.com/anyscale/academy

They have a relevant tutorial on RLLib (Ray)

Categories
AI/ML dev Locomotion simulation

Replay

After seeing the 'Replay Buffer' in the TF-Agents SAC minitaur tutorial (https://www.tensorflow.org/agents/tutorials/7_SAC_minitaur_tutorial), I'm starting to think replay is going to be a thing for the robot, one way or another.

I'm sticking to the Google protobuf code that the minitaur uses, and will just need to save the best episodes and work out how to replay them. The comments ask "use recordio?"

https://stackoverflow.com/questions/53219720/tfrecord-vs-recordio

import os
import inspect

currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(os.path.dirname(currentdir))
os.sys.path.insert(0, parentdir)

import argparse
from gym_robotable.envs import logging

if __name__ == "__main__":

    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--log_file', help='path to protobuf file', default='/opt/gym-robotable/logs/robotable_log_2020-07-12-204613')
    args = parser.parse_args()
    logging = logging.RobotableLogging()
    episode = logging.restore_episode(args.log_file)
    print(dir(episode))
    print("episode=", episode)
    fields = episode.ListFields()

So that’s printing out the episode.

Next step is saving only the best episodes.

Then the next step is stepping the simulation with the stored actions.
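A rough sketch of what that replay step might look like, reusing the episode loaded above (field names guessed from the minitaur logging proto, so the robotable proto may differ, and the env id is assumed):

import gym
import gym_robotable                                   # registers RobotableEnv-v0 (assumed)

env = gym.make("RobotableEnv-v0")
obs = env.reset()
for state_action in episode.state_action:              # guessed field name, as in the minitaur proto
    action = [m.action for m in state_action.motor_states]   # guessed structure
    obs, reward, done, info = env.step(action)
    if done:
        break

Note this is open-loop replay: unless the sim starts from the same initial state, it won't necessarily retrace the logged trajectory.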

But now I'm not as sure. It might be better to switch to RLLib (& Ray).

Would rather hide the details of serialization if I can.