
Simulation Vision 2

I’ve now got a UNet that can provide predictions for where an egg is, in simulation.

So I want to design a reward function related to the egg prediction mask.

I haven’t ‘plugged in’ the trained neural network though, because it will slow things down, and I can just as well make use of the built-in pybullet segmentation to get the simulation egg pixels. At some point though, the robot will have to exist in a world where egg pixels are not labelled as such, and the simulation trained vision will be a useful basis for training.

I think a good reward function might be to not fall over, and to maximize the number of 1s in the egg prediction mask. An intermediate reward might be the centering of the egg pixels.
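As a rough sketch of what that reward could look like: a falling penalty plus the fraction of the view that is egg. The `egg_reward` function and its `fallen` flag are hypothetical names, not the env's actual code.

```python
import numpy as np

# Sketch of the proposed reward: penalize falling, otherwise reward the
# fraction of egg pixels in the segmentation mask. `fallen` and `mask`
# are hypothetical inputs standing in for the real env state.
def egg_reward(mask: np.ndarray, fallen: bool) -> float:
    if fallen:
        return -1.0                       # penalty for falling over
    egg_pixels = np.count_nonzero(mask == 1)
    return egg_pixels / mask.size         # fraction of the view that is egg

mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1                        # 4 egg pixels out of 16
print(egg_reward(mask, fallen=False))     # 0.25
```

Normalising by `mask.size` keeps the reward in [0, 1] regardless of camera resolution, which makes it easier to balance against the falling penalty.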

The numpy way to count mask pixels could be:

    arr = np.array([1, 0, 0, 0, 0, 1, 1, 1, 1, 0])
    np.count_nonzero(arr == 1)

I ended up using the following to count the pixels:

    seg = Image.fromarray(mask.astype('uint8'))   # PIL image of the segmentation mask
    self._num_ones = (np.array(seg) == 1).sum()   # count the egg pixels

Hmm, for centering, I'm not sure yet.
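One option would be the distance of the egg-mask centroid from the image centre, scaled to [0, 1]. This is only a sketch with hypothetical names, not anything in the env yet.

```python
import numpy as np

# Sketch of a centering term: how close the egg-mask centroid is to the
# image centre, as a score in [0, 1]. `centering_score` is a hypothetical name.
def centering_score(mask: np.ndarray) -> float:
    ys, xs = np.nonzero(mask == 1)
    if len(xs) == 0:
        return 0.0                        # no egg pixels visible
    h, w = mask.shape
    cy, cx = ys.mean(), xs.mean()         # centroid of the egg pixels
    # distance from the image centre, normalised by the largest possible distance
    dist = np.hypot(cy - (h - 1) / 2, cx - (w - 1) / 2)
    max_dist = np.hypot((h - 1) / 2, (w - 1) / 2)
    return 1.0 - dist / max_dist          # 1.0 when perfectly centred

mask = np.zeros((8, 8), dtype=np.uint8)
mask[3:5, 3:5] = 1                        # centroid lands exactly at the centre
print(centering_score(mask))              # 1.0
```

Returning 0.0 when no egg is visible keeps the term consistent with the pixel-count reward: both go to zero as the egg leaves the frame.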

I’m looking into how to run pybullet / gym on the cloud and get some of it rendering.

I’ve found a few leads. VNC is an obvious solution, but probably won’t be available on Chrome OS. The pybullet docs have a broken link, but I think they’re suggesting something like this colab, more or less, using ‘pyrender’. User matpalm has a minimal example of sending images to Google Dataflow. Those might be good if I can render video. There’s also a Jupyter example of capturing images in pybullet. I’ll have to research a bit more. An RDP viewer would probably be easiest, if it’s possible.

Some interesting options on stackoverflow, too.

I set up the Ray Tune training again on Google Cloud, and enabled the dashboard by opening some ports (8265 and 6006) and initialising Ray with ray.init(dashboard_host="0.0.0.0")

I can see it improving the episode reward mean, but it’s taking a good while on the 4 CPU cloud machine. Cost is about $3.50/day on the CPU machine, and about $16/day on the GPU machine. Google is out of T4 GPUs at the moment.

I have it saving the occasional mp4 video using a Monitor wrapper that records every 10th episode.

def env_creator(env_config):
    env = RobotableEnv()
    # record every 10th episode as mp4 to ./vid
    env = gym.wrappers.Monitor(env, "./vid", video_callable=lambda episode_id: episode_id % 10 == 0, force=True)
    return env

After one night of training, it went from about -30 reward to -5 reward. I’m just running it on the CPU machine while I iron out the issues.

I think curriculum training might also be a useful addition.