Categories
control dev envs hardware_ robots UI Vision

Slamtec RPLidar

I got the RPLidar A1M8-R6 with firmware 1.29, and at first it was just plastic spinning around: none of the libraries worked.

As a sanity check, I got it working on Windows, so at least the unit wasn’t broken. Then I started again on getting it working on the Raspberry Pi Zero W.

I tried the Adafruit Python libs, but firmware v1.29 had some insurmountable issue, and I couldn’t downgrade to v1.27.

So I needed to compile the Slamtec SDK.

A helpful post pointed out how to fix the compile error, and I was able to compile.

I soldered on some extra wires to the motor + and -, to power the motor separately.

I wasn’t having any luck, but it turned out to be the MicroUSB cable (the OTG cable was fine). After swapping it out, I was able to run the simple_grabber app and confirm that data was coming out.

pi@raspberrypi:~/rplidar_sdk/output/Linux/Release $ ./simple_grabber --channel --serial /dev/ttyUSB0 115200
theta: 59.23 Dist: 00160.00
theta: 59.50 Dist: 00161.00
theta: 59.77 Dist: 00162.00
theta: 59.98 Dist: 00164.00
theta: 60.29 Dist: 00165.00
theta: 61.11 Dist: 00168.00

I debugged the Adafruit v1.29 issue too. So now I’m able to get the data in Python, which will probably be nicer to work with, as I haven’t done proper C++ in about 20 years. But this Slamtec code would be the cleanest example to work with.

So I added in some C socket code and recompiled, so now the demo app takes a TCP connection and starts dumping data.

./ultra_simple --channel --serial /dev/ttyUSB0 115200
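On the receiving end, a minimal Python client along these lines can read the stream (a sketch only — the host, port, and line-per-reading format are assumptions, matching whatever the C socket code was set to):

# Minimal sketch of a TCP client for the modified ultra_simple demo.
# HOST, PORT and the newline-delimited output format are assumptions.
import socket

HOST = "raspberrypi.local"   # assumption: the Pi's address on the network
PORT = 5000                  # assumption: whichever port the C socket code listens on

with socket.create_connection((HOST, PORT)) as sock:
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        buffer += chunk
        # split off complete lines, keep any partial line for the next read
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            print(line.decode(errors="replace"))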

It was actually a LOT faster than the Python libraries. But I started getting ECONNREFUSED errors. My theory was that the Pi Zero W only has a single CPU, the WSGI worker engine for flask-socketio was eventlet (which only handles one worker), and running a socket server, a socket client, and Socket.IO on a single CPU was creating some sort of resource contention. But I couldn’t solve it.

I found a Python-wrapped C++ project, but it was compiled for 64-bit, and SWIG, which I would have needed in order to recompile it for 32-bit, seemed a bit complicated.

So, back to Python.

Actually, back to JavaScript, to get some visuals in a browser. The Adafruit example uses pygame, but we’re over a network, so that won’t work, and rendering Matplotlib graphs is going to be too slow. I need to stream the data and render it on the front end.

Detour #1: NPM

Ok… so, I need to install Node.js to install this one, which for the Raspberry Pi Zero W means ARMv6.

This is the most recent ARMv6 Node.js tarball:

wget https://nodejs.org/dist/latest-v11.x/node-v11.15.0-linux-armv6l.tar.gz

tar xzvf node-v11.15.0-linux-armv6l.tar.gz
cd node-v11.15.0-linux-armv6l
sudo cp -R * /usr/local/
sudo ldconfig
npm install --global yarn
sudo npm install --global yarn

npm install rplidar

npm ERR! serialport@4.0.1 install: `node-pre-gyp install --fallback-to-build`
 
Ok... never mind JavaScript for now.

Detour #2: Dash/Plotly

Let’s try this python code. https://github.com/Hyun-je/pyrplidar

Ok, well, it looks like it works, maybe, but where are they getting that nice plot from? It’s not in the code. I want the plot.

So, theta and distance are just polar coordinates. So I need to plot polar coordinates.

PolarToCartesian.

Convert a polar coordinate (r,θ) to cartesian (x,y): x = r cos(θ), y = r sin(θ)
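In Python that’s a one-liner each way (a quick sketch, taking theta in degrees and distance in millimetres, as the lidar reports them):

import math

def polar_to_cartesian(theta_deg, dist_mm):
    # x = r*cos(theta), y = r*sin(theta), with theta converted to radians
    theta = math.radians(theta_deg)
    return dist_mm * math.cos(theta), dist_mm * math.sin(theta)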

Ok, that is easy, right? So here’s a JavaScript library with a polar coordinate plotter.

So, the plan is: set up a Flask route, read the RPLidar data, and publish it to a front end, which plots it in JavaScript.

Ok after some googling, Dash / Plotly looks like a decent option.

Found this code. Cool project! And though this guy used a different Lidar, it’s pretty much what I’m trying to do, and he’s using plotly.

pip3 install pandas
pip3 install dash

k let's try...
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 48 from C header, got 40 from PyObject

ok
pip3 install --upgrade numpy     
(if your numpy version is < 1.20.0)

ok, now: bad marshal data (unknown type code)
sheesh, what garbage.
Posting an issue to their github and going back to the plan.

Reply from the Plotly devs: pip3 won’t work; I’ll need to try a conda install, for ARMv6.

Ok let’s see if we can install plotly again….

Going to try miniconda – they have an ARMv6 file here.

Damn. 2014. Python 2. Nope. Ok, Plotly is not an option for the RPi Zero W. I could swap to another RPi, but I don’t think the 1A output of the power bank can handle it, plus the camera, lidar motor, and laser. (I am using the 2.1A output for the servos).

Solution #1: D3.js

Ok, just noting this link, as it looks useful for the lidar robot, later.

So, let’s install socket io and websockets

pip3 install flask_socketio
pip3 install simple-websocket
pip3 install flask-executor

(looking at this link) for flask and socket-io, and this link for d3 polar chart

The app isn’t starting though, since adding socket-io. So, hmm. Ok, this issue. Right, it needs 0.0.0.0.

socketio.run(app, debug=True, host='0.0.0.0')

Back to it…

K. Let’s carry on with Flask/d3.js though.

I think if we’re doing threading, I need to use a WSGI server.

pip install waitress

ok that won’t work with flask-socketio. Needs gevent or eventlet.

“eventlet is the best performant option, with support for long-polling and WebSocket transports.”

apparently needs redis for message queueing…

pip install eventlet
pip install redis

Ok, and we need gunicorn, because eventlet is just for workers...

pip3 install gunicorn

gunicorn --worker-class eventlet -w 1 module:app

k, that throws an error.
I need to downgrade eventlet, or do some complicated thing.
pip install eventlet==0.30.2

gunicorn --bind 0.0.0.0 --worker-class eventlet -w 1 kmp8servo:app
(my service is called kmp8servo.py)


Ok, so do I need Redis?
sudo apt-get install redis
Ok, it's already running now,
at /usr/bin/redis-server 127.0.0.1:6379
No, I don't really need Redis. I could use sqlite, too. But let's use it anyway.

Ok amazing, gunicorn works.  It's running on port 8000

Ok, after some work, socket-io is also working.

Received #0: Connected
Received #1: I'm connected!
Received #2: Server generated event
Received #3: Server generated event
Received #4: Server generated event
Received #5: Server generated event
Received #6: Server generated event
Received #7: Server generated event
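For reference, the shape of the Socket.IO server at this point is roughly the following (a sketch based on the flask-socketio background-task pattern; the event name and payload are placeholders — the lidar scan data will eventually replace the demo payload):

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, message_queue='redis://')  # using redis as the message queue

thread = None

def background_thread():
    count = 0
    while True:
        socketio.sleep(1)
        count += 1
        # placeholder payload -- this is where the lidar scan data will go
        socketio.emit('my_response', {'data': 'Server generated event', 'count': count})

@socketio.on('connect')
def on_connect():
    global thread
    if thread is None:
        thread = socketio.start_background_task(background_thread)

if __name__ == '__main__':
    socketio.run(app, debug=True, host='0.0.0.0')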

So, I’m going to go with d3.js instead of P5js, just cause it’s got a zillion more users, and there’s plenty of polar coordinate code to look at, too.

Got it drawing the polar background… but I’ve got to change the scale a bit. The code uses a linear scale from 0 to 1, so I need to get my distances down to something between 0 and 1. I also need radians, instead of the degrees that the lidar is putting out.
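On the Python side, before emitting, something like this does the trick (a sketch; MAX_DIST_MM is an assumption for the plot’s outer radius):

import math

MAX_DIST_MM = 6000.0   # assumption: clip the plot at 6 m

def to_plot_coords(theta_deg, dist_mm):
    # d3's linear scale wants r in 0..1, and the polar chart wants radians
    theta_rad = math.radians(theta_deg)
    r = min(dist_mm / MAX_DIST_MM, 1.0)
    return theta_rad, r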

ok finally. what an ordeal.

But we still need to get the Python lidar code working, or switch back to the C socket code I got working.

Ok, well, so I added D3 update code with transitions, and the javascript looks great.

But the C Slamtec SDK, and the Python RP Lidar wrappers are a source of pain.

I had the C sockets working briefly, but they stopped working, seemingly after I added more Python code between socket reads. I got frustrated and gave up.

The Adafruit library, with the fixes I made, seems to work now, but it’s in a very precarious state, where looking at it funny causes a bad descriptor field or a checksum error.

But I managed to get the brain turning on, with the lidar. I’m using Redis to track the variables, using the memory.py code from this K9 repo. Thanks.

I will come back to fixing the remaining Python library issues, but for now the robot is running, so, on to the next thing.

Categories
AI/ML CNNs dev Locomotion OpenCV robots UI Vision

Realsense Depth and TensorRT object detection

A seemingly straightforward idea for robot control involves using depth, and object detection, to form a rough model of the environment.

After failed attempts to create our own stereo camera using two monocular cameras, we eventually decided to buy a commercial product instead: the Intel depth camera, the D455.

After a first round of running a COCO-trained MobileNet SSDv2 object detection network in TensorFlow 2 Lite, on the colour images from the Realsense camera, on the Jetson Nano, the results were just barely acceptable (~2 FPS) for a localhost stream, and totally unacceptable (~0.25 FPS) when served as JPEG over HTTP to a browser on the network.

Looking at the options, the most feasible solution was to redo the network using TensorRT, the NVIDIA-specific, quantized (16-bit on the Nano, 8-bit on the NX/AGX) neural network framework. The other avenue was to investigate options other than simple JPEG compression over HTTP, such as RTSP and WebRTC.

The difficult part was setting up the environment. We used the NVIDIA detectnet code, adapted to take the Realsense camera images as input, and to display the distance to the detected objects. An outdated example was found at the CAVEDU robotics blog/github. Fixed up below.

#!/usr/bin/python3

import argparse
import sys
import cv2
import numpy as np
import pyrealsense2 as rs
import jetson_utils
from jetson_inference import detectNet
from jetson_utils import cudaFromNumpy, cudaAllocMapped, cudaConvertColor

parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.",
formatter_class=argparse.RawTextHelpFormatter, epilog=jetson_utils.logUsage())
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2",
help="pre-trained model to load (see below for options)")
parser.add_argument("--threshold", type=float, default=0.5,
help="minimum detection threshold to use")
parser.add_argument("--width", type=int, default=640,
help="set width for image")
parser.add_argument("--height", type=int, default=480,
help="set height for image")
opt = parser.parse_known_args()[0]

# load the object detection network
net = detectNet(opt.network, sys.argv, opt.threshold)

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, opt.width, opt.height, rs.format.z16, 30)
config.enable_stream(rs.stream.color, opt.width, opt.height, rs.format.bgr8, 30)
# Start streaming
pipeline.start(config)


press_key = 0
while (press_key==0):
	# Wait for a coherent pair of frames: depth and color
	frames = pipeline.wait_for_frames()
	depth_frame = frames.get_depth_frame()
	color_frame = frames.get_color_frame()
	if not depth_frame or not color_frame:
		continue
	# Convert images to numpy arrays
	depth_image = np.asanyarray(depth_frame.get_data())
	show_img = np.asanyarray(color_frame.get_data())
	
	# convert to CUDA (cv2 images are numpy arrays, in BGR format)
	bgr_img = cudaFromNumpy(show_img, isBGR=True)
	# convert from BGR -> RGB
	img = cudaAllocMapped(width=bgr_img.width,height=bgr_img.height,format='rgb8')
	cudaConvertColor(bgr_img, img)

	# detect objects in the image (with overlay)
	detections = net.Detect(img)

	for num in range(len(detections)) :
		score = round(detections[num].Confidence,2)
		box_top=int(detections[num].Top)
		box_left=int(detections[num].Left)
		box_bottom=int(detections[num].Bottom)
		box_right=int(detections[num].Right)
		box_center=detections[num].Center
		label_name = net.GetClassDesc(detections[num].ClassID)

		point_distance=0.0
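		# note: get_distance() on the same pixel of the same frame returns the same value
		# each time, so this 10x loop averages down to a single depth reading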
		for i in range (10):
			point_distance = point_distance + depth_frame.get_distance(int(box_center[0]),int(box_center[1]))

		point_distance = np.round(point_distance / 10, 3)
		distance_text = str(point_distance) + 'm'
		cv2.rectangle(show_img,(box_left,box_top),(box_right,box_bottom),(255,0,0),2)
		cv2.line(show_img,
			(int(box_center[0])-10, int(box_center[1])),
			(int(box_center[0]+10), int(box_center[1])),
			(0, 255, 255), 3)
		cv2.line(show_img,
			(int(box_center[0]), int(box_center[1]-10)),
			(int(box_center[0]), int(box_center[1]+10)),
			(0, 255, 255), 3)
		cv2.putText(show_img,
			label_name + ' ' + distance_text,
			(box_left+5,box_top+20),cv2.FONT_HERSHEY_SIMPLEX,0.4,
			(0,255,255),1,cv2.LINE_AA)

	cv2.putText(show_img,
		"{:.0f} FPS".format(net.GetNetworkFPS()),
		(int(opt.width*0.8), int(opt.height*0.1)),
		cv2.FONT_HERSHEY_SIMPLEX,1,
		(0,255,255),2,cv2.LINE_AA)


	display = cv2.resize(show_img,(int(opt.width*1.5),int(opt.height*1.5)))
	cv2.imshow('Detecting...',display)
	keyValue=cv2.waitKey(1)
	if keyValue & 0xFF == ord('q'):
		press_key=1


cv2.destroyAllWindows()
pipeline.stop()

Assuming you have a good cmake version and CUDA is available (if nvcc doesn’t work, you need to configure the linker paths, check this link)… (if you have a cmake version around ~3.22-3.24 or so, you need an older one)… the prerequisite sudo apt-get install libssl-dev is also required.

The hard part was actually setting up the Realsense python bindings.

Clone the repo…

git clone https://github.com/IntelRealSense/librealsense.git

The trick is to request the Python bindings, and CUDA, during the cmake phase. Note that often, none of this works. Some tips include…

sudo apt-get install xorg-dev libglu1-mesa-dev

and changing PYTHON to Python in the cmake flags (compare the two cmake commands below).

mkdir build
cd build
cmake ../ -DBUILD_PYTHON_BINDINGS:bool=true -DPYTHON_EXECUTABLE=/usr/bin/python3 -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DBUILD_WITH_CUDA:bool=true

The above worked on Jetpack 4.6.1, while the below worked on Jetpack 5.0.2

cmake ../ -DBUILD_PYTHON_BINDINGS:bool=true -DPython_EXECUTABLE=/usr/bin/python3.8 -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DBUILD_WITH_CUDA:bool=true -DPYTHON_INCLUDE_DIRS=/usr/include/python3.8 -DPython_LIBRARIES=/usr/lib/aarch64-linux-gnu/libpython3.8.so

(and sudo make install)

Update the Python path:

export PYTHONPATH=$PYTHONPATH:/usr/local/lib

(or point it at a specific Python's directory if you have more than one; if it installed into /usr/lib, change the path accordingly)

Check that the folder is in the correct location (it isn't, after following the official instructions):

/usr/local/lib/python3.6/dist-packages/pyrealsense2/

Check that the shared object files (.so) are in the right place: 

chicken@chicken:/usr/local/lib$ ls
cmake       libjetson-inference.so  librealsense2-gl.so.2.50    librealsense2.so.2.50    pkgconfig
libfw.a     libjetson-utils.so      librealsense2-gl.so.2.50.0  librealsense2.so.2.50.0  python2.7
libglfw3.a  librealsense2-gl.so     librealsense2.so            librealsense-file.a      python3.6


If it can't find 'pipeline', it means you need to copy the missing __init__.py file.

sudo cp /home/chicken/librealsense/wrappers/python/pyrealsense2/__init__.py /usr/local/lib/python3.6/dist-packages/pyrealsense2/

Some extra things to do:
sudo cp 99-realsense-libusb.rules /etc/udev/rules.d/

Eventually, I was able to run the inference on the Realsense camera, at an apparent 25 FPS, on the localhost, drawing to an OpenGL window.

I also developed a Dockerfile for the purpose, which benefits from an updated PyTorch version, but various issues were encountered, ultimately making a bare-metal install far simpler. Note that building jetson-inference and the Realsense SDK on the Nano requires increasing your swap size beyond the standard 2GB. Otherwise, the Jetson freezes once memory paging leads to swap death.

Anyway, since the objective is remote human viewing, (while providing depth information for the robot to use), the next step will require some more tests, to find a suitable option.

The main blocker is the power usage limitations on the Jetson Nano. I can’t seem to run Wifi and the camera at the same time. According to the tegrastats utility, the POM_5V_IN usage goes over the provided 4A, under basic usage. There are notes saying that 3A can be provided to 2 of the 5V GPIO pins, in order to get 6A total input. That might end up being necessary.

Initial investigation into serving RTSP resulted in inferior, compressed results, compared to a simple Python server streaming image by image. The next investigation will be into WebRTC options, which are supposedly the current state of the art for browser-based video streaming. I tried aiortc and momo so far; both failed on the Nano.

I’ve decided to try the Xavier NX too, just to replicate the experiment and see how things change. The Xavier has some higher-wattage settings, and the wifi is internal, so it’s worth a try. I also upgraded to Jetpack 5.0.2, which was a gamble. I thought surely it would be better than upgrading to the 5.0.1 dev preview, but none of their official products support 5.0.2 yet, so there will likely be much pain involved. On the plus side, Python 3.8 is standard, so some libraries are back on the menu.

On the Xavier, we’re getting 80 FPS, compared to 25 FPS on the Nano. Quite an upgrade. Also, able to run wifi and realsense at the same time.

Looks like a success. Getting multiple frames per second with about a second of lag over the network.

Categories
control dev envs robots Vision

RTSP

This was simple to set up, and is meant to be fast.

https://github.com/aler9/rtsp-simple-server#configuration

Unfortunately, the results of using the OpenCV/GStreamer example code to transmit over the network using H264 compression were even worse than the JPEG-over-HTTP attempt I’m trying to improve on. Much worse. That was surprising. It could be this wifi dongle though, which is very disappointing on the Jetson Nano. It appears as though the Jetson Nano tries to keep total wattage around 10W, however plugging in the Realsense camera and a wifi dongle pulls way more than that (all 4A @ 5V supplied by the barrel jack, i.e. 20W). It may mean that wireless robotics with the Realsense is not practical on the Jetson.

Required gstreamer1.0-rtsp to be installed (apt install gstreamer1.0-rtsp).

Back to the drawing board for getting the RealSense colour and depth transmitting to a different viewing machine on the network (while still providing distance data for server-side computation).

Categories
dev Hardware Vision

Object Detection on RPi Zero?

A quick post, because I looked into this and decided it wasn’t a viable option. We’re using the RPi Zero W for the simplest robot, and I was thinking that with object detection, plus ultrasound sensors for depth, one could approximate the far more complicated Realsense-on-Jetson option.

QEngineering managed to get 11FPS on classification, on the RPi.

But the simplest object detection, MobileNet SSD on TensorFlow 2 Lite (supposedly faster than Tiny-YOLOv3), appears to be only narrowly possible: it’s limited to running inference on a single picture in about 6 or 7 seconds.
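For reference, the inference loop for that test looks roughly like this (a sketch using the tflite_runtime interpreter API; the model path and output ordering are assumptions for the usual quantized COCO SSD MobileNet):

# Sketch of a TF Lite SSD MobileNet inference pass (model path is a placeholder).
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# dummy uint8 frame at the model's expected resolution
frame = np.zeros(input_details[0]['shape'], dtype=np.uint8)
interpreter.set_tensor(input_details[0]['index'], frame)
interpreter.invoke()   # this is the step that takes ~6-7 seconds on the Zero

boxes = interpreter.get_tensor(output_details[0]['index'])    # output order can
scores = interpreter.get_tensor(output_details[2]['index'])   # vary by model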

There is a TensorFlow Lite Micro, and some people have ported it to the RPi Zero (e.g. tflite_micro_runtime), but I wasn’t able to install the pip wheel, and gave up.

This guy may have got it working, though it’s hard to tell. I followed the method for installing TensorFlow 2 Lite, and managed to corrupt my SD card, with “Structure needs cleaning” errors.

So maybe I’ll try again some day, but it doesn’t look like a good option; the RPi 3 or 4 is a better bet. Some pages mention NNPack, which allows the use of multiple cores for NNs. But since the RPi Zero has a single core, it’s likely that even if I got it working, it would only achieve inference on a single image frame in about 7 seconds, which isn’t going to cut it.

Categories
3D Research AI/ML deep envs institutes Vision

Panoptic Mapping

I just found this github from ETH Zürich. Not surprising that they have some of the most relevant datasets I’ve seen pertaining to making proprioceptive autonomous systems. I came across their Autonomous Systems Lab dataset site.

One of the projects, panoptic mapping, is pretty much the panoptic segmentation from earlier research, combined with volumetric point clouds. “A flexible submap-based framework towards spatio-temporally consistent volumetric mapping and scene understanding.”

Categories
AI/ML Behaviour deep Vision

DeepLabCut

https://github.com/DeepLabCut

The NVIDIA Jetson instructions for a live application are here.

It looks like there’s some very recent papers and articles using this code for pose estimation.

Across-Species Pose Estimation in Poultry Based on Images Using Deep Learning

Study on Poultry Pose Estimation Based on Multi-Parts Detection:

“The automatic estimation of poultry posture can help to analyze the movement, behavior, and even health of poultry”

This paper mentions a few other uses of deep learning for chickens.

Categories
dev hardware_ Vision

Camera calibration

This is turning into one of the most oddly complicated sub-tasks of making a robot.

Turns out you need an intuition for epipolar geometry, to understand what it’s trying to do.

I’ve been battling this for weeks or months, on-and-off, and I must get it working soon. This week. But I know a week won’t be enough. This shit is hairy. Here’s eleven tips on StackOverflow. And here’s twelve tips from another guru.

The only articles that look like they got it working have printed out these A2-sized chessboards. It’s ridiculous. Children in Africa don’t have A2-sized chessboard printouts!

Side note: I’d like to mention that there also seem to be some interesting developments in the direction of not needing perfect gigantic chessboards to calibrate your cameras, which in turn led down a rabbit hole, into a galaxy of projects packaged together with their own robotics software philosophies, and so on. Specifically, I found this OpenHSML project. They packaged their project using the PID framework, which apparently standardises the build process in some way. Clicking on the links to officially released frameworks using PID leads to RPC, ethercatcpp, hardio, RKCL, and a whole world of sensor fusion topics, including a recent IEEE conference on MultiSensor Fusion and Integration (MFI2021.org).

So, back to chessboards. It’s clearly more of an art than a science.

Let’s, for the sake of argument, use the code at this OpenCV URL on the topic.
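The gist of that tutorial code, for reference (a sketch; the 9×6 inner-corner board size and the left/*.png image folder are assumptions — adjust to your printout and captures):

# Single-camera calibration sketch, following the standard OpenCV recipe.
import glob
import cv2
import numpy as np

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)   # board coordinates, square size = 1

objpoints, imgpoints = [], []
for fname in glob.glob('left/*.png'):
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (9, 6), None)
    if found:
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        objpoints.append(objp)
        imgpoints.append(corners)

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)
print("RMS reprojection error:", ret)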

Let’s fire up the engines. I really want to close some R&D tabs, but we’re down to about 30. I’ll try to reduce that to 20 while we go through the journey together, dear internet reader, or future intelligence. Step one: install OpenCV. I’m using the ISAAC ROS common docker with various modifications (e.g. installing matplotlib and Jupyter Lab).

cd ~/your_ws/src/isaac_ros_common

./scripts/run_dev.sh ~/your_ws/

python3

import cv2
print(cv2.__version__)
4.5.0

So this version of the docs should work, surely. Let’s start up Jupyter because it looks like it wants it.

python3 example_1.py
example_1.py:61: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
plt.show()

Firing up Jupyter inside the docker, and checking out http://chicken:8888/lab

admin@chicken:/workspaces$ jupyter lab --ip 0.0.0.0 --port 8888 --allow-root &

Let's run the code... and let's work out how to make the images bigger. Ok: plt.rcParams["figure.figsize"] = (30,20)

It’s definitely doing something. I see the epilines, they look the same. That looks great. Ok next tutorial.

So, StereoBM. “Depth Map from Stereo Images”.

So, the top one is the disparity, viewed with StereoBM. Interesting. BM apparently means Block Matching. Interesting to see that the second run of the epiline code earlier is changing behaviour. Different parallax estimation?

Changing the max disparities, and block size changes the image, kinda like playing with a photoshop filter. Anyway we’re here to calibrate. What next?
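(Those two knobs are just the StereoBM constructor arguments — a sketch, with placeholder rectified grayscale images:)

import cv2

left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

# numDisparities must be a multiple of 16, blockSize must be odd
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)   # 16x fixed-point; divide by 16.0 for real disparity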

I think the winner for nicest website on the topic goes to Andreas Jakl here, professor.

“Here, we’ll use the traditional SIFT algorithm. Its patent expired in March 2020, and the algorithm got included in the main OpenCV implementation.”
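Which means SIFT is now just a couple of lines in mainline OpenCV (a sketch, with placeholder grayscale stereo images, plus Lowe's ratio test for filtering matches):

import cv2

img1 = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# brute-force matching with Lowe's ratio test
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]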

Ok first things first, I’m tightening the screws into the camera holder plastic. We’re only doing this a few times.

Let’s run the capture program one more time for fun.

~/your_ws/jetson-stereo-depth/calib$ python3 capture-stereo.py

I edited the code to make the images 1280×720, which was one of the outputs of the camera. It’s apparently 16:9.

I took a bunch of left and right images, ran them through the jetson-stereo-depth/calib/01_intrinsics_lens_dist.ipynb code, and the only chessboards it found were the smaller boards.

So let’s put back in the resizing. Yeah no. Ok. Got like 1 match out of 35.

Ok I’m giving up for now. But we’re printing, boys and girls. We’ll try A4 first, and see how it goes. Nope. A3. Professional print.

Nope. Nicely printed doesn’t matter.

Ran another 4 rounds of my own calibration code. Hundreds of pictures. You only need one match.

Nope. You can read more about messing around with failed calibration attempts in this section.

Asked NVIDIA – is it possible to combine two monocular cameras into a stereo camera? – and got no reply. Two months later, NVIDIA replied: “problem will be time synchronizing their frames. You’ll also need to introduce the baseline into the camera_info of the right camera to make it work”

Time is running out. So I threw money at this problem.

Bought the Intel Realsense D455. At first glance, the programs that come with it, like ‘realsense-viewer’, are very cool. A bit freaky.

Installation is based off these instructions, because these instructions for the Jetson... don't quite work. 



Categories
bio chicken research CNNs evolution sexing Vision

Speckled eggs

Finally tried shining light through an egg, and discovered calcium deposits making things opaque. Hmm.

This led to finding out about all these abnormal eggs.

Categories
3D AI/ML arxiv CNNs control deep envs sim2real simulation Vision

UnrealROX

A sim2real with more or less photorealism, since Unreal Engine is so pretty. arxiv: https://arxiv.org/pdf/1810.06936.pdf

And the github writeup: https://sim2realai.github.io/UnrealROX/

Beautiful. Since we’re currently just using a 256×256 viewport in pybullet, this is quite a bit more advanced than required, though. Learning game engines can also take a while: it took me about a month to learn Unity3D, with intermediate C# experience. Unreal Engine uses C++, so it’s a bit less accessible to beginners.

Categories
AI/ML CNNs deep Locomotion simulation Vision

Simulation Vision 2

I’ve now got a UNet that can provide predictions for where an egg is, in simulation.

So I want to design a reward function related to the egg prediction mask.

I haven’t ‘plugged in’ the trained neural network though, because it will slow things down, and I can just as well make use of the built-in pybullet segmentation to get the simulation egg pixels. At some point though, the robot will have to exist in a world where egg pixels are not labelled as such, and the simulation trained vision will be a useful basis for training.

I think a good reward function might be to not fall over, and to maximize the number of 1s in the egg prediction mask. An intermediate reward might be the centering of the egg pixels.

The numpy way to count mask pixels could be

arr = np.array([1, 0, 0, 0, 0, 1, 1, 1, 1, 0])
np.count_nonzero(arr == 1)

I ended up using the following to count the pixels:

    seg = Image.fromarray(mask.astype('uint8'))
    self._num_ones = (np.array(seg) == 1).sum()

Hmm for centering, not sure yet.
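One option might be the distance of the mask centroid from the centre of the frame, normalised to 0..1 (a sketch only, assuming the same 2D mask array as above):

import numpy as np

def centering_score(mask):
    # 1.0 = egg pixels perfectly centred, 0.0 = no egg pixels (or at the corner)
    ys, xs = np.nonzero(mask == 1)
    if len(xs) == 0:
        return 0.0
    cy, cx = ys.mean(), xs.mean()
    h, w = mask.shape
    dist = np.hypot(cy - h / 2, cx - w / 2)
    max_dist = np.hypot(h / 2, w / 2)
    return 1.0 - dist / max_dist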

I’m looking into how to run pybullet / gym on the cloud and get some of it rendering.

I’ve found a few leads. VNC is an obvious solution, but probably won’t be available on Chrome OS. Pybullet has a broken link, but I think it’s suggesting something like this colab, more or less, using ‘pyrender’. User matpalm has a minimal example of sending images to Google Dataflow. Those might be good if I can render video. There’s a Jupyter example with capturing images in pybullet. I’ll have to research a bit more. An RDP viewer would probably be easiest, if it’s possible.

Some interesting options on stackoverflow, too.

I set up the Ray Tune training again, on Google Cloud, and enabled the dashboard by opening some ports (8265 and 6006), and initialising ray with ray.init(dashboard_host="0.0.0.0")

I can see it improving the episode reward mean, but it’s taking a good while on the 4 CPU cloud machine. Cost is about $3.50/day on the CPU machine, and about $16/day on the GPU machine. Google is out of T4 GPUs at the moment.

I have it saving the occasional mp4 video using a Monitor wrapper that records every 10th episode.

import gym

# RobotableEnv is the custom pybullet environment defined elsewhere in this project
def env_creator(env_config):
    env = RobotableEnv()
    env = gym.wrappers.Monitor(env, "./vid", video_callable=lambda episode_id: episode_id % 10 == 0, force=True)
    return env

After one night of training, it went from about -30 reward to -5 reward. I’m just running it on the CPU machine while I iron out the issues.

I think curriculum training might also be a useful addition.