Categories
control dev envs hardware_ robots UI Vision

Slamtec RPLidar

I got the RPLidar A1M8-R6 with firmware 1.29, and at first, it was just plastic spinning around, and none of the libraries worked.

But I got it working on Windows, as a sanity check, so it wasn’t broken. So I started again on getting it working on the Raspberry Pi Zero W.

Tried the Adafruit Python libs, but firmware v1.29 had some seemingly insurmountable issue, and I couldn’t downgrade to v1.27.

So I needed to compile the Slamtec SDK.

A helpful post pointed out how to fix the compile error and I was able to compile.

I soldered on some extra wires to the motor + and -, to power the motor separately.

I wasn’t getting any luck, but it turned out to be the MicroUSB cable (the OTG cable was OK). After swapping it out, I was able to run the simple_grabber app and confirm that data was coming out.

pi@raspberrypi:~/rplidar_sdk/output/Linux/Release $ ./simple_grabber --channel --serial /dev/ttyUSB0 115200
theta: 59.23 Dist: 00160.00
theta: 59.50 Dist: 00161.00
theta: 59.77 Dist: 00162.00
theta: 59.98 Dist: 00164.00
theta: 60.29 Dist: 00165.00
theta: 61.11 Dist: 00168.00

I debugged the Adafruit v1.29 issue too. So now I’m able to get the data in python, which will probably be nicer to work with, as I haven’t done proper C++ in like 20 years. But this Slamtec code would be the cleanest example to work with.

So I added in some C socket code and recompiled, so now the demo app takes a TCP connection and starts dumping data.

./ultra_simple --channel --serial /dev/ttyUSB0 115200

It was actually A LOT faster than the Python libraries. But I started getting ECONNREFUSED errors. I thought that might be because the Pi Zero W only has a single CPU, and the Python WSGI worker engine was eventlet (which only handles one worker for flask-socketio), so running a socket server, a socket client, and socket.io on a single CPU was creating some sort of resource contention. But I couldn’t solve it.
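
For the record, reading that TCP dump from Python only needs a small socket client, roughly like this (the hostname and port are placeholders for wherever the modified C demo was listening):

import socket

HOST, PORT = 'raspberrypi.local', 9999   # placeholders: wherever the C demo listens

with socket.create_connection((HOST, PORT)) as sock:
    buf = b''
    while True:
        buf += sock.recv(4096)
        while b'\n' in buf:
            line, buf = buf.split(b'\n', 1)
            print(line.decode(errors='ignore'))  # e.g. "theta: 59.23 Dist: 00160.00"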

I found a C++ project with Python wrappers, but it was compiled for 64-bit, and SWIG, the software I’d need to recompile it for 32-bit, seemed a bit complicated.

So, back to Python.

Actually, back to javascript, to get some visuals in a browser. The Adafruit example is for pygame, but we’re over a network, so that won’t work. Rendering Matplotlib graphs is going to be too slow. Need to stream data, and render it on the front end.

Detour #1: NPM

Ok… so, need to install Node.js to install this one, which for the Raspberry Pi Zero W is ARMv6.

This is the most recent ARMv6 nodejs tarball:

wget https://nodejs.org/dist/latest-v11.x/node-v11.15.0-linux-armv6l.tar.gz

tar xzvf node-v11.15.0-linux-armv6l.tar.gz
cd node-v11.15.0-linux-armv6l
sudo cp -R * /usr/local/
sudo ldconfig
npm install --global yarn
sudo npm install --global yarn

npm install rplidar

npm ERR! serialport@4.0.1 install: `node-pre-gyp install --fallback-to-build`
 
Ok...  never mind javascript for now.

Detour #2: Dash/Plotly

Let’s try this python code. https://github.com/Hyun-je/pyrplidar

Ok well it looks like it works maybe, but where is s/he getting that nice plot from? Not in the code. I want the plot.

So, theta and distance are just polar coordinates. So I need to plot polar coordinates.

PolarToCartesian.

Convert a polar coordinate (r,θ) to cartesian (x,y): x = r cos(θ), y = r sin(θ)
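
In Python it’s a one-liner each way (theta comes out of the lidar in degrees, and the distances here are treated as millimetres, so convert to radians first):

import math

def polar_to_cartesian(theta_deg, dist_mm):
    theta = math.radians(theta_deg)                  # lidar outputs degrees
    return dist_mm * math.cos(theta), dist_mm * math.sin(theta)

print(polar_to_cartesian(59.23, 160.0))              # one of the simple_grabber readings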

Ok that is easy, right? So here’s a javascript library with a polar coordinate plotter.

So, the plan is: set up a flask route, read RPLidar data, publish to a front end, which plots it in javascript.

Ok after some googling, Dash / Plotly looks like a decent option.

Found this code. Cool project! And though this guy used a different Lidar, it’s pretty much what I’m trying to do, and he’s using plotly.

pip3 install pandas
pip3 install dash

k let's try...
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 48 from C header, got 40 from PyObject

ok
pip3 install --upgrade numpy     
(if your numpy version is < 1.20.0)

ok now  bad marshal data (unknown type code)
sheesh, what garbage.  
Posting issue to their github and going back to the plan.

Reply from the Plotly devs: pip3 won’t work, will need to try conda install, for ARMv6

Ok let’s see if we can install plotly again….

Going to try miniconda – they have an ARMv6 file here

Damn. 2014. Python 2. Nope. Ok Plotly is not an option for RPi Zero W. I could swap to another RPi, but I don’t think the 1A output of the power bank can handle it, plus camera, plus lidar motor, and laser. (I am using the 2.1A output for the servos).

Solution #1: D3.js

Ok, Just noting this link, as it looks useful for the lidar robot, later.

So, let’s install socket io and websockets

pip3 install flask_socketio
pip3 install simple-websocket
pip3 install flask-executor

(looking at this link) for flask and socket-io, and this link for d3 polar chart

The app isn’t starting though, since adding socket-io. So, hmm. Ok, this issue. Right, it needs to bind to 0.0.0.0.

socketio.run(app, debug=True, host='0.0.0.0')
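
For context, the minimal shape of the app around that call is roughly this (a sketch, not the actual kmp8servo.py):

from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

@socketio.on('connect')
def on_connect():
    emit('status', 'Connected')          # confirm the connection to the client

if __name__ == '__main__':
    socketio.run(app, debug=True, host='0.0.0.0')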

Back to it…

K. Let’s carry on with Flask/d3.js though.

I think if we’re doing threading, I need to use a WSGI server.

pip install waitress

ok that won’t work with flask-socketio. Needs gevent or eventlet.

“eventlet is the best performant option, with support for long-polling and WebSocket transports.”

apparently needs redis for message queueing…

pip install eventlet
pip install redis

Ok, and we need gunicorn, because eventlet is just for workers...

pip3 install gunicorn

gunicorn --worker-class eventlet -w 1 module:app

k, that throws an error.
I need to downgrade eventlet, or do some complicated thing.
pip install eventlet==0.30.2

gunicorn --bind 0.0.0.0 --worker-class eventlet -w 1 kmp8servo:app
(my service is called kmp8servo.py)


ok so do i need redis?
sudo apt-get install redis
ok it's already running now, 
at /usr/bin/redis-server 127.0.0.1:6379
no, i don't really need redis.  Could use sqlite, too. But let's use it anyway.

Ok amazing, gunicorn works.  It's running on port 8000

Ok, after some work,  socket-io is also working.

Received #0: Connected
Received #1: I'm connected!
Received #2: Server generated event
Received #3: Server generated event
Received #4: Server generated event
Received #5: Server generated event
Received #6: Server generated event
Received #7: Server generated event

So, I’m going to go with d3.js instead of P5js, just cause it’s got a zillion more users, and there’s plenty of polar coordinate code to look at, too.

Got it drawing the polar background… but I gotta change the scale a bit. The code uses a linear scale from 0 to 1, so I need to get my distances down to something between 0 and 1. Also need radians, instead of the degrees that the lidar is putting out.
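
Per reading, that massaging is just this (a sketch; the max range used to normalise is a guess, use whatever your lidar actually reports):

import math

MAX_DIST_MM = 6000.0                                   # assumed normalisation range

def to_d3_point(theta_deg, dist_mm):
    return {'angle': math.radians(theta_deg),          # d3 wants radians
            'r': min(dist_mm / MAX_DIST_MM, 1.0)}      # linear scale, 0 to 1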

ok finally. what an ordeal.

But now we still need to get python lidar code working though, or switch back to the C socket code I got working.

Ok, well, so I added D3 update code with transitions, and the javascript looks great.

But the C Slamtec SDK, and the Python RP Lidar wrappers are a source of pain.

I had the C sockets working briefly, but it stopped working, seemingly when I added more Python code between each socket read. I got frustrated and gave up.

The Adafruit library, with the fixes I made, seems to work now, but it’s in a very precarious state, where looking at it funny causes a bad descriptor field, or a checksum error.

But I managed to get the brain turning on, with the lidar. I’m using Redis to track the variables, using the memory.py code from this K9 repo. Thanks.

I will come back to trying to fix the remaining python library issues, but for now, the robot is running, so, on to the next.

Categories
AI/ML CNNs dev Locomotion OpenCV robots UI Vision

Realsense Depth and TensorRT object detection

A seemingly straightforward idea for robot control involves using depth, and object detection, to form a rough model of the environment.

After failed attempts to create our own stereo camera using two monocular cameras, we eventually decided to buy a commercial product instead: the Intel RealSense depth camera, D455.

After a first round of running a COCO-trained MobileNet-SSD v2 object detection network in TensorFlow 2 Lite, on the colour images from the Realsense camera, on the Jetson Nano, the results were just barely acceptable (~2 FPS) for a localhost stream, and totally unacceptable (~0.25 FPS) when served as JPEG over HTTP to a browser on the network.

Looking at the options, the most feasible solution was to redo the network using TensorRT, the NVIDIA-specific, quantized (16-bit on the Nano, 8-bit on the NX/AGX) neural network framework. The other avenue was to investigate options other than simple JPEG compression over HTTP, such as RTSP and WebRTC.

The difficult part was setting up the environment. We used the NVIDIA detectnet code, adapted to take the realsense camera images as input, and to display the distance to the objects. An outdated example was found at CAVEDU robotics blog/github. Fixed up below.

#!/usr/bin/python3



import jetson_inference
import jetson_utils
import argparse
import sys
import os
import cv2
import re
import numpy as np
import io
import time
import json
import random
import pyrealsense2 as rs
from jetson_inference import detectNet
from jetson_utils import videoSource, videoOutput, logUsage, cudaFromNumpy, cudaAllocMapped, cudaConvertColor

parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.",
formatter_class=argparse.RawTextHelpFormatter, epilog=jetson_utils.logUsage())
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2",
help="pre-trained model to load (see below for options)")
parser.add_argument("--threshold", type=float, default=0.5,
help="minimum detection threshold to use")
parser.add_argument("--width", type=int, default=640,
help="set width for image")
parser.add_argument("--height", type=int, default=480,
help="set height for image")
opt = parser.parse_known_args()[0]

# load the object detection network
net = detectNet(opt.network, sys.argv, opt.threshold)

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, opt.width, opt.height, rs.format.z16, 30)
config.enable_stream(rs.stream.color, opt.width, opt.height, rs.format.bgr8, 30)
# Start streaming
pipeline.start(config)


press_key = 0
while (press_key==0):
	# Wait for a coherent pair of frames: depth and color
	frames = pipeline.wait_for_frames()
	depth_frame = frames.get_depth_frame()
	color_frame = frames.get_color_frame()
	if not depth_frame or not color_frame:
		continue
	# Convert images to numpy arrays
	depth_image = np.asanyarray(depth_frame.get_data())
	show_img = np.asanyarray(color_frame.get_data())
	
	# convert to CUDA (cv2 images are numpy arrays, in BGR format)
	bgr_img = cudaFromNumpy(show_img, isBGR=True)
	# convert from BGR -> RGB
	img = cudaAllocMapped(width=bgr_img.width,height=bgr_img.height,format='rgb8')
	cudaConvertColor(bgr_img, img)

	# detect objects in the image (with overlay)
	detections = net.Detect(img)

	for num in range(len(detections)) :
		score = round(detections[num].Confidence,2)
		box_top=int(detections[num].Top)
		box_left=int(detections[num].Left)
		box_bottom=int(detections[num].Bottom)
		box_right=int(detections[num].Right)
		box_center=detections[num].Center
		label_name = net.GetClassDesc(detections[num].ClassID)

		point_distance=0.0
		for i in range (10):
			point_distance = point_distance + depth_frame.get_distance(int(box_center[0]),int(box_center[1]))

		point_distance = np.round(point_distance / 10, 3)
		distance_text = str(point_distance) + 'm'
		cv2.rectangle(show_img,(box_left,box_top),(box_right,box_bottom),(255,0,0),2)
		cv2.line(show_img,
			(int(box_center[0])-10, int(box_center[1])),
			(int(box_center[0]+10), int(box_center[1])),
			(0, 255, 255), 3)
		cv2.line(show_img,
			(int(box_center[0]), int(box_center[1]-10)),
			(int(box_center[0]), int(box_center[1]+10)),
			(0, 255, 255), 3)
		cv2.putText(show_img,
			label_name + ' ' + distance_text,
			(box_left+5,box_top+20),cv2.FONT_HERSHEY_SIMPLEX,0.4,
			(0,255,255),1,cv2.LINE_AA)

	cv2.putText(show_img,
		"{:.0f} FPS".format(net.GetNetworkFPS()),
		(int(opt.width*0.8), int(opt.height*0.1)),
		cv2.FONT_HERSHEY_SIMPLEX,1,
		(0,255,255),2,cv2.LINE_AA)


	display = cv2.resize(show_img,(int(opt.width*1.5),int(opt.height*1.5)))
	cv2.imshow('Detecting...',display)
	keyValue=cv2.waitKey(1)
	if keyValue & 0xFF == ord('q'):
		press_key=1


cv2.destroyAllWindows()
pipeline.stop()

Assuming you have a good cmake version and CUDA is available (if nvcc doesn’t work, you need to configure linker paths, check this link), and noting that if your cmake version is around ~3.22-3.24 or so you need an older one, the prerequisite sudo apt-get install libssl-dev is also required.

The hard part was actually setting up the Realsense python bindings.

Clone the repo…

git clone https://github.com/IntelRealSense/librealsense.git

The trick is to request the Python bindings, and CUDA, during the cmake phase. Note that often, none of this works. Some tips include…

sudo apt-get install xorg-dev libglu1-mesa-dev

and changing PYTHON to Python

mkdir build
cd build
cmake ../ -DBUILD_PYTHON_BINDINGS:bool=true -DPYTHON_EXECUTABLE=/usr/bin/python3 -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DBUILD_WITH_CUDA:bool=true

The above worked on Jetpack 4.6.1, while the below worked on Jetpack 5.0.2

cmake ../ -DBUILD_PYTHON_BINDINGS:bool=true -DPython_EXECUTABLE=/usr/bin/python3.8 -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DBUILD_WITH_CUDA:bool=true -DPYTHON_INCLUDE_DIRS=/usr/include/python3.8 -DPython_LIBRARIES=/usr/lib/aarch64-linux-gnu/libpython3.8.so

(and sudo make install)
Update the python path

export PYTHONPATH=$PYTHONPATH:/usr/local/lib
(or a specific python if you have more than one)
if installed in /usr/lib, change accordingly

Check that the folder is in the correct location (it isn't, after following official instructions).

./usr/local/lib/python3.6/dist-packages/pyrealsense2/

Check that the shared object files (.so) are in the right place: 

chicken@chicken:/usr/local/lib$ ls
cmake       libjetson-inference.so  librealsense2-gl.so.2.50    librealsense2.so.2.50    pkgconfig
libfw.a     libjetson-utils.so      librealsense2-gl.so.2.50.0  librealsense2.so.2.50.0  python2.7
libglfw3.a  librealsense2-gl.so     librealsense2.so            librealsense-file.a      python3.6


If it can't find 'pipeline', it means you need to copy the missing __init__.py file.

sudo cp ./home/chicken/librealsense/wrappers/python/pyrealsense2/__init__.py ./usr/local/lib/python3.6/dist-packages/pyrealsense2/

Some extra things to do, 
sudo cp 99-realsense-libusb.rules  /etc/udev/rules.d/

Eventually, I was able to run the inference on the Realsense camera at an apparent 25 FPS on localhost, drawing to an OpenGL window.

I also developed a Dockerfile for the purpose, which benefits from an updated PyTorch version, but various issues were encountered, ultimately making a bare-metal install far simpler. Note that building jetson-inference and the Realsense SDK on the Nano requires increasing your swap size beyond the 2GB standard; otherwise the Jetson freezes once memory paging leads to swap death.

Anyway, since the objective is remote human viewing, (while providing depth information for the robot to use), the next step will require some more tests, to find a suitable option.

The main blocker is the power usage limitations on the Jetson Nano. I can’t seem to run Wifi and the camera at the same time. According to the tegrastats utility, the POM_5V_IN usage goes over the provided 4A, under basic usage. There are notes saying that 3A can be provided to 2 of the 5V GPIO pins, in order to get 6A total input. That might end up being necessary.

Initial investigation into serving RTSP resulted in inferior, compressed results compared to a simple Python server streaming image by image. The next investigation will be into WebRTC options, which are supposedly the current state of the art for browser-based video streaming. I tried aiortc and momo so far; both failed on the Nano.

I’ve decided to try on the Xavier NX, too, just to replicate the experiment, and see how things change. The Xavier has some higher wattage settings, and the wifi is internal, so worth a try. Also, upgraded to Jetpack 5.0.2, which was a gamble. Thought surely it would be better than upgrading to a 5.0.1 dev preview, but none of their official products support 5.0.2 yet, so there will likely be much pain involved. On the plus side, python 3.8 is standard, so some libraries are back on the menu.

On the Xavier, we’re getting 80 FPS, compared to 25 FPS on the Nano. Quite an upgrade. Also, able to run wifi and realsense at the same time.

Looks like a success. Getting multiple frames per second with about a second of lag over the network.

Categories
AI/ML Behaviour bio chicken_research control deep dev ears evolution highly_speculative neuro UI

Hierarchical Temporal Memory

Here I’m continuing with the task of unsupervised detection of audio anomalies, hopefully for the purpose of detecting chicken stress vocalisations.

After much fussing around with the old Numenta NuPic codebase, I’m porting the older nupic.audio and nupic.critic code, over to the more recent htm.core.

These are the main parts:

  • Sparse Distributed Representation (SDR)
  • Encoders
  • Spatial Pooler (SP)
  • Temporal Memory (TM)

I’ve come across a very intricate implementation and documentation about understanding the important parts of the HTM model, way deep, like how did I get here? I will try to implement the ‘critic’ code first. Or rather, I’ll try to port it from nupic to htm. After further investigation, there are a few options, and I’m going to try to edit the hotgym example, and shove WAV file frequency-band scalars through it instead of power consumption data. I’m simplifying the investigation. But I need to make some progress.

I’m using this docker to get in, mapping my code and wav file folder in:

docker run -d -p 8888:8888 --name jupyter -v /media/chrx/0FEC49A4317DA4DA/sounds/:/home/jovyan/work 3rdman/htm.core-jupyter:latest



So I've got some code working that writes to 'nupic format' (.csv) and code that reads the amplitudes from the csv file, and then runs it through htm.core. 

So it takes a while, and it's just for 1 band (of 10 bands). I see it also uses the first 1/4 or so of the time to know what it's dealing with. Probably need to run it through twice to get predictive results in the first 1/4.
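
That per-band loop is roughly this shape (a sketch based on htm.core's hotgym example; the module paths and parameters come from that example and may need tuning, and band0_amplitudes stands in for the list of floats read from the csv):

from htm.bindings.sdr import SDR
from htm.encoders.rdse import RDSE, RDSE_Parameters
from htm.bindings.algorithms import SpatialPooler, TemporalMemory

# encoder turns each amplitude scalar into a sparse distributed representation
enc_params = RDSE_Parameters()
enc_params.size = 1000
enc_params.sparsity = 0.02
enc_params.resolution = 0.1          # tune to the amplitude range of the band
encoder = RDSE(enc_params)

sp = SpatialPooler(inputDimensions=[encoder.size],
                   columnDimensions=[1024],
                   potentialRadius=encoder.size,
                   potentialPct=0.85,
                   globalInhibition=True,
                   localAreaDensity=0.04)
tm = TemporalMemory(columnDimensions=[1024], cellsPerColumn=13)

anomaly_scores = []
for amplitude in band0_amplitudes:               # one band's scalars from the csv
    active_columns = SDR(sp.getColumnDimensions())
    sp.compute(encoder.encode(amplitude), True, active_columns)
    tm.compute(active_columns, learn=True)
    anomaly_scores.append(tm.anomaly)            # 0 = predicted, 1 = surprising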

Ok no, after a few weeks, I've come back to this point, and realise that the top graph is the important one.  Prediction is what's important.  The bottom graphs are the anomaly scores, used by the prediction.  
Frequency Band 0

The idea in nupic.critic was to threshold changes in X bands. Let’s see the other graphs…

Frequency band 0: 0-480Hz ?
Frequency band 2: 960-1440Hz ?
Frequency band 3: 1440-1920Hz ?
Frequency band 4: 1920-2400Hz ?
Frequency band 5: 2400-2880Hz ?
Frequency band 6: 2880-3360Hz ?

Ok Frequency bands 7, 8, 9 were all zero amplitude. So that’s the highest the frequencies went. Just gotta check what those frequencies are, again…

Opening 307.wav
Sample width (bytes): 2
Frame rate (sampling frequency): 48000
Number of frames: 20771840
Signal length: 20771840
Seconds: 432
Dimensions of periodogram: 4801 x 2163

Ok with 10 buckets, 4801 would divide into 
Frequency band 0: 0-480Hz
Frequency band 1: 480-960Hz
Frequency band 2: 960-1440Hz
Frequency band 3: 1440-1920Hz
Frequency band 4: 1920-2400Hz
Frequency band 5: 2400-2880Hz
Frequency band 6: 2880-3360Hz

Ok what else. We could try segmenting the audio by band, so we can narrow in on the relevant frequency range, and then maybe just focus on that smaller range again, in higher detail.

Learning features with some labeled data is probably the correct way to do chicken stress vocalisation detection.

Unsupervised anomaly detection might be totally off, in terms of what an anomaly is. It is probably best to zoom in on the relevant bands and to demonstrate a minimal example of what a stressed chicken sounds like, vs a chilled chicken, and compare the spectrograms to see if there’s a tell-tale visualisable feature.

A score from 1 to 5, for example, is going to be anomalous in arbitrary ways, without labelled data. Maybe the chickens are usually stressed, and the anomalies are when they are unstressed, for example.

A change in timing in music might be defined in some way, like 4 out of 7 bands exhibiting anomalous amplitudes. But that probably won’t help for this. It’s probably just going to come down to a very narrow band of interest. Possibly pointing it out on a spectrogram that’s zoomed in on the feature, and then feeding the htm with an encoding of that narrow band of relevant data.


I’ll continue here with some notes on filtering. After much fuss, the sox app (apt-get install sox) does it, sort of. Still working on a Python version.

$ sox 307_0_50.wav filtered_50_0.wav sinc -n 32767 0-480
$ sox 307_0_50.wav filtered_50_1.wav sinc -n 32767 480-960
$ sox 307_0_50.wav filtered_50_2.wav sinc -n 32767 960-1440
$ sox 307_0_50.wav filtered_50_3.wav sinc -n 32767 1440-1920
$ sox 307_0_50.wav filtered_50_4.wav sinc -n 32767 1920-2400
$ sox 307_0_50.wav filtered_50_5.wav sinc -n 32767 2400-2880
$ sox 307_0_50.wav filtered_50_6.wav sinc -n 32767 2880-3360


So, sox does seem to be working.  The mel spectrogram is logarithmic, which is why it looks like this.

Visually, it looks like I'm interested in 2048 to 4096 Hz.  That's where I can see the chirps.

Hmm. So I think the spectrogram is confusing everything.

So where does 4800 come from? 48 kHz. 48,000 Hz (48 kHz) is the sample rate “used for DVDs“.

Ah. Right. The spectrogram values represent buckets of 5 samples each, and the full range is to 24000…?

Sample width (bytes): 2
0.     5.    10.    15.    20.    25.    30.    35.    40.    45.    50.    55.    60.    65.    70.    75.    80.    85.    90.    95.    100.   105.   110.   115.   120.   125.   130.   135.   140.   145.
...
 23950. 23955. 23960. 23965. 23970. 23975. 23980. 23985. 23990. 23995. 24000.]

ok. So 2 x 24000. Maybe 2 channels? Anyway, full range is to 48000Hz. In that case, are the bands actually…

Frequency band 0: 0-4800Hz
Frequency band 1: 4800-9600Hz
Frequency band 2: 9600-14400Hz
Frequency band 3: 14400-19200Hz
Frequency band 4: 19200-24000Hz
Frequency band 5: 24000-28800Hz
Frequency band 6: 28800-33600Hz

Ok so no, it’s half the above: the usable spectrum only goes up to the Nyquist frequency, half the 48kHz sample rate, so 24000Hz split into ten buckets gives 2400Hz per band.

Frequency band 0: 0-2400Hz
Frequency band 1: 2400-4800Hz
Frequency band 2: 4800-7200Hz
Frequency band 3: 7200-9600Hz
Frequency band 4: 9600-12000Hz
Frequency band 5: 12000-14400Hz
Frequency band 6: 14400-16800Hz
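
A quick check of that arithmetic, splitting the Nyquist frequency into the 10 buckets:

frame_rate = 48000
n_bands = 10
nyquist = frame_rate / 2                     # 24000 Hz of usable spectrum
band_width = nyquist / n_bands               # 2400 Hz per band
bands = [(int(i * band_width), int((i + 1) * band_width)) for i in range(n_bands)]
# [(0, 2400), (2400, 4800), ... (21600, 24000)]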

So why is the spectrogram maxing at 8192Hz? Must be spectrogram sampling related.

(Figure: ol_hann_win, from a Berkeley document)

So the original signal is 0 to 24000Hz, and the spectrogram must be 8192Hz because… the spectrogram is made some way. I’ll try to get back to this when I understand it.

sox 307_0_50.wav filtered_50_0.wav sinc -n 32767 0-2400
sox 307_0_50.wav filtered_50_1.wav sinc -n 32767 2400-4800
sox 307_0_50.wav filtered_50_2.wav sinc -n 32767 4800-7200
sox 307_0_50.wav filtered_50_3.wav sinc -n 32767 7200-9600
sox 307_0_50.wav filtered_50_4.wav sinc -n 32767 9600-12000
sox 307_0_50.wav filtered_50_5.wav sinc -n 32767 12000-14400
sox 307_0_50.wav filtered_50_6.wav sinc -n 32767 14400-16800

Ok I get it now.

Ok I don’t entirely understand the last two. But basically the mel spectrogram is logarithmic, so those high frequencies really don’t get much love on the mel spectrogram graph. Buggy, maybe.

But I can estimate now the chirp frequencies…

sox 307_0_50.wav filtered_bird.wav sinc -n 32767 1800-5200

Beautiful. So, now to ‘extract the features’…

So, the nupic.critic code with 1 bucket managed to get something resembling the spectrogram. Ignore the blue.

But it looks like maybe, we can even just threshold and count peaks. That might be it.

sox 307.wav filtered_307.wav sinc -n 32767 1800-5200
sox 3072.wav filtered_3072.wav sinc -n 32767 1800-5200
sox 237.wav filtered_237.wav sinc -n 32767 1800-5200
sox 98.wav filtered_98.wav sinc -n 32767 1800-5200

Let’s do the big files…

Ok looks good enough.

So now I’m plotting the ‘chirp density’ (basically volume).

’98.wav’
‘237.wav’
‘307.wav’
‘3072.wav’

In this scheme, we just proxy chirp volume density as a variable representing stress.  We don’t know if it is a true proxy.
As you can see, some recordings have more variation than others.  

Some heuristic could be decided upon, for rating the stress from 1 to 5.  The heuristic depends on how the program would be used.  For example, if it were streaming audio, for an alert system, it might alert upon some duration of time spent above one standard deviation from the rolling mean. I’m not sure how the program would be used though.
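
That rolling-mean idea is easy to sketch (placeholder file name and window sizes): proxy ‘chirp density’ as a rolling RMS of the band-passed audio, then flag windows sitting more than one standard deviation above the rolling mean.

import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read('filtered_307.wav')     # sox band-passed file
if samples.ndim > 1:
    samples = samples[:, 0]                          # take one channel if stereo
samples = samples.astype(np.float32)

window = rate // 10                                  # 100 ms windows
n = len(samples) // window
rms = np.sqrt((samples[:n * window].reshape(n, window) ** 2).mean(axis=1))

roll = 50                                            # rolling mean over ~5 s of windows
kernel = np.ones(roll) / roll
rolling_mean = np.convolve(rms, kernel, mode='same')
threshold = rolling_mean + rms.std()                 # crude global std as the baseline
alerts = rms > threshold                             # where 'stress' would be flagged
print(f'{alerts.sum()} of {n} windows above threshold')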

If the goal were to differentiate stressed and not stressed vocalisations, that would require labelled audio data.   

(Also, basically didn’t end up using HTM, lol)

Categories
highly_speculative meta UI

“Mechanical Turk”ing

Audience participation could add data points and labels, for classification training or similar. But what?

Classification needs a user interface. I saw one here:

Collecting Labels for Rare Anomalies via Direct Human Feedback—An Industrial Application Study

“What type of anomaly is this?”

Reporting an Anomaly

Here is Miranda demonstrating a similar skill

Categories
3D 3D Research AI/ML arxiv CNNs control envs Locomotion simulation UI Vision

SLAM part 2 (Overview, RTSP, Calibration)

Continuing from our early notes on SLAM algorithms (Simultaneous Localisation and Mapping), and the similar (but not so much about map-making) DSO algorithm, I came across a good project (“From cups to consciousness“) and article that reminded me that mapping the environment, or at least having some sense of depth, will be pretty crucial.

At the moment I’ve just got to the point of thinking of training a CNN on simulation data, and so there should also be some positioning of the robot as a model in its own virtual world. So it’s probably best to re-examine what’s already out there. Visual odometry. Optical Flow.

I found a good paper summarizing 2019 options. The author’s github has some interesting scripts that might be useful. It reminds me that I should probably be using ROS and Gazebo, to some extent. The conclusion was roughly that Google Cartographer or GMapping (Open SLAM) are generally beating some of the others, like Karto and Hector. Seems like SLAM code is all a few years old. Google Cartographer had some support for ‘lifelong mapping‘, which sounded interesting. The robot goes around updating its map, a bit. It reminds me I saw ‘PonderNet‘ today, fresh from DeepMind, which from a quick look is, more or less, about scaling your workload down to your input size.

Anyway, we are mostly interested in Monocular SLAM. So none of this applies, probably. I’m mostly interested at the moment, in using some prefab scenes like the AI2Thor environment in the Cups-RL example, and making some sort of SLAM in simulation.

Also interesting is RATSLAM and the recent update, LatentSLAM – the authors of this site, The Smart Robot, got my attention because of the CCNs (cortical column networks).

LatentSLAM: https://arxiv.org/pdf/2105.03265.pdf

“A common shortcoming of RatSLAM is its sensitivity
to perceptual aliasing, in part due to the reliance on
an engineered visual processing pipeline. We aim to reduce
the effects of perceptual aliasing by replacing the perception
module by a learned dynamics model. We create a generative
model that is able to encode sensory observations into a
latent code that can be used as a replacement to the visual
input of the RatSLAM system”

Interesting: “The robot performed 1,143 delivery tasks to 11 different locations with only one delivery failure (from which it recovered), traveled a total distance of more than 40 km over 37 hours of active operation, and recharged autonomously a total of 23 times.”

I think DSO might be a good option, or the closed-loop version, LDSO; that looks like the most straightforward, maybe.

After a weekend away with a computer vision professional, I found out about COLMAP, a structure-from-motion suite.

I saw a few more recent projects too, e.g. NeuralRecon, and

ooh, here’s a recent Facebook one that sounds like it might work!

Consistent Depth… eh, their Google Colab is totally broken.

Anyhow, LDSO. Let’s try it.

In file included from /dmc/LDSO/include/internal/OptimizationBackend/AccumulatedTopHessian.h:10:0,
from /dmc/LDSO/include/internal/OptimizationBackend/EnergyFunctional.h:9,
from /dmc/LDSO/include/frontend/FeatureMatcher.h:10,
from /dmc/LDSO/include/frontend/FullSystem.h:18,
from /dmc/LDSO/src/Map.cc:4:
/dmc/LDSO/include/internal/OptimizationBackend/MatrixAccumulators.h:8:10: fatal error: SSE2NEON.h: No such file or directory
#include "SSE2NEON.h"
^~~~
compilation terminated.
src/CMakeFiles/ldso.dir/build.make:182: recipe for target 'src/CMakeFiles/ldso.dir/Map.cc.o' failed
make[2]: *** [src/CMakeFiles/ldso.dir/Map.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs….
CMakeFiles/Makefile2:85: recipe for target 'src/CMakeFiles/ldso.dir/all' failed
make[1]: *** [src/CMakeFiles/ldso.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Ok maybe not.

There’s a paper here reviewing ORBSLAM3 and LDSO, and they encounter lots of issues. But it’s a good paper for an overview of how the algorithms work. We want a point cloud so we can find the closest points, and not walk into them.

Calibration is an issue, rolling shutter cameras are an issue, IMU data can’t be synced meaningfully, it’s a bit of a mess, really.

Also, given reports that ORB-SLAM2 was only getting 5 fps on a Raspberry Pi, I got smart and looked for something specifically for the Jetson. I found a depth CNN for monocular vision on the forum, amazing.

Then there’s a COLMAP structure-from-motion option, and some more depth stuff… and more on making it high-res.

Ok so after much fussing about, I found just what we need. I had an old copy of jetson-containers, and the slam code was added just 6 months ago. I might want to try the noetic one instead of melodic, good old ROS.

git clone https://github.com/dusty-nv/jetson-containers.git
cd jetson-containers

chicken@jetson:~/jetson-containers$ ./scripts/docker_build_ros.sh --distro melodic --with-slam


Successfully built 2eb4d9c158b0
Successfully tagged ros:melodic-ros-base-l4t-r32.5.0


chicken@jetson:~/jetson-containers$ ./scripts/docker_test_ros.sh melodic
reading L4T version from /etc/nv_tegra_release
L4T BSP Version:  L4T R32.5.0
l4t-base image:  nvcr.io/nvidia/l4t-base:r32.5.0
testing container ros:melodic-ros-base-l4t-r32.5.0 => ros_version
xhost:  unable to open display ""
xauth:  file /tmp/.docker.xauth does not exist
sourcing   /opt/ros/melodic/setup.bash
ROS_ROOT   /opt/ros/melodic/share/ros
ROS_DISTRO melodic
getting ROS version -
melodic
done testing container ros:melodic-ros-base-l4t-r32.5.0 => ros_version



Well other than the X display, looking good.

Maybe I should just plug in a monitor. Ideally I wouldn’t have to, though. I used GStreamer the other time. Maybe we do that again.

This looks good too… https://github.com/dusty-nv/ros_deep_learning but let’s stay focused. I’m also thinking maybe we upgrade early, to noetic. Ugh it looks like a whole new bunch of build tools and things to relearn. I’m sure it’s amazing. Let’s do ROS1, for now.

Let’s try build that FCNN one again.

CMake Error at tx2_fcnn_node/Thirdparty/fcrn-inference/CMakeLists.txt:121 (find_package):
  By not providing "FindOpenCV.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "OpenCV", but
  CMake did not find one.

  Could not find a package configuration file provided by "OpenCV" (requested
  version 3.0.0) with any of the following names:

    OpenCVConfig.cmake
    opencv-config.cmake

  Add the installation prefix of "OpenCV" to CMAKE_PREFIX_PATH or set
  "OpenCV_DIR" to a directory containing one of the above files.  If "OpenCV"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!

Ok hold on…

“Builds additional container with VSLAM packages,
including ORBSLAM2, RTABMAP, ZED, and Realsense.
This only applies to foxy and galactic and implies
--with-pytorch as these containers use PyTorch.”

Ok so not melodic then. ROS2 it is…

./scripts/docker_build_ros.sh --distro foxy --with-slam

Ok that hangs when it starts building the slam bits. Luckily, someone’s raised the bug, and though it’s not fixed, Dusty does have a docker already compiled.

sudo docker pull dustynv/ros:foxy-slam-l4t-r32.6.1

I started it up with

docker run -it --runtime nvidia --rm --network host --privileged --device /dev/video0 -v /home/chicken/:/dmc dustynv/ros:foxy-slam-l4t-r32.6.1

So, after some digging, I think we can solve the X problem (i.e. where are we going to see this alleged SLAMming occur?) with an RTSP server. Previously I used GStreamer to send RTP over UDP. But this makes more sense, to run a server on the Jetson. There’s a plugin for GStreamer, so I’m trying to get the ‘dev’ version, so I can compile the test-launch.c program.

apt-get install libgstrtspserver-1.0-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libgstrtspserver-1.0-dev is already the newest version (1.14.5-0ubuntu1~18.04.1).

ok... git clone https://github.com/GStreamer/gst-rtsp-server.git

root@jetson:/opt/gst-rtsp-server/examples# gcc test-launch.c -o test-launch $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-rtsp-server-1.0)
test-launch.c: In function ‘main’:
test-launch.c:77:3: warning: implicit declaration of function ‘gst_rtsp_media_factory_set_enable_rtcp’; did you mean ‘gst_rtsp_media_factory_set_latency’? [-Wimplicit-function-declaration]
   gst_rtsp_media_factory_set_enable_rtcp (factory, !disable_rtcp);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   gst_rtsp_media_factory_set_latency
/tmp/ccC1QgPA.o: In function `main':
test-launch.c:(.text+0x154): undefined reference to `gst_rtsp_media_factory_set_enable_rtcp'
collect2: error: ld returned 1 exit status





Ok wait let’s reinstall gstreamer.

apt-get install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev libgstreamer-plugins-bad1.0-dev gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav gstreamer1.0-doc gstreamer1.0-tools gstreamer1.0-x gstreamer1.0-alsa gstreamer1.0-gl gstreamer1.0-gtk3 gstreamer1.0-qt5 gstreamer1.0-pulseaudio


error...

Unpacking libgstreamer-plugins-bad1.0-dev:arm64 (1.14.5-0ubuntu1~18.04.1) ...
Errors were encountered while processing:
 /tmp/apt-dpkg-install-Ec7eDq/62-libopencv-dev_3.2.0+dfsg-4ubuntu0.1_arm64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Ok then leave out that one... 

apt --fix-broken install
and that fails on
Errors were encountered while processing:
 /var/cache/apt/archives/libopencv-dev_3.2.0+dfsg-4ubuntu0.1_arm64.deb
 


It’s like a sign of being a good programmer, to solve this stuff. But damn. Every time. Suggestions continue, in the forums of those who came before. Let’s reload the docker.

root@jetson:/opt/gst-rtsp-server/examples# pkg-config --cflags --libs gstreamer-1.0

-pthread -I/usr/include/gstreamer-1.0 -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -lgstreamer-1.0 -lgobject-2.0 -lglib-2.0

root@jetson:/opt/gst-rtsp-server/examples# pkg-config --cflags --libs gstreamer-rtsp-server-1.0
-pthread -I/usr/include/gstreamer-1.0 -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -lgstrtspserver-1.0 -lgstbase-1.0 -lgstreamer-1.0 -lgobject-2.0 -lglib-2.0
 

Ok I took a break and got lucky. The test-launch.c code is different from what the admin had.

Let’s diff it and see what changed… The differing lines are the ones that use the newer RTCP-disable option, i.e. the gst_rtsp_media_factory_set_enable_rtcp API that the installed 1.14 library doesn’t have:

#define DEFAULT_DISABLE_RTCP FALSE

static gboolean disable_rtcp = DEFAULT_DISABLE_RTCP;

{"disable-rtcp", '\0', 0, G_OPTION_ARG_NONE, &disable_rtcp,
  "Whether RTCP should be disabled (default false)", NULL},

gst_rtsp_media_factory_set_enable_rtcp (factory, !disable_rtcp);

With those taken out, this now works (to compile):
gcc test.c -o test $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-rtsp-server-1.0)

ok so back to it…

root@jetson:/opt/gst-rtsp-server/examples# ./test-launch "videotestsrc ! nvvidconv ! nvv4l2h264enc ! h264parse ! rtph264pay name=pay0 pt=96"
stream ready at rtsp://127.0.0.1:8554/test

So apparently now I can run this in VLC… when I open

rtsp://<jetson-ip>:8554/test

Um is that meant to happen?…. Yes!

Ok next, we want to see SLAM stuff happening. So, ideally, a video feed of the desktop, or something like that.

So here are the links I have open. Maybe I get back to them later. Need to get back to ORBSLAM2 first, and see where we’re at, and what we need. Not quite /dev/video0 to PC client. More like, ORBSLAM2 to dev/video0 to PC client. Or full screen desktop. One way or another.

Here's a cool pdf with some instructions, from doodlelabs, and their accompanying pdf about video streaming codecs and such.

Also, gotta check out this whole related thing, and the depthnet example, whose documentation is here.

Ok, so carrying on.

I try again today, and whereas yesterday we had

 libgstrtspserver-1.0-dev is already the newest version (1.14.5-0ubuntu1~18.04.1).

Today we have

E: Unable to locate package libgstrtspserver-1.0-dev
E: Couldn't find any package by glob 'libgstrtspserver-1.0-dev'
E: Couldn't find any package by regex 'libgstrtspserver-1.0-dev'

Did I maybe compile it outside of the docker? Hmm maybe. Why can’t I find it though? Let’s try the obvious… but also why does this take so long? Network is unreachable. Network is unreachable. Where have all the mirrors gone?

apt-get update

Ok so long story short, I made another Dockerfile, to get gstreamer installed. It mostly required adding a key for the kitware apt repo.

./test "videotestsrc ! nvvidconv ! nvv4l2h264enc ! h264parse ! rtph264pay name=pay0 pt=96"

Ok and on my linux box now, so I’ll connect to it.

sudo apt install vlc
vlc rtsp://192.168.101.115:8554/Test

K all good… So let’s get the camera output next?

sheesh it’s not obvious.

I’m just making a note of this.

Since 1.14, the use of libv4l2 has been disabled due to major bugs in the emulation layer. To enable usage of this library, set the environment variable GST_V4L2_USE_LIBV4L2=1

but it doesn’t want to work anyway. Ok RTSP is almost a dead end.

I might attach a CSI camera instead of V4L2 (USB camera) maybe. Seems less troublesome. But yeah let’s take a break. Let’s get back to depthnet and ROS2 and ORB-SLAM2, etc.

depthnet: error while loading shared libraries: /usr/lib/aarch64-linux-gnu/libnvinfer.so.8: file too short

Ok, let’s try ROS2.

(Sorry, this was supposed to be about SLAM, right?)

As a follow-up for this post…

I asked about mapping two argus (NVIDIA’s CSI camera driver) node topics, in order to fool their stereo_proc, on the github issues. No replies, cause they probably want to sell expensive stereo cameras, and I am asking how to do it with $15 Chinese cameras.

I looked at DustyNV’s Mono depth. Probably not going to work. It seems like you can get a good depth estimate for things in the scene, but everything around the edges reads as ‘close’. Not sure that’s practical enough for depth.

I looked at the NVIDIA DNN depth. Needs proper stereo cameras.

I looked at the NVIDIA VPI Stereo Disparity pipeline. It is the most promising yet, but the input either needs to come from calibrated cameras, or needs to be rectified on-the-fly using OpenCV. This seems like it might be possible in Python, but it is not obvious yet how to do it in C++, which the rest of the code is in.

Self portrait using unusable stereo disparity data, using the c++ code in https://github.com/NVIDIA-AI-IOT/jetson-stereo-depth/

I tried calibration.

I removed the USB cameras.

I attached two RPi 2.1 CSI cameras, from older projects. Deep-dived into the ISAAC_ROS suite. Left ROS2 alone for a bit because it is just getting in the way. One camera sensor had fuzzy lines going across horizontally, occasionally, and calibration results were poor, fuzzy. Decided I needed new cameras.

IMX-219 was used by the github author, and I even printed out half of the holder, to hold the cameras 8cm apart.

I tried calibration using the ROS2 cameracalibrator, which is a wrapper for an OpenCV call, after starting up the camera driver node, inside the Isaac ROS docker.

ros2 run isaac_ros_argus_camera_mono isaac_ros_argus_camera_mono --ros-args -p device:=0 -p sensor:=4 -p output_encoding:="mono8"

(This publishes mono camera feed to topic /image_raw)

ros2 run camera_calibration cameracalibrator \
--size=8x6 \
--square=0.063 \
--approximate=0.3 \
--no-service-check \
--ros-args --remap /image:=/image_raw

(Because of a bug, you also sometimes need to remove --ros-args --remap)

OpenCV was able to calibrate, via the ROS2 application, in both cases. So maybe I should just grab the outputs from that. We’ll do that again, now. But I think I need to print out a chessboard and just see how that goes first.

I couldn’t get more than a couple of matches using pictures of the chessboard on the screen, even with binary thresholding, in the author’s calibration notebooks.
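
For the printed chessboard, the quickest sanity check is essentially the OpenCV call the calibrator wraps, with the adaptive-threshold flags on (a sketch; board.jpg is a placeholder, and 8x6 is the inner-corner count passed to cameracalibrator):

import cv2

img = cv2.imread('board.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(
    gray, (8, 6),
    flags=cv2.CALIB_CB_ADAPTIVE_THRESH | cv2.CALIB_CB_NORMALIZE_IMAGE)
print('found corners:', found)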

Here’s what the NVIDIA VPI 1.2’s samples drew, for my chess boards:

Stereo Disparity
Confidence Map

Camera calibration seems to be a serious problem, in the IOT camera world. I want something approximating depth, and it is turning out that there’s some math involved.

Learning about epipolar geometry was not something I planned to do for this.

But this is like a major showstopper, so either, I must rectify, in real time, or I must calibrate.

https://upload.wikimedia.org/wikipedia/commons/9/9a/Image_rectification.svg

We’re not going to SLAM without it.

The pertinent forum post is here.

“The reason for the noisy result is that the VPI algorithm expects the rectified image pairs as input. Please do the rectification first and then feed the rectified images into the stereo disparity estimator.”

So can we use this info? The NVIDIA post references the snippet below as the solution, perhaps, within the context of the full code that follows it. Let’s run it on the chessboard?

p1fNew = p1f.reshape((p1f.shape[0] * 2, 1))
p2fNew = p2f.reshape((p2f.shape[0] * 2, 1))

retBool ,rectmat1, rectmat2 = cv2.stereoRectifyUncalibrated(p1fNew,p2fNew,fundmat,imgsize)
import numpy as np
import cv2
import vpi

# load the left/right captures and convert to grayscale for feature detection
left  = cv2.imread('left.png')
right = cv2.imread('right.png')
left_gray  = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
right_gray = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)

# SIFT keypoints and descriptors in each image
detector = cv2.xfeatures2d.SIFT_create()
kp1, desc1 = detector.detectAndCompute(left_gray,  None)
kp2, desc2 = detector.detectAndCompute(right_gray, None)

# brute-force match descriptors, keeping matches that pass the ratio test
bf = cv2.BFMatcher()
matches = bf.knnMatch(desc1, desc2, k=2)

ratio = 0.75
good, mkp1, mkp2 = [], [], []
for m in matches:
    if m[0].distance < m[1].distance * ratio:
        m = m[0]
        good.append(m)
        mkp1.append( kp1[m.queryIdx] )
        mkp2.append( kp2[m.trainIdx] )

p1 = np.float32([kp.pt for kp in mkp1])
p2 = np.float32([kp.pt for kp in mkp2])

# homography with RANSAC, used here just to flag outlier matches
H, status = cv2.findHomography(p1, p2, cv2.RANSAC, 20)
print('%d / %d  inliers/matched' % (np.sum(status), len(status)))

status = np.array(status, dtype=bool)
p1f = p1[status.view(np.ndarray).ravel()==1,:] #Remove Outliers
p2f = p2[status.view(np.ndarray).ravel()==1,:] #Remove Outliers
goodf = [good[i] for i in range(len(status)) if status.view(np.ndarray).ravel()[i]==1]

# fundamental matrix from the inlier correspondences
fundmat, mask = cv2.findFundamentalMat(p1f, p2f, cv2.RANSAC, 3, 0.99,)

#img = cv2.drawMatches(left_gray, kp1, right_gray, kp2, good, None, None, flags=2)
#cv2.imshow('Default Matches', img)
#img = cv2.drawMatches(left_gray, kp1, right_gray, kp2, goodf, None, None, flags=2)
#cv2.imshow('Filtered Matches', img)
#cv2.waitKey(0)

# uncalibrated rectification: homographies H1/H2 that row-align the two images
retBool, H1, H2 = cv2.stereoRectifyUncalibrated(p1f, p2f, fundmat, (left.shape[1],left.shape[0]))

# warp each image with its rectifying homography on the GPU via VPI
with vpi.Backend.CUDA:
    left = vpi.asimage(left).convert(vpi.Format.NV12_ER)
    left = left.perspwarp(H1)
    left = left.convert(vpi.Format.RGB8)

    right = vpi.asimage(right).convert(vpi.Format.NV12_ER)
    right = right.perspwarp(H2)
    right = right.convert(vpi.Format.RGB8)

#cv2.imshow('Left', left.cpu())
#cv2.imshow('Right', right.cpu())
#cv2.waitKey(0)

cv2.imwrite('rectified_left.png', left.cpu())
cv2.imwrite('rectified_right.png', right.cpu())

Categories
3D Research AI/ML CNNs deep dev envs evolution GANs Gripper Gripper Research Linux Locomotion sexing sim2real simulation The Sentient Table UI Vision

Simulation Vision

We’ve got an egg in the gym environment now, so we need to collect some data for training the robot to go pick up an egg.

I’m going to have it save the rgba, depth and segmentation images to disk for Unet training. I left out the depth image for now. The pictures don’t look useful. But some papers are using the depth, so I might reconsider. Some weed bot paper uses 14-channel images with all sorts of extra domain specific data relevant to plants.

I wrote some code to take pics if the egg was in the viewport, and it took 1000 rgb and segmentation pictures or so. I need to change the colour of the egg for sure, and probably randomize all the textures a bit. But the main thing is probably to make the segmentation layers with pixel colours 0,1,2, etc., so that it detects the egg and not so much the link in the foreground.

So sigmoid to softmax and so on. Switching to multi-class also begs the question whether to switch to Pytorch & COCO panoptic segmentation based training. It will have to happen eventually, as I think all of the fastest implementations are currently in Pytorch and COCO based. Keras might work fine for multiclass or multiple binary classification, but it’s sort of the beginning attempt. Something that works. More proof of concept than final implementation. But I think Keras will be good enough for these in-simulation 256×256 images.

Regarding multi-class segmentation, karolzak says “it’s just a matter of changing num_classes argument and you would need to shape your mask in a different way (layer per class??), so for multiclass segmentation you would need a mask of shape (width, height, num_classes)”.

I’ll keep logging my debugging though, if you’re reading this.

So I ran segmask_linkindex.py to see what it does, and how to get more useful data. The code wasn’t running because the segmentation image actually has an array of arrays. I presume it’s a numpy array. I think it must be the rows and columns. So anyway I added a second layer to the loop, and output the pixel values, and when I ran it in the one mode:

-1
-1
-1
83886081
obUid= 1 linkIndex= 4
83886081
obUid= 1 linkIndex= 4
1
obUid= 1 linkIndex= -1
1
obUid= 1 linkIndex= -1
16777217
obUid= 1 linkIndex= 0
16777217
obUid= 1 linkIndex= 0
-1
-1
-1

And in the other mode

-1
-1
-1
1
obUid= 1 linkIndex= -1
1
obUid= 1 linkIndex= -1
1
obUid= 1 linkIndex= -1
-1
-1
-1
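
Those values decode per PyBullet’s segmentation mask convention (when the link index flag is on): the object unique id sits in the low 24 bits, and (linkIndex + 1) in the bits above that.

def decode_seg_value(value):
    if value < 0:                        # -1 means no object hit
        return None, None
    obUid = value & ((1 << 24) - 1)
    linkIndex = (value >> 24) - 1
    return obUid, linkIndex

print(decode_seg_value(83886081))        # (1, 4)
print(decode_seg_value(16777217))        # (1, 0)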

Ok I see. Hmm. Well the important thing is that this code is indeed for extracting the pixel information. I think it’s going to be best for the segmentation to use the simpler segmentation mask that doesn’t track the link info. Ok so I used that code from the guy’s thesis project, and that was interpolating the numbers. When I look at the unique elements of the mask without interpolation, I’ve got…

[  0   2 255]
[  0   2 255]
[  0   2 255]
[  0   2 255]
[  0   2 255]
[  0   1   2 255]
[  0   1   2 255]
[  0   2 255]
[  0   2 255]

Ok, so I think:

255 is the sky
0 is the plane
2 is the robotable
1 is the egg

So yeah, I was just confused because the segmentation masks were all black and white. But if you look closely with a pixel picker tool, the pixel values are (0,0,0), (1,1,1), (2,2,2), (255,255,255), so I just couldn’t see it.

The interpolation kinda helps, to be honest.

As per OpenAI’s domain randomization helping with Sim2Real, we want to randomize some textures and some other things like that. I also want to throw in some random chickens. Maybe some cats and dogs. I’m afraid of transfer learning, at this stage, because a lot of it has to do with changing the structure of the final layer of the neural network, and that might be tough. Let’s just do chickens and eggs.

An excerpt from OpenAI:

Costs

Both techniques increase the computational requirements: dynamics randomization slows training down by a factor of 3x, while learning from images rather than states is about 5-10x slower.

Ok that’s a bit more complex than I was thinking. I want to randomize textures and colours first.

I’ve downloaded and unzipped the ‘Describable Textures Dataset’

And ok it’s loading a random texture for the plane

and random colour for the egg and chicken

Ok, next thing is the Simulation CNN.

Interpolation doesn’t work though, for this, cause it interpolates from what’s available in the image:

[  0  85 170 255]
[  0  63 127 191 255]
[  0  63 127 191 255]

I kind of need the basic UID segmentation.

[  0   1   2   3 255]

Ok, pity about the mask colours, but anyway.

Let’s train the UNet on the new dataset.

We’ll need to make karolzak’s changes.

I’ve saved 2000+ rgb.jpg and seg.png files and we’ve got [0,1,2,3,255] [plane, egg, robot, chicken, sky]

So num_classes=5

And

“for multiclass segmentation you would need a mask of shape (width, height, num_classes) “

What is y.shape?

(2001, 256, 256, 1)

which is 2001 files, of 256 x 256 pixels, and one class. So if I change that to 5…? ValueError: cannot reshape array of size 131137536 into shape (2001,256,256,5)

Um… Ok I need to do more research. Brb.

So the keras_unet library is set up to input binary masks per class, and output binary masks per class.

I would rather use the ‘integer’ class output, and have it output a single array, with the class id per pixel. Similar to this question. In preparation for karolzak probably not knowing how to do this with his library, I’ve asked on stackoverflow for an elegant way to make the binary masks from a multi-class mask, in the meantime.
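
One way to do it in the meantime (a sketch): stack a binary mask per class value, turning the (H, W) integer mask into the (width, height, num_classes) shape the library wants.

import numpy as np

CLASS_VALUES = [0, 1, 2, 3, 255]   # plane, egg, robot, chicken, sky

def to_class_masks(mask, class_values=CLASS_VALUES):
    # mask: (H, W) integer mask -> (H, W, num_classes) stack of binary masks
    return np.stack([(mask == v).astype(np.float32) for v in class_values], axis=-1)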

I coded it up using the library author’s suggested method, as he pointed out that the gains of the integer encoding method are minimal. I’ll check it out another time. I think it might still make sense for certain cases.

Ok that’s pretty awesome. We have 4 masks. Human, chicken, egg, robot. I left out plane and sky for now. That was just 2000 images of training, and I have 20000. I trained on another 2000 images, and it’s down to 0.008 validation loss, which is good enough!

So now I want to load the CNN model in the locomotion code, and feed it the images from the camera, and then have a reward function related to maximizing the egg pixels.

I also need to look at the pybullet-planning project and see what it consists of, as I imagine they’ve made some progress on the next steps. “built-in implementations of standard motion planners, including PRM, RRT, biRRT, A* etc.” – I haven’t even come across these acronyms yet! Ok, they are motion planning. Solvers of some sort. Hmm.

Categories
UI

StreamLit

https://www.streamlit.io/ “Streamlit is an open-source Python library that makes it easy to build beautiful custom web-apps for machine learning and data science.”

Looks like maybe a sort of Jupyter Notebook player with a better UI.

We ended up using this to plot and average the motor angles/velocities.
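
Roughly what that looked like (a sketch; motor_log.csv is a placeholder for wherever the angles/velocities were logged):

import pandas as pd
import streamlit as st

df = pd.read_csv('motor_log.csv')        # one column per motor angle/velocity
st.title('Motor angles')
st.line_chart(df)                        # raw values
st.line_chart(df.rolling(20).mean())     # rolling average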