We’ve gone a totally different way, but this is another interesting project from Erwin Coumans, on the Google Brain team, who did PyBullet. NeuralSim replaces parts of physics engines with neural networks.
Though I’m generally using Stable Baselines algorithms for training locomotion tasks, I am sometimes drawn back to evolutionary algorithms, and especially Map Elites, which has now been upgraded to incorporate a policy gradient.
The archiving of behaviours is what attracts me to Map Elites.
PGA Map Elites, built on top of QDGym, which tracks Quality Diversity, is probably worth a look.
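The appeal is easy to sketch: an archive keyed by a discretised behaviour descriptor, keeping the best policy found for each behaviour niche. A minimal sketch, with arbitrary bin counts and descriptor ranges (this is my own toy version, not the PGA Map Elites code):

import numpy as np

class MapElitesArchive:
    """Toy MAP-Elites archive: one best solution per behaviour cell."""

    def __init__(self, bins_per_dim=10):
        self.bins = bins_per_dim
        self.cells = {}  # cell index -> (fitness, solution)

    def _cell(self, descriptor):
        # Discretise a behaviour descriptor in [0, 1]^n into a grid cell.
        d = np.clip(np.asarray(descriptor), 0.0, 1.0)
        return tuple((d * (self.bins - 1)).astype(int))

    def add(self, solution, fitness, descriptor):
        cell = self._cell(descriptor)
        # Only replace if this behaviour niche is empty or we found a better elite.
        if cell not in self.cells or fitness > self.cells[cell][0]:
            self.cells[cell] = (fitness, solution)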
Beautiful. Since we’re currently just using a 256×256 viewport in pybullet, though, this is quite a bit more advanced than required. Learning game engines can also take a while: it took me about a month to learn Unity3D, with intermediate C# experience. Unreal Engine uses C++, so it’s a bit less accessible to beginners.
I’ve now got a UNet that can provide predictions for where an egg is, in simulation.
So I want to design a reward function related to the egg prediction mask.
I haven’t ‘plugged in’ the trained neural network though, because it will slow things down, and I can just as well make use of the built-in pybullet segmentation to get the simulation egg pixels. At some point though, the robot will have to exist in a world where egg pixels are not labelled as such, and the simulation trained vision will be a useful basis for training.
I think a good reward function might be to not fall over, and to maximize the number of 1s in the egg prediction mask. An intermediate reward might be for centering the egg pixels.
The numpy way to count mask pixels could be:
import numpy as np

arr = np.array([1, 0, 0, 0, 0, 1, 1, 1, 1, 0])
np.count_nonzero(arr == 1)  # counts the 1s in the mask
I ended up using the following to count the pixels:
from PIL import Image

seg = Image.fromarray(mask.astype('uint8'))  # the PIL round-trip isn't strictly necessary
self._num_ones = (np.array(seg) == 1).sum()
Hmm for centering, not sure yet.
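One option might be to reward the egg-mask centroid for sitting near the centre of the image. A rough numpy sketch (the function name and the exact scaling are my own guesses):

import numpy as np

def centering_reward(egg_mask):
    # egg_mask: 2D array with 1s where the egg is predicted.
    ys, xs = np.nonzero(egg_mask)
    if len(xs) == 0:
        return 0.0  # no egg visible, no centering bonus
    h, w = egg_mask.shape
    cy, cx = ys.mean(), xs.mean()
    # Normalised distance of the centroid from the image centre: 0 = centred, ~1 = corner.
    dist = np.hypot((cx - w / 2) / (w / 2), (cy - h / 2) / (h / 2)) / np.sqrt(2)
    return 1.0 - dist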
I’m looking into how to run pybullet / gym on the cloud and get some of it rendering.
I’ve found a few leads. VNC is an obvious solution, but probably won’t be available on Chrome OS. Pybullet has a broken link, but I think it’s suggesting something like this colab, more or less, using ‘pyrender’. User matpalm has a minimal example of sending images to Google Dataflow. Those might be good if I can render video. There’s a Jupyter example with capturing images in pybullet. I’ll have to research a bit more. An RDP viewer would probably be easiest, if it’s possible.
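In any case, actually grabbing frames doesn’t need a display at all: pybullet’s DIRECT mode renders offscreen with the software renderer. A minimal sketch (the camera placement here is arbitrary):

import numpy as np
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # headless, no GUI or X server needed
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.loadURDF("plane.urdf")

view = p.computeViewMatrix(cameraEyePosition=[0.5, 0.5, 0.5],
                           cameraTargetPosition=[0, 0, 0],
                           cameraUpVector=[0, 0, 1])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=1.0, nearVal=0.01, farVal=10)
width, height, rgba, depth, seg = p.getCameraImage(256, 256, view, proj)
rgb = np.reshape(rgba, (height, width, 4))[:, :, :3]  # drop alpha; save or stream this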
I set up the Ray Tune training again, on Google Cloud, and enabled the dashboard by opening some ports (8265 and 6006), and initialising ray with ray.init(dashboard_host="0.0.0.0")
I can see it improving the episode reward mean, but it’s taking a good while on the 4 CPU cloud machine. Cost is about $3.50/day on the CPU machine, and about $16/day on the GPU machine. Google is out of T4 GPUs at the moment.
I have it saving the occasional mp4 video using a Monitor wrapper that records every 10th episode.
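For reference, the wiring is roughly this (the env id and output path are placeholders; newer gym versions replace Monitor with RecordVideo):

import gym

env = gym.make("CartPole-v1")  # placeholder; the real one is the robotable locomotion env
env = gym.wrappers.Monitor(env, "./videos",
                           video_callable=lambda episode_id: episode_id % 10 == 0,
                           force=True)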
Continuing from our early notes on SLAM algorithms (Simultaneous Localisation and Mapping), and the similar, though less map-building, DSO algorithm, I came across a good project (“From cups to consciousness”) and article that reminded me that mapping the environment, or at least having some sense of depth, will be pretty crucial.
At the moment I’ve just got to the point of planning to train a CNN on simulation data, and so there should also be some positioning of the robot as a model in its own virtual world. So it’s probably best to reexamine what’s already out there. Visual odometry. Optical flow.
I found a good paper summarizing the 2019 options. The author’s github has some interesting scripts that might be useful. It reminds me that I should probably be using ROS and Gazebo, to some extent. The conclusion was roughly that Google Cartographer and GMapping (OpenSLAM) generally beat the others (Karto, Hector). It seems like SLAM code is all a few years old. Google Cartographer has some support for ‘lifelong mapping‘, which sounded interesting: the robot goes around updating its map, a bit. It reminds me I saw ‘PonderNet‘ today, fresh from DeepMind, which from a quick look is, more or less, about learning to scale the amount of computation to the complexity of the input.
Anyway, we are mostly interested in Monocular SLAM. So none of this applies, probably. I’m mostly interested at the moment, in using some prefab scenes like the AI2Thor environment in the Cups-RL example, and making some sort of SLAM in simulation.
Also interesting is RATSLAM and the recent update: LatentSLAM – The authors of this site, The Smart Robot, got my attention because of the CCNs. Cortical column networks.
“A common shortcoming of RatSLAM is its sensitivity to perceptual aliasing, in part due to the reliance on an engineered visual processing pipeline. We aim to reduce the effects of perceptual aliasing by replacing the perception module by a learned dynamics model. We create a generative model that is able to encode sensory observations into a latent code that can be used as a replacement to the visual input of the RatSLAM system”
Interesting, “The robot performed 1,143 delivery tasks to 11 different locations with only one delivery failure (from which it recovered), traveled a total distance of more than 40 km over 37 hours of active operation, and recharged autonomously a total of 23 times.“
I think DSO might be a good option, or LDSO, which adds loop closure; that looks like the most straightforward, maybe.
After a weekend away with a computer vision professional, I found out about COLMAP, a structure-from-motion suite.
I saw a few more recent projects too, e.g. NeuralRecon, and ooh, here’s a recent Facebook one that sounds like it might work! Consistent Depth… eh, their Google Colab is totally broken.
Anyhow, LDSO. Let’s try it.
In file included from /dmc/LDSO/include/internal/OptimizationBackend/AccumulatedTopHessian.h:10:0,
                 from /dmc/LDSO/include/internal/OptimizationBackend/EnergyFunctional.h:9,
                 from /dmc/LDSO/include/frontend/FeatureMatcher.h:10,
                 from /dmc/LDSO/include/frontend/FullSystem.h:18,
                 from /dmc/LDSO/src/Map.cc:4:
/dmc/LDSO/include/internal/OptimizationBackend/MatrixAccumulators.h:8:10: fatal error: SSE2NEON.h: No such file or directory
 #include "SSE2NEON.h"
          ^~~~
compilation terminated.
src/CMakeFiles/ldso.dir/build.make:182: recipe for target 'src/CMakeFiles/ldso.dir/Map.cc.o' failed
make[2]: *** [src/CMakeFiles/ldso.dir/Map.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:85: recipe for target 'src/CMakeFiles/ldso.dir/all' failed
make[1]: *** [src/CMakeFiles/ldso.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Ok maybe not.
There’s a paper here reviewing ORBSLAM3 and LDSO, and they encounter lots of issues. But it’s a good paper for an overview of how the algorithms work. We want a point cloud so we can find the closest points, and not walk into them.
Calibration is an issue, rolling shutter cameras are an issue, IMU data can’t be synced meaningfully, it’s a bit of a mess, really.
Also, after reports that ORB-SLAM2 was only getting 5 fps on a Raspberry Pi, I got smart and looked for something specifically for the Jetson. I found a depth CNN for monocular vision on the forum, amazing.
Ok so after much fussing about, I found just what we need. I had an old copy of jetson-containers, and the SLAM code was added just 6 months ago. I might want to try the noetic container (the newer ROS 1 release) instead of melodic, good old ROS.
git clone https://github.com/dusty-nv/jetson-containers.git
cd jetson-containers
chicken@jetson:~/jetson-containers$ ./scripts/docker_build_ros.sh --distro melodic --with-slam
Successfully built 2eb4d9c158b0
Successfully tagged ros:melodic-ros-base-l4t-r32.5.0
chicken@jetson:~/jetson-containers$ ./scripts/docker_test_ros.sh melodic
reading L4T version from /etc/nv_tegra_release
L4T BSP Version: L4T R32.5.0
l4t-base image: nvcr.io/nvidia/l4t-base:r32.5.0
testing container ros:melodic-ros-base-l4t-r32.5.0 => ros_version
xhost: unable to open display ""
xauth: file /tmp/.docker.xauth does not exist
sourcing /opt/ros/melodic/setup.bash
ROS_ROOT /opt/ros/melodic/share/ros
ROS_DISTRO melodic
getting ROS version -
melodic
done testing container ros:melodic-ros-base-l4t-r32.5.0 => ros_version
Well other than the X display, looking good.
Maybe I should just plug in a monitor. Ideally I wouldn’t have to, though. I used GStreamer the other time. Maybe we do that again.
This looks good too… https://github.com/dusty-nv/ros_deep_learning but let’s stay focused. I’m also thinking maybe we upgrade early, to noetic. Ugh it looks like a whole new bunch of build tools and things to relearn. I’m sure it’s amazing. Let’s do ROS1, for now.
Let’s try to build that FCNN one again.
CMake Error at tx2_fcnn_node/Thirdparty/fcrn-inference/CMakeLists.txt:121 (find_package):
By not providing "FindOpenCV.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "OpenCV", but
CMake did not find one.
Could not find a package configuration file provided by "OpenCV" (requested
version 3.0.0) with any of the following names:
OpenCVConfig.cmake
opencv-config.cmake
Add the installation prefix of "OpenCV" to CMAKE_PREFIX_PATH or set
"OpenCV_DIR" to a directory containing one of the above files. If "OpenCV"
provides a separate development package or SDK, be sure it has been
installed.
-- Configuring incomplete, errors occurred!
Ok hold on…
"Builds additional container with VSLAM packages,
including ORBSLAM2, RTABMAP, ZED, and Realsense.
This only applies to foxy and galactic and implies
--with-pytorch as these containers use PyTorch."
Ok that hangs when it starts building the slam bits. Luckily, someone’s raised the bug, and though it’s not fixed, Dusty does have a docker already compiled.
So, after some digging, I think we can solve the X problem (i.e. where are we going to see this alleged SLAMming occur?) with an RTSP server. Previously I used GStreamer to send RTP over UDP. But this makes more sense, to run a server on the Jetson. There’s a plugin for GStreamer, so I’m trying to get the ‘dev’ version, so I can compile the test-launch.c program.
apt-get install libgstrtspserver-1.0-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libgstrtspserver-1.0-dev is already the newest version (1.14.5-0ubuntu1~18.04.1).
ok... git clone https://github.com/GStreamer/gst-rtsp-server.git
root@jetson:/opt/gst-rtsp-server/examples# gcc test-launch.c -o test-launch $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-rtsp-server-1.0)
test-launch.c: In function ‘main’:
test-launch.c:77:3: warning: implicit declaration of function ‘gst_rtsp_media_factory_set_enable_rtcp’; did you mean ‘gst_rtsp_media_factory_set_latency’? [-Wimplicit-function-declaration]
gst_rtsp_media_factory_set_enable_rtcp (factory, !disable_rtcp);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gst_rtsp_media_factory_set_latency
/tmp/ccC1QgPA.o: In function `main':
test-launch.c:(.text+0x154): undefined reference to `gst_rtsp_media_factory_set_enable_rtcp'
collect2: error: ld returned 1 exit status
Ok wait let’s reinstall gstreamer.
apt-get install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev libgstreamer-plugins-bad1.0-dev gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav gstreamer1.0-doc gstreamer1.0-tools gstreamer1.0-x gstreamer1.0-alsa gstreamer1.0-gl gstreamer1.0-gtk3 gstreamer1.0-qt5 gstreamer1.0-pulseaudio
error...
Unpacking libgstreamer-plugins-bad1.0-dev:arm64 (1.14.5-0ubuntu1~18.04.1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-Ec7eDq/62-libopencv-dev_3.2.0+dfsg-4ubuntu0.1_arm64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
Ok then leave out that one...
apt --fix-broken install
and that fails on
Errors were encountered while processing:
/var/cache/apt/archives/libopencv-dev_3.2.0+dfsg-4ubuntu0.1_arm64.deb
Solving this stuff is supposed to be the sign of a good programmer. But damn. Every time. Suggestions continue, in the forums of those who came before. Let’s reload the docker.
Ok I took a break and got lucky. The test-launch.c code is different from what the admin had.
Let’s diff it and see what changed… The newer test-launch.c adds the RTCP option that the installed 1.14 library doesn’t know about:

#define DEFAULT_DISABLE_RTCP FALSE

static gboolean disable_rtcp = DEFAULT_DISABLE_RTCP;

{"disable-rtcp", '\0', 0, G_OPTION_ARG_NONE, &disable_rtcp,
"Whether RTCP should be disabled (default false)", NULL},

gst_rtsp_media_factory_set_enable_rtcp (factory, !disable_rtcp);

So after taking those bits out, this compiles:
gcc test.c -o test $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-rtsp-server-1.0)
So apparently now I can run this in VLC… when I open
rtsp://<jetson-ip>:8554/test
Um is that meant to happen?…. Yes!
Ok next, we want to see SLAM stuff happening. So, ideally, a video feed of the desktop, or something like that.
So here are the links I have open. Maybe I get back to them later. Need to get back to ORBSLAM2 first, and see where we’re at, and what we need. Not quite /dev/video0 to PC client. More like, ORBSLAM2 to /dev/video0 to PC client. Or full screen desktop. One way or another.
Last time, apt said:
libgstrtspserver-1.0-dev is already the newest version (1.14.5-0ubuntu1~18.04.1).
Today we have:
E: Unable to locate package libgstrtspserver-1.0-dev
E: Couldn't find any package by glob 'libgstrtspserver-1.0-dev'
E: Couldn't find any package by regex 'libgstrtspserver-1.0-dev'
Did I maybe compile it outside of the docker? Hmm maybe. Why can’t I find it though? Let’s try the obvious… but also why does this take so long? Network is unreachable. Network is unreachable. Where have all the mirrors gone?
apt-get update
Ok so long story short, I made another Dockerfile to get GStreamer installed. It mostly required adding a key for the Kitware apt repo.
Since 1.14, the use of libv4l2 has been disabled due to major bugs in the emulation layer. To enable usage of this library, set the environment variable GST_V4L2_USE_LIBV4L2=1
but it doesn’t want to work anyway. Ok RTSP is almost a dead end.
I might attach a CSI camera instead of the V4L2 (USB) camera, maybe. Seems less troublesome. But yeah, let’s take a break. Let’s get back to depthnet and ROS2 and ORB-SLAM2, etc.
depthnet: error while loading shared libraries: /usr/lib/aarch64-linux-gnu/libnvinfer.so.8: file too short
Ok, let’s try ROS2.
(Sorry, this was supposed to be about SLAM, right?)
As a follow-up for this post…
I asked about mapping two argus (NVIDIA’s CSI camera driver) node topics, in order to fool their stereo_proc, on the github issues. No replies, cause they probably want to sell expensive stereo cameras, and I am asking how to do it with $15 Chinese cameras.
I looked at DustyNV’s Mono depth. Probably not going to work. It seems like you can get a good depth estimate for things in the scene, but everything around the edges reads as ‘close’. Not sure that’s practical enough for depth.
I looked at the NVIDIA DNN depth. Needs proper stereo cameras.
I looked at the NVIDIA VPI Stereo Disparity pipeline. It is the most promising yet, but the input either needs to come from calibrated cameras, or needs to be rectified on the fly using OpenCV. That seems possible in Python, but it is not obvious yet how to do it in C++, which the rest of the code is in.
I tried calibration.
I removed the USB cameras.
I attached two RPi 2.1 CSI cameras, from older projects, and deep-dived into the ISAAC ROS suite. I left ROS2 alone for a bit because it was just getting in the way. One of the camera sensors occasionally had fuzzy horizontal lines going across the image, and the calibration results were poor and fuzzy. I decided I needed new cameras.
The github author used IMX-219 cameras, and I even printed out half of the holder, to keep the cameras 8cm apart.
I tried calibration using the ROS2 cameracalibrator, which is a wrapper for an OpenCV call, after starting up the camera driver node inside the Isaac ROS docker.
(Because of a bug, I also sometimes need to remove --ros-args --remap.)
OpenCV was able to calibrate, via the ROS2 application, in both cases. So maybe I should just grab the outputs from that. We’ll do that again, now. But I think I need to print out a chessboard and just see how that goes first.
I couldn’t get more than a couple of matches using pictures of the chessboard on the screen, even with binary thresholding, in the author’s calibration notebooks.
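For a sanity check outside those notebooks, the plain OpenCV route is short. A sketch, assuming a printed 9×6 inner-corner pattern and a folder of captured frames:

import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner corners of the printed chessboard (an assumption)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for fname in glob.glob("calib/*.png"):  # hypothetical folder of captured frames
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)  # roughly, under a pixel or so is usable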
Here’s what the NVIDIA VPI 1.2’s samples drew, for my chess boards:
Camera calibration seems to be a serious problem in the IoT camera world. I want something approximating depth, and it is turning out that there’s some math involved.
Learning about epipolar geometry was not something I planned to do for this.
But this is a major showstopper, so either I must rectify in real time, or I must calibrate properly.
“The reason for the noisy result is that the VPI algorithm expects the rectified image pairs as input. Please do the rectification first and then feed the rectified images into the stereo disparity estimator.”
So can we use this info? The NVIDIA post references the code below as the solution, perhaps, in that context. Let’s run it on the chessboard?
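That snippet isn’t reproduced here, but in OpenCV terms the rectification step presumably looks roughly like this sketch, with all the calibration inputs assumed to exist already:

import cv2

def rectify_pair(left_img, right_img, K1, d1, K2, d2, R, T):
    # K1/d1, K2/d2: per-camera intrinsics and distortion from calibration;
    # R, T: rotation and translation between the two cameras (stereo calibration).
    h, w = left_img.shape[:2]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
    left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)
    return left_rect, right_rect  # these are what the VPI disparity estimator wants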
We’ve got an egg in the gym environment now, so we need to collect some data for training the robot to go pick up an egg.
I’m going to have it save the rgba, depth and segmentation images to disk for Unet training. I left out the depth image for now. The pictures don’t look useful. But some papers are using the depth, so I might reconsider. Some weed bot paper uses 14-channel images with all sorts of extra domain specific data relevant to plants.
I wrote some code to take pics whenever the egg was in the viewport, and it took 1000 or so rgb and segmentation pictures. I need to change the colour of the egg for sure, and probably randomize all the textures a bit. But the main thing is probably to make the segmentation masks with pixel values 0, 1, 2, etc., so that the network detects the egg and not so much the robot link in the foreground.
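A sketch of the remapping I have in mind; the uid-to-class dict would be built from whatever IDs loadURDF returned for each body:

import numpy as np

def seg_to_classes(seg_buffer, class_of_uid, sky_class=255):
    # class_of_uid: e.g. {plane_uid: 0, egg_uid: 1, table_uid: 2, chicken_uid: 3}.
    # Pixels that hit no body are -1 in pybullet's segmentation buffer,
    # which is why the sky shows up as 255 once cast to uint8.
    seg = np.asarray(seg_buffer)
    out = np.full(seg.shape, sky_class, dtype=np.uint8)
    for uid, cls in class_of_uid.items():
        out[seg == uid] = cls
    return out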
So sigmoid to softmax and so on. Switching to multi-class also raises the question of whether to switch to PyTorch and COCO panoptic segmentation based training. It will have to happen eventually, as I think all of the fastest implementations are currently in PyTorch and COCO based. Keras might work fine for multi-class or multiple binary classification, but it’s sort of the beginning attempt. Something that works. More proof of concept than final implementation. But I think Keras will be good enough for these in-simulation 256×256 images.
Regarding multi-class segmentation, karolzak says “it’s just a matter of changing num_classes argument and you would need to shape your mask in a different way (layer per class??), so for multiclass segmentation you would need a mask of shape (width, height, num_classes)“
I’ll keep logging my debugging though, if you’re reading this.
So I ran segmask_linkindex.py to see what it does, and how to get more useful data. The code wasn’t running as-is because the segmentation image is actually an array of arrays. I presume it’s a numpy array; I think it must be the rows and columns. So anyway I added an inner loop, output the pixel values, and ran it in the one mode:
Ok I see. Hmm. Well the important thing is that this code is indeed for extracting the pixel information. I think it’s going to be best for the segmentation to use the simpler segmentation mask that doesn’t track the link info. Ok so I used that code from the guy’s thesis project, and that was interpolating the numbers. When I look at the unique elements of the mask without interpolation, I’ve got…
[ 0 2 255]
[ 0 2 255]
[ 0 2 255]
[ 0 2 255]
[ 0 2 255]
[ 0 1 2 255]
[ 0 1 2 255]
[ 0 2 255]
[ 0 2 255]
Ok, so I think:
255 is the sky
0 is the plane
2 is the robotable
1 is the egg
So yeah, I was just confused because the segmentation masks were all black and white. But if you look closely with a pixel picker tool, the pixel values are (0,0,0), (1,1,1), (2,2,2), (255,255,255), so I just couldn’t see it.
The interpolation kinda helps, to be honest.
As per OpenAI’s domain randomization helping with Sim2Real, we want to randomize some textures and some other things like that. I also want to throw in some random chickens. Maybe some cats and dogs. I’m afraid of transfer learning, at this stage, because a lot of it has to do with changing the structure of the final layer of the neural network, and that might be tough. Let’s just do chickens and eggs.
“Both techniques increase the computational requirements: dynamics randomization slows training down by a factor of 3x, while learning from images rather than states is about 5-10x slower.”
Ok, that’s a bit more complex than I was thinking. I want to randomize textures and colours first.
I’ve downloaded and unzipped the ‘Describable Textures Dataset’, and ok, it’s now loading a random texture for the plane and a random colour for the egg and chicken.
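Roughly what the per-episode randomisation looks like (the DTD path and the body IDs are placeholders):

import os
import random
import pybullet as p

DTD_DIR = "dtd/images"  # unzipped Describable Textures Dataset, category folders inside

def randomise_scene(plane_uid, egg_uid, chicken_uid):
    # Random texture from a random DTD category for the ground plane.
    category = os.path.join(DTD_DIR, random.choice(os.listdir(DTD_DIR)))
    texture_id = p.loadTexture(os.path.join(category, random.choice(os.listdir(category))))
    p.changeVisualShape(plane_uid, -1, textureUniqueId=texture_id)
    # Random colours for the egg and the chicken.
    for uid in (egg_uid, chicken_uid):
        p.changeVisualShape(uid, -1, rgbaColor=[random.random(), random.random(), random.random(), 1])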
Ok, next thing is the Simulation CNN.
Interpolation doesn’t work for this, though, because it interpolates across whatever values are available in the image:
[ 0 85 170 255]
[ 0 63 127 191 255]
[ 0 63 127 191 255]
I kind of need the basic UID segmentation.
[ 0 1 2 3 255]
Ok, pity about the mask colours, but anyway.
Let’s train the UNet on the new dataset.
We’ll need to make karolzak’s suggested multi-class changes.
I’ve saved 2000+ rgb.jpg and seg.png files and we’ve got [0,1,2,3,255] [plane, egg, robot, chicken, sky]
So num_classes=5
And
“for multiclass segmentation you would need a mask of shape (width, height, num_classes) “
What is y.shape?
(2001, 256, 256, 1)
which is 2001 files, of 256 x 256 pixels, and one class. So if I change that to 5…?
ValueError: cannot reshape array of size 131137536 into shape (2001,256,256,5)
Um… Ok I need to do more research. Brb.
So the keras_unet library is set up to input binary masks per class, and output binary masks per class.
I coded it up using the library author’s suggested method, as he pointed out that the gains of the integer encoding method are minimal. I’ll check it out another time. I think it might still make sense for certain cases.
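(The reshape failed because the one-hot masks have five times as many elements; they have to be built, not reshaped.) The conversion is something along these lines, turning the integer-coded masks into one binary channel per class:

import numpy as np

CLASS_VALUES = [0, 1, 2, 3, 255]  # plane, egg, robot, chicken, sky

def to_channels(mask):
    # (256, 256) integer mask -> (256, 256, num_classes) stack of binary masks.
    return np.stack([(mask == v).astype(np.float32) for v in CLASS_VALUES], axis=-1)

# e.g. masks of shape (2001, 256, 256) become (2001, 256, 256, 5):
# y = np.stack([to_channels(m) for m in masks])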
Ok that’s pretty awesome. We have 4 masks. Human, chicken, egg, robot. I left out plane and sky for now. That was just 2000 images of training, and I have 20000. I trained on another 2000 images, and it’s down to 0.008 validation loss, which is good enough!
So now I want to load the CNN model in the locomotion code, and feed it the images from the camera, and then have a reward function related to maximizing the egg pixels.
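The wiring might look something like this; the model filename, the egg channel index, and the 0.5 threshold are all assumptions:

import numpy as np
from tensorflow.keras.models import load_model

model = load_model("unet_egg.h5", compile=False)  # hypothetical path to the trained UNet
EGG_CHANNEL = 1  # whichever output channel ended up corresponding to the egg

def egg_pixel_reward(rgb_256):
    # rgb_256: (256, 256, 3) camera image, values 0-255.
    x = rgb_256[np.newaxis].astype(np.float32) / 255.0
    pred = model.predict(x)[0]              # (256, 256, num_classes)
    egg_mask = pred[..., EGG_CHANNEL] > 0.5
    return egg_mask.sum() / egg_mask.size   # fraction of the view that is (predicted) egg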
I also need to look at the pybullet-planning project and see what it consists of, as I imagine they’ve made some progress on the next steps. “built-in implementations of standard motion planners, including PRM, RRT, biRRT, A* etc.” – I haven’t even come across these acronyms yet! Ok, they are motion planning algorithms (probabilistic roadmaps, rapidly-exploring random trees, A* search). Solvers of some sort. Hmm.
“Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. We aim to provide task distributions that are sufficiently broad to evaluate meta-RL algorithms’ generalization ability to new behaviors.”
“The Python library located in animalai extends ml-agents v0.15.0. Mainly, we add the possibility to change the configuration of arenas between episodes.”
They had a competition of ‘animal AIs’ in 2019, using EvalAI:
“The competition was kindly hosted on EvalAI, an open source web application for AI competitions. Special thanks to Rishabh Jain for his help in setting this up. We will aim to reopen submissions with new hidden files in order to keep some form of competition going.”
“A flexible, high-performance 3D simulator with configurable agents, multiple sensors, and generic 3D dataset handling (with built-in support for MatterPort3D, Gibson, Replica, and other datasets).”
I’m sticking with the Google protobuf code that the minitaur uses, and will just need to save the best episodes and work out how to replay them. The comments ask “use recordio?”