For the MFRU exhibition, we presented a variety of robots. The following is some documentation on the specifications and setup instructions. We are leaving the robots with konS.
All Robots
Li-Po batteries need to be stored at 3.8V per cell. For the exhibition, they can be charged to 4.15V per cell and run, with a battery level monitor, until they display 3.7V, at which point they should be swapped out. Future iterations of robotic projects will make use of splitter cables to allow hot-swapping batteries, for zero downtime.
We leave our ISDT D2 Mark 2 charger, for maintaining and charging Li-Po batteries.
At setup time in a new location, the Raspberry Pi SD cards need to be updated to connect to the new Wi-Fi network. The simplest method is to physically place each SD card in a laptop and transfer a wpa_supplicant.conf file, with the contents below changed to the new credentials and locale, plus a blank file called ssh, to allow remote login.
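A minimal wpa_supplicant.conf along these lines (the SSID, password and country code are placeholders to be replaced):

# two-letter country code for the new locale, e.g. SI
country=XX
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="NEW_NETWORK_NAME"
    psk="NEW_NETWORK_PASSWORD"
}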
Then, after starting up with the updated SD card, the robot IP addresses need to be determined, typically using `nmap -sP 192.168.xxx.xxx` (or a Windows client like Zenmap).
Usernames and passwords used are:
LiDARbot – pi/raspberry
Vacuumbot – pi/raspberry and chicken/chicken
Pinkbot – pi/raspberry
Gripperbot – pi/raspberry
Birdbot – daniel/daniel
Nipplebot – just arduino
Lightswitchbot – just arduino and analog timer
For now, it is advised to shut down robots by connecting to their IP address, typing sudo shutdown -H now, and waiting for the lights to turn off before unplugging. It’s not strictly necessary, but it reduces the chances that the filesystem (e.g. the apt cache) becomes corrupted and you need to reflash the SD card and start from scratch.
Starting from scratch involves reflashing the SD card using Raspberry Pi Imager, cloning the git repository, running pi_boot.sh and pip3 install -r requirements.txt, configuring config.py, and running create_service.sh to automate the startup.
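As a rough sketch of those steps (the repository URL is a placeholder, and the exact order of the config edit may differ):

# after reflashing the SD card with Raspberry Pi Imager and booting with Wi-Fi + ssh configured
git clone <repository-url>        # placeholder: the project repository
cd <repository>
./pi_boot.sh
pip3 install -r requirements.txt
nano config.py                    # set the robot-specific configuration
./create_service.sh               # installs the service that starts the robot code on boot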
LiDARbot
– Raspberry Pi Zero W x 1
– PCA9685 PWM controller x 1
– RPLidar A1M8 x 1
– FT5835M servo x 4
Powered by: Standard 5V Power bank [10Ah and 20Ah]
Startup Instructions:
– Plug in USB cables.
– Wait for service startup and go to URL.
– If the Lidar chart is displaying, click ‘Turn on Brain’.
Vacuumbot
– Raspberry Pi 3b x 1
– LM2596 stepdown converter x 1
– RDS60 servo x 4
Powered by: 7.4V 4Ah Li-Po battery
– NVIDIA Jetson NX x 1
– Realsense D455 depth camera x 1
Powered by: 11.1V 4Ah Li-Po battery
Instructions:
– Plug the Jetson assembly connector into the 11.1V battery, and the RPi assembly connector into the 7.4V battery.
– Connect to the Jetson:
cd ~/jetson-server
python3 inference_server.py
– Go to the Jetson URL to view depth and object detection.
– Wait for the RPi service to start up.
– Connect to the RPi URL, and click ‘Turn on Brain’.
Pinkbot
– Raspberry Pi Zero W x 1
– PCA9685 PWM controller x 1
– LM2596 stepdown converter x 1
– RDS60 servo x 8
– Ultrasonic sensors x 3
Powered by: 7.4V 6.8Ah Li-Po battery
Instructions:
– Plug in to the Li-Po battery.
– Wait for the RPi service to start up.
– Connect to the RPi URL, and click ‘Turn On Brain’.
Gripperbot
– Raspberry Pi Zero W x 1
– 150W stepdown converter (to 7.4V) x 1
– LM2596 stepdown converter (to 5V) x 1
– RDS60 servo x 4
– MGGR996 servo x 1
Powered by: 12V 60W power supply
Instructions:
– Plug in to the wall.
– Wait for the RPi service to start up.
– Connect to the RPi URL, and click ‘Fidget to the Waves’.
Birdbot
– Raspberry Pi Zero W x 1
– FT SM-85CL-C001 servo x 4
– FE-URT-1 serial controller x 1
– 12V input step-down converter (to 5V) x 1
– Ultrasonic sensor x 1
– RPi camera v2.1 x 1
Powered by: 12V 60W power supply
Instructions:
– Plug in to the wall.
– Wait for the RPi service to start up.
– Connect to the RPi URL, and click ‘Fidget to the Waves’.
I got the Feetech Smart Bus servos running on the RPi. Using them for the birdbot.
Some gotchas:
Need to wire TX to TX, and RX to RX.
Despite claiming a 1,000,000 baud rate, 115200 was required; otherwise it says ‘There is no status packet!’
After only one servo working for a while, I found their FAQ #5, installed their debugging software, plugged each servo in individually, and changed their IDs to 1/2/3/4. Only the first one was running because all of their IDs were still 1.
For Python, you need to pip3 install pyserial, and then import serial.
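For what it’s worth, a minimal check that the serial link opens at that baud rate might look like the sketch below; the port name is an assumption, and the actual servo packets are left to Feetech’s own SDK/debugging tool.

import serial

# 1000000 baud is what the spec claims, but 115200 is what actually worked here
ser = serial.Serial('/dev/serial0', baudrate=115200, timeout=0.5)  # port name is a guess
print('port open:', ser.is_open)
ser.close()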
A seemingly straightforward idea for robot control involves using depth and object detection to form a rough model of the environment.
After failed attempts to create our own stereo camera using two monocular cameras, we eventually decided to buy a commercial product instead: the Intel RealSense D455 depth camera.
A first round of running a COCO-trained MobileNet-SSDv2 object detection network in TensorFlow 2 Lite on the Jetson Nano, using the colour images from the RealSense camera, gave results that were just barely acceptable (~2 FPS) for a localhost stream, and totally unacceptable (~0.25 FPS) when served as JPEG over HTTP to a browser on the network.
Looking at the options, the most feasible solution was to redo the network using TensorRT, the NVIDIA-specific, quantized (16-bit on the Nano, 8-bit on the NX/AGX) neural network framework. The other avenue was to investigate options beyond simple JPEG compression over HTTP, such as RTSP and WebRTC.
The difficult part was setting up the environment. We used the NVIDIA detectnet code, adapted to take the RealSense camera images as input and to display the distance to the detected objects. An outdated example was found at the CAVEDU robotics blog/github; the fixed-up version is below.
#!/usr/bin/python3
import jetson_inference
import jetson_utils
import argparse
import sys
import os
import cv2
import re
import numpy as np
import io
import time
import json
import random
import pyrealsense2 as rs
from jetson_inference import detectNet
from jetson_utils import videoSource, videoOutput, logUsage, cudaFromNumpy, cudaAllocMapped, cudaConvertColor

parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.",
                                 formatter_class=argparse.RawTextHelpFormatter, epilog=jetson_utils.logUsage())
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2",
                    help="pre-trained model to load (see below for options)")
parser.add_argument("--threshold", type=float, default=0.5,
                    help="minimum detection threshold to use")
parser.add_argument("--width", type=int, default=640,
                    help="set width for image")
parser.add_argument("--height", type=int, default=480,
                    help="set height for image")
opt = parser.parse_known_args()[0]

# load the object detection network
net = detectNet(opt.network, sys.argv, opt.threshold)

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, opt.width, opt.height, rs.format.z16, 30)
config.enable_stream(rs.stream.color, opt.width, opt.height, rs.format.bgr8, 30)

# Start streaming
pipeline.start(config)

press_key = 0
while press_key == 0:
    # Wait for a coherent pair of frames: depth and color
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    color_frame = frames.get_color_frame()
    if not depth_frame or not color_frame:
        continue

    # Convert images to numpy arrays
    depth_image = np.asanyarray(depth_frame.get_data())
    show_img = np.asanyarray(color_frame.get_data())

    # convert to CUDA (cv2 images are numpy arrays, in BGR format)
    bgr_img = cudaFromNumpy(show_img, isBGR=True)
    # convert from BGR -> RGB
    img = cudaAllocMapped(width=bgr_img.width, height=bgr_img.height, format='rgb8')
    cudaConvertColor(bgr_img, img)

    # detect objects in the image (with overlay)
    detections = net.Detect(img)

    for num in range(len(detections)):
        score = round(detections[num].Confidence, 2)
        box_top = int(detections[num].Top)
        box_left = int(detections[num].Left)
        box_bottom = int(detections[num].Bottom)
        box_right = int(detections[num].Right)
        box_center = detections[num].Center
        label_name = net.GetClassDesc(detections[num].ClassID)

        # read the depth at the box centre (sampled 10 times and averaged, as in the original example)
        point_distance = 0.0
        for i in range(10):
            point_distance = point_distance + depth_frame.get_distance(int(box_center[0]), int(box_center[1]))
        point_distance = np.round(point_distance / 10, 3)
        distance_text = str(point_distance) + 'm'

        # draw the bounding box, a crosshair at the centre, and the label + distance
        cv2.rectangle(show_img, (box_left, box_top), (box_right, box_bottom), (255, 0, 0), 2)
        cv2.line(show_img,
                 (int(box_center[0]) - 10, int(box_center[1])),
                 (int(box_center[0]) + 10, int(box_center[1])),
                 (0, 255, 255), 3)
        cv2.line(show_img,
                 (int(box_center[0]), int(box_center[1]) - 10),
                 (int(box_center[0]), int(box_center[1]) + 10),
                 (0, 255, 255), 3)
        cv2.putText(show_img,
                    label_name + ' ' + distance_text,
                    (box_left + 5, box_top + 20), cv2.FONT_HERSHEY_SIMPLEX, 0.4,
                    (0, 255, 255), 1, cv2.LINE_AA)

    cv2.putText(show_img,
                "{:.0f} FPS".format(net.GetNetworkFPS()),
                (int(opt.width * 0.8), int(opt.height * 0.1)),
                cv2.FONT_HERSHEY_SIMPLEX, 1,
                (0, 255, 255), 2, cv2.LINE_AA)

    display = cv2.resize(show_img, (int(opt.width * 1.5), int(opt.height * 1.5)))
    cv2.imshow('Detecting...', display)

    keyValue = cv2.waitKey(1)
    if keyValue & 0xFF == ord('q'):
        press_key = 1

cv2.destroyAllWindows()
pipeline.stop()
Assuming you have a suitable CMake version and CUDA is available (if nvcc doesn’t work, you need to configure linker paths; check this link)… Note that if you have a CMake version around ~3.22-3.24 or so, you need an older one. The prerequisite sudo apt-get install libssl-dev is also required.
The hard part was actually setting up the RealSense Python bindings.
The trick is to request the Python bindings, and CUDA, during the cmake phase. Note that often, none of this works on the first try. Some tips include…
sudo apt-get install xorg-dev libglu1-mesa-dev
and changing PYTHON to Python
mkdir build
cd build
cmake ../ -DBUILD_PYTHON_BINDINGS:bool=true -DPYTHON_EXECUTABLE=/usr/bin/python3 -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DBUILD_WITH_CUDA:bool=true
The above worked on Jetpack 4.6.1, while the below worked on Jetpack 5.0.2
cmake ../ -DBUILD_PYTHON_BINDINGS:bool=true -DPython_EXECUTABLE=/usr/bin/python3.8 -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DBUILD_WITH_CUDA:bool=true -DPYTHON_INCLUDE_DIRS=/usr/include/python3.8 -DPython_LIBRARIES=/usr/lib/aarch64-linux-gnu/libpython3.8.so
(and sudo make install)
Update the python path
export PYTHONPATH=$PYTHONPATH:/usr/local/lib
(or a specific Python version’s directory, if you have more than one)
If it was installed in /usr/lib instead, change the path accordingly.
Check that the folder is in the correct location (it isn't, after following official instructions).
./usr/local/lib/python3.6/dist-packages/pyrealsense2/
Check that the shared object files (.so) are in the right place:
chicken@chicken:/usr/local/lib$ ls
cmake libjetson-inference.so librealsense2-gl.so.2.50 librealsense2.so.2.50 pkgconfig
libfw.a libjetson-utils.so librealsense2-gl.so.2.50.0 librealsense2.so.2.50.0 python2.7
libglfw3.a librealsense2-gl.so librealsense2.so librealsense-file.a python3.6
If it can't find 'pipeline', it means you need to copy the missing __init__.py file.
sudo cp ./home/chicken/librealsense/wrappers/python/pyrealsense2/__init__.py ./usr/local/lib/python3.6/dist-packages/pyrealsense2/
Some extra things to do:
sudo cp 99-realsense-libusb.rules /etc/udev/rules.d/
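A quick sanity check that the bindings are importable (assuming the PYTHONPATH above has been set):

python3 -c "import pyrealsense2 as rs; print(rs.pipeline)"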
Eventually, I was able to run the inference on the RealSense camera at an apparent 25 FPS on the localhost, drawing to an OpenGL window.
I also developed a Dockerfile for the purpose, which benefits from an updated PyTorch version, but various issues were encountered, making a bare-metal install far simpler in the end. Note that building jetson-inference and the RealSense SDK on the Nano requires increasing your swap size beyond the standard 2GB; otherwise the Jetson freezes once memory paging leads to swap death.
Anyway, since the objective is remote human viewing (while providing depth information for the robot to use), the next step will require some more tests to find a suitable option.
The main blocker is the power limitation of the Jetson Nano: I can’t seem to run Wi-Fi and the camera at the same time. According to the tegrastats utility, the POM_5V_IN usage goes over the provided 4A under basic usage. There are notes saying that 3A can be provided to two of the 5V GPIO pins, in order to get 6A total input. That might end up being necessary.
Initial investigation into serving RTSP resulted in inferior, compressed results compared to a simple Python server streaming image by image. The next investigation will be into WebRTC options, which are supposedly the current state of the art for browser-based video streaming. I tried aiortc and momo so far; both failed on the Nano.
I’ve decided to try the Xavier NX too, just to replicate the experiment and see how things change. The Xavier has some higher-wattage settings, and the Wi-Fi is internal, so it’s worth a try. I also upgraded to JetPack 5.0.2, which was a gamble: I thought surely it would be better than upgrading to the 5.0.1 developer preview, but none of their official products support 5.0.2 yet, so there will likely be much pain involved. On the plus side, Python 3.8 is standard, so some libraries are back on the menu.
On the Xavier, we’re getting 80 FPS, compared to 25 FPS on the Nano. Quite an upgrade. Also, able to run wifi and realsense at the same time.
Looks like a success. Getting multiple frames per second with about a second of lag over the network.
A quick post, because I looked into this and decided it wasn’t a viable option. We’re using an RPi Zero W for the simplest robot, and I was thinking that with object detection, plus ultrasound sensors for depth, one could approximate the far more complicated RealSense-on-Jetson option.
QEngineering managed to get 11 FPS on classification on the RPi.
But the simplest object detection, MobileNet SSD on TensorFlow 2 Lite (supposedly faster than Tiny-YOLOv3), appears to be only narrowly possible: it is limited to running inference on a single picture in about 6 or 7 seconds.
There is a TensorFlow Lite Micro, and some people have ported it to the RPi Zero (e.g. tflite_micro_runtime), but I wasn’t able to install the pip wheel, and gave up.
This guy may have got it working, though it’s hard to tell. I followed the method for installing TensorFlow 2 Lite and managed to corrupt my SD card, with “Structure needs cleaning” errors.
So maybe I’ll try again some day, but it doesn’t look like a good option; the RPi 3 or 4 is a better bet. Some pages mentioned NNPACK, which allows the use of multiple cores for NNs, but since the RPi Zero has a single core, even if I got it working it would likely still take around 7 seconds per frame, which isn’t going to cut it.
After reading up on IMUs: 3 axes gets you an accelerometer; 6 axes adds a gyroscope; 9 axes adds a magnetometer; and there are some 10-axis ones, if it’s fancy, with a thermometer to correct inaccuracies, etc.
6 axes gives you relative orientation; 9 axes gives you absolute orientation.
I happen to have a 6-axis one from AliExpress, from years ago. Never used it, but now I have a reason. It’s labelled GY-521 (an MPU-6050 breakout). Here’s a video tutorial on putting it all together, with the tutorial link for reading.
“the 6-DoF, which was used to determine the linear velocity, angular velocity, position and orientation.” – this paper, “Pose Estimation of a Mobile Robot Based on Fusion of IMU Data and Vision Data Using an Extended Kalman Filter”
You need to take these folders from the github link.
and put them in your Arduino libs folder
The github also has some code for the Raspberry Pi, which I might get to next. Badabing badaboom, it actually worked first time. (I touched the USB cable and needed to restart it, but that seems like something that can be prevented.)
Ok, so: accelerometer x/y/z, temperature (nice), gyroscope x/y/z.
I watched these numbers as I moved it around, and at 9600 baud, it’s really slow. It’s not going to help for real-time decision making.
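For reference, reading those values on the Pi side could be as simple as the sketch below; the serial port name and the comma-separated line format printed by the Arduino sketch are assumptions here.

import serial

# The Arduino prints accelerometer x/y/z, temperature, gyro x/y/z per line (assumed CSV)
ser = serial.Serial('/dev/ttyUSB0', baudrate=9600, timeout=1)
while True:
    line = ser.readline().decode('ascii', errors='ignore').strip()
    if not line:
        continue
    parts = line.split(',')
    if len(parts) == 7:
        ax, ay, az, temp, gx, gy, gz = (float(p) for p in parts)
        print('accel:', ax, ay, az, 'temp:', temp, 'gyro:', gx, gy, gz)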
Maybe we’ll come back to IMUs later. A bit complicated to visualise and make sense of the data, but with a visualisation, it would make odometry SLAM methods more robust.
We need the robot to be somewhat intelligent, and that means some rules, or processes based on sensor input. Originally I was doing everything in PyBullet, and that leads to techniques like PyBullet-Planning. But as we’re getting closer to a deadline, simpler ideas are winning out.
I came across this paper, and figures 1 and 2 give an idea.
WordPress apparently won’t show this image in original resolution, but you can look up close in the PDF.
Here’s a fuzzy controller: FUZZY LOGIC CONTROL FOR ROBOT MAZE TRAVERSAL: AN UNDERGRADUATE CASE STUDY
I like that idea too. Fuzzifying concepts, defuzzifying into adjusting towards descriptions from rules
This is probably starting to get too many acronyms for me, but Goal Reasoning is a pretty fascinating topic. Now that I see it, of course NASA has been working on this stuff for ages.
Excerpted below (Doychev, 2021):
The ”C” Language Production System (CLIPS) is a portable, rule-based production system [Wyg89]. It was first developed for NASA as an expert system and uses forward chaining inference based on the Rete algorithm consisting of three building blocks[JCG]:
Fact List: The global memory of the agent. It is used as a container to store basic pieces of information about the world in the form of facts, which are usually of specific types. The fact list is constantly updated using the knowledge in the knowledge base.
Knowledge Base: It comprises heuristic knowledge in two forms:
• Procedural Knowledge: An operation that leads to a certain effect. These can, for example, modify the fact base. Functions carry procedural knowledge and can also be implemented in C++. They are mainly used for the utilization of the agent, such as communication to a behavior engine. An example for procedural knowledge would be a function that calls a robot-arm driver to grasp at a target location, or a fact base update reflecting a sensor reading.
• Rules: Rules play an important role in CLIPS. They can be compared to IF-THEN statements in procedural languages. They consist of several preconditions, that need to be satisfied by the current fact list for the rule to be activated, and effects in the form of procedural knowledge. When all its preconditions are satisfied, a rule is added to the agenda, which executes all the activated rules subsequently by firing the corresponding procedural knowledge.
Inference Engine: The main controlling block. It decides which rules should be executed and when. Based on the knowledge base and the fact base, it guides the execution of agenda and rules, and updates the fact base, if needed. This is performed until a stable state is reached, meaning, there are no more activated rules. The inference engine supports different conflict resolution strategies, as multiple rules can be active at a time. For example, rules are ordered by their salience, a numeric value where a higher value means higher priority. If rules with the same salience are active at a time, they are executed in the order of their activation.
CLIPS Executive The CLIPS Executive (CX) is a CLIPS-based production system which serves as a high-level controller, managing all the high-level decision making. Its main tasks involve goal formation, goal reasoning, on-demand planner invocation, goal execution and monitoring, world and agent memory (a shared database for multiple agents) information synchronization. In general, this is achieved by individual CLIPS structures (predefined facts, rules, etc.), that get added to the CLIPS environment.
It’s the Rete algorithm, so it’s a rule engine. It’s a cool algorithm. If you don’t know about rule engines, they are what you use when you start to have more than 50 ‘if’ statements.
Ok, that’s all I need to know. I’ve used KIE professionally. I don’t want to use Java in this project. There appear to be some simplified Python Rule Engines, and so I’ll check them out, when I have some sensor input.
I think I’m going to try this one. They snagged ‘rule-engine’ at PyPi, so they must be good.
Ok, I’ve set up an Arduino with three ultrasonic distance sensors, and it’s connected to the Raspberry Pi. I should do a write-up on that. So I can poll the Arduino and get the ‘forward left’, ‘forward’ and ‘forward right’ ultrasonic sensor distances back, in JSON.
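Polling that from the Pi might look roughly like the following; the JSON keys, port, baud rate and request convention are assumptions, not the actual protocol.

import json
import serial

ser = serial.Serial('/dev/ttyACM0', baudrate=115200, timeout=1)  # port and baud are guesses

def poll_distances():
    """Ask the Arduino for the three ultrasonic readings and parse the JSON reply."""
    ser.write(b'?\n')                        # hypothetical 'give me a reading' request
    reply = ser.readline().decode().strip()  # e.g. {"forward_left": 62, "forward": 120, "forward_right": 45}
    data = json.loads(reply)
    return data['forward_left'], data['forward'], data['forward_right']

L, F, R = poll_distances()
print(L, F, R)

Those L/F/R values are what the prioritised rules further down would consume.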
I think for now, a good start would be having consequences of, forward, backwards, left, right, and stand still.
These are categories of motions. Motions have names, so far, so we will just categorise motions by name: by whether the name contains one of these cardinal motions (forward or walk, back, left, right, still).
To keep things interesting, the robot can pick motions from these categories at random. I was thinking of making scripts, to join together motions, but I’m going to come back to that later. Scripts would just be sequences of motions, so it’s not strictly necessary, since we’re going to use a rule engine now.
Ok… after thinking about it, screw the rule engine. We only need about 10 rules at the moment, and I just need to prioritise the rules and re-evaluate often. I’ll just write them out, with a basic prioritisation.
I also see an interesting hybrid ML / Rule option from sci-kit learn.
Anyway, too complicated for now. So this would all be in a loop.
TOO_CLOSE = 30  # distance threshold, in the units reported by the ultrasonic sensors

# F, L and R are the 'forward', 'forward left' and 'forward right' distances from the Arduino
priority = 0

# High priority: something is too close, so back away (and turn away from the offending side)
if F < TOO_CLOSE:
    runMotionFromCategory("BACK")
    priority = 1
if L < TOO_CLOSE:
    runMotionFromCategory("BACK")
    runMotionFromCategory("RIGHT")
    priority = 1
if R < TOO_CLOSE:
    runMotionFromCategory("BACK")
    runMotionFromCategory("LEFT")
    priority = 1

# Medium priority (only if nothing was too close): turn away from the nearer side
if priority == 0 and L < R and L < F:
    runMotionFromCategory("RIGHT")
    priority = 2
if priority == 0 and R < L and R < F:
    runMotionFromCategory("LEFT")
    priority = 2

# Low priority (only if nothing above fired): the way ahead is clearest, so walk
if priority == 0 and L < F and R < F:
    runMotionFromCategory("WALK")
    priority = 3
Good enough. So I still want this to be part of the UI though, and so the threading, and being able to exit the loop will be important.
Basically, the problem is how to implement a stop button in HTTP / Flask. You need a global variable, essentially, which can be changed. But a global variable in a web app? Sounds like a bad idea. We have session variables, but the thread that’s running the motion was started by a past request, and is probably evaluating a different session map. Maybe not, though. I will need to research and try a few things.
Yep… “Flask provides you with a special object that ensures it is only valid for the active request and that will return different values for each request.”
Ok, Flask has ‘g’ …?
from flask import g
user = getattr(flask.g, 'user', None)
user = flask.g.get('user', None)
Hmm, ok, that’s not right. It’s not shareable between requests, apparently. There’s Flask-Cache… that might work? Ok, I think this is it, maybe.
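One common pattern for this kind of stop flag (just a sketch, not necessarily what ended up in the code) is a module-level threading.Event that a /stop route sets and the brain loop checks:

import threading
from flask import Flask

app = Flask(__name__)
stop_event = threading.Event()   # shared across requests, unlike flask.g

def brain_loop():
    while not stop_event.is_set():
        # poll sensors, run a motion, etc.
        pass

@app.route('/start')
def start():
    stop_event.clear()
    threading.Thread(target=brain_loop, daemon=True).start()
    return 'brain started'

@app.route('/stop')
def stop():
    stop_event.set()             # the running loop sees this on its next check
    return 'brain stopping'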
Now, to run the brain. I don’t know how to reuse code with these inner functions, so it’s copy-paste time. The brain will need to be a single thread. So, something like:
@app.route('/runBrain', methods=['POST', 'GET'])
def runBrain():
    @copy_current_request_context
    def runBrainTask(data):
        @copy_current_request_context
        def runMotionFromCategory(category):
            ...
        ...
        if x then runMotionFromCategory("BACK")
        if y then runMotionFromCategory("LEFT")
    (start runBrainTask thread)
Ok let’s try it…
Ok first need to fix the motions.
Alright. It’s working! Basic Planning working.
It can shuffle around now, without bumping into walls. Further testing will be needed. But a good basis for a robot.
Though I’m generally using stable baseline algorithms for training locomotion tasks, I am sometimes drawn back to evolutionary algorithms, and especially Map Elites, which has now been upgraded to incorporate a policy gradient.
The archiving of behaviours is what attracts me to Map Elites.
PGA Map Elites based on top of QDGym, which tracks Quality Diversity, is probably worth a look.
I’ve now got a UNet that can provide predictions for where an egg is, in simulation.
So I want to design a reward function related to the egg prediction mask.
I haven’t ‘plugged in’ the trained neural network, though, because it will slow things down, and I can just as well make use of the built-in PyBullet segmentation to get the simulated egg pixels. At some point, though, the robot will have to exist in a world where egg pixels are not labelled as such, and the simulation-trained vision will be a useful basis for training.
I think a good reward function might be to not fall over, and to maximize the number of 1s in the egg prediction mask. An intermediate reward might be the centering of the egg pixels.
The numpy way to count mask pixels could be
arr = np.array([1, 0, 0, 0, 0, 1, 1, 1, 1, 0])
np.count_nonzero(arr == 1)
I ended up using the following to count the pixels:
seg = Image.fromarray(mask.astype('uint8'))
self._num_ones = (np.array(seg) == 1).sum()
Hmm for centering, not sure yet.
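One possible way to score centering (just a sketch of the idea, not something implemented yet): take the centroid of the egg pixels and penalise its distance from the image centre.

import numpy as np

def centering_score(mask):
    """Return 1.0 when the egg pixels are centred, falling towards 0.0 at the image corner."""
    ys, xs = np.nonzero(mask == 1)
    if len(xs) == 0:
        return 0.0                      # no egg pixels visible
    h, w = mask.shape
    cy, cx = ys.mean(), xs.mean()       # centroid of the egg pixels
    dist = np.hypot(cx - w / 2, cy - h / 2)
    max_dist = np.hypot(w / 2, h / 2)   # centre-to-corner distance
    return 1.0 - dist / max_dist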
I’m looking into how to run pybullet / gym on the cloud and get some of it rendering.
I’ve found a few leads. VNC is an obvious solution, but probably won’t be available on Chrome OS. Pybullet has a broken link, but I think it’s suggesting something like this colab, more or less, using ‘pyrender’. User matpalm has a minimal example of sending images to Google Dataflow. Those might be good if I can render video. There’s a Jupyter example with capturing images in pybullet. I’ll have to research a bit more. An RDP viewer would probably be easiest, if it’s possible.
I set up the Ray Tune training again, on Google Cloud, and enabled the dashboard by opening some ports (8265 and 6006) and initialising Ray with ray.init(dashboard_host="0.0.0.0").
I can see it improving the episode reward mean, but it’s taking a good while on the 4 CPU cloud machine. Cost is about $3.50/day on the CPU machine, and about $16/day on the GPU machine. Google is out of T4 GPUs at the moment.
I have it saving the occasional mp4 video using a Monitor wrapper that records every 10th episode.
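That wrapper setup looks roughly like this (the env id and output directory are placeholders):

import gym

env = gym.make("CartPole-v1")   # placeholder env id
env = gym.wrappers.Monitor(env, "./videos",
                           video_callable=lambda episode_id: episode_id % 10 == 0,
                           force=True)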
Continuing from our early notes on SLAM (Simultaneous Localisation and Mapping) algorithms, and the similar but less map-making DSO algorithm, I came across a good project (“From cups to consciousness“) and article that reminded me that mapping the environment, or at least having some sense of depth, will be pretty crucial.
At the moment I’ve just got to the point of thinking about training a CNN on simulation data, and so there should also be some positioning of the robot as a model in its own virtual world. So it’s probably best to re-examine what’s already out there: visual odometry, optical flow.
I found a good paper summarizing the 2019 options. The author’s github has some interesting scripts that might be useful. It reminds me that I should probably be using ROS and Gazebo, to some extent. The conclusion was roughly that Google Cartographer or GMapping (OpenSLAM) are generally beating some of the others, Karto and Hector. It seems like most SLAM code is a few years old. Google Cartographer had some support for ‘lifelong mapping‘, which sounded interesting: the robot goes around updating its map, a bit. It also reminds me I saw ‘PonderNet‘ today, fresh from DeepMind, which from a quick look is, more or less, about scaling your workload down to your input size.
Anyway, we are mostly interested in monocular SLAM, so most of this probably doesn’t apply. I’m mostly interested, at the moment, in using some prefab scenes like the AI2-THOR environment in the Cups-RL example, and making some sort of SLAM in simulation.
Also interesting are RatSLAM and its recent update, LatentSLAM. The authors of that site, The Smart Robot, got my attention because of the CCNs (cortical column networks).
“A common shortcoming of RatSLAM is its sensitivity to perceptual aliasing, in part due to the reliance on an engineered visual processing pipeline. We aim to reduce the effects of perceptual aliasing by replacing the perception module by a learned dynamics model. We create a generative model that is able to encode sensory observations into a latent code that can be used as a replacement to the visual input of the RatSLAM system”
Interesting, “The robot performed 1,143 delivery tasks to 11 different locations with only one delivery failure (from which it recovered), traveled a total distance of more than 40 km over 37 hours of active operation, and recharged autonomously a total of 23 times.“
I think DSO might be a good option, or the closed-loop version, LDSO. They look like the most straightforward, maybe.
After a weekend away with a computer vision professional, I found out about COLMAP, a structure-from-motion suite.
I saw a few more recent projects too, e.g. NeuralRecon, and
ooh, here’s a recent facebook one that sounds like it might work!
Consistent Depth … eh, their google colab is totally broken.
Anyhow, LDSO. Let’s try it.
In file included from /dmc/LDSO/include/internal/OptimizationBackend/AccumulatedTopHessian.h:10:0,
                 from /dmc/LDSO/include/internal/OptimizationBackend/EnergyFunctional.h:9,
                 from /dmc/LDSO/include/frontend/FeatureMatcher.h:10,
                 from /dmc/LDSO/include/frontend/FullSystem.h:18,
                 from /dmc/LDSO/src/Map.cc:4:
/dmc/LDSO/include/internal/OptimizationBackend/MatrixAccumulators.h:8:10: fatal error: SSE2NEON.h: No such file or directory
 #include "SSE2NEON.h"
          ^~~~
compilation terminated.
src/CMakeFiles/ldso.dir/build.make:182: recipe for target 'src/CMakeFiles/ldso.dir/Map.cc.o' failed
make[2]: *** [src/CMakeFiles/ldso.dir/Map.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:85: recipe for target 'src/CMakeFiles/ldso.dir/all' failed
make[1]: *** [src/CMakeFiles/ldso.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Ok maybe not.
There’s a paper here reviewing ORBSLAM3 and LDSO, and they encounter lots of issues. But it’s a good paper for an overview of how the algorithms work. We want a point cloud so we can find the closest points, and not walk into them.
Calibration is an issue, rolling shutter cameras are an issue, IMU data can’t be synced meaningfully, it’s a bit of a mess, really.
Also, after reports that ORB-SLAM2 was only getting 5 FPS on a Raspberry Pi, I got smart and looked for something specifically for the Jetson. I found a depth CNN for monocular vision on the forum. Amazing.
Ok so after much fussing about, I found just what we need. I had an old copy of jetson-containers, and the slam code was added just 6 months ago. I might want to try the noetic one (ROS2) instead of ROS, good old ROS.
git clone https://github.com/dusty-nv/jetson-containers.git
cd jetson-containers
chicken@jetson:~/jetson-containers$ ./scripts/docker_build_ros.sh --distro melodic --with-slam
Successfully built 2eb4d9c158b0
Successfully tagged ros:melodic-ros-base-l4t-r32.5.0
chicken@jetson:~/jetson-containers$ ./scripts/docker_test_ros.sh melodic
reading L4T version from /etc/nv_tegra_release
L4T BSP Version: L4T R32.5.0
l4t-base image: nvcr.io/nvidia/l4t-base:r32.5.0
testing container ros:melodic-ros-base-l4t-r32.5.0 => ros_version
xhost: unable to open display ""
xauth: file /tmp/.docker.xauth does not exist
sourcing /opt/ros/melodic/setup.bash
ROS_ROOT /opt/ros/melodic/share/ros
ROS_DISTRO melodic
getting ROS version -
melodic
done testing container ros:melodic-ros-base-l4t-r32.5.0 => ros_version
Well other than the X display, looking good.
Maybe I should just plug in a monitor. Ideally I wouldn’t have to, though. I used GStreamer the other time. Maybe we do that again.
This looks good too… https://github.com/dusty-nv/ros_deep_learning but let’s stay focused. I’m also thinking maybe we upgrade early, to noetic. Ugh it looks like a whole new bunch of build tools and things to relearn. I’m sure it’s amazing. Let’s do ROS1, for now.
Let’s try build that FCNN one again.
CMake Error at tx2_fcnn_node/Thirdparty/fcrn-inference/CMakeLists.txt:121 (find_package):
By not providing "FindOpenCV.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "OpenCV", but
CMake did not find one.
Could not find a package configuration file provided by "OpenCV" (requested
version 3.0.0) with any of the following names:
OpenCVConfig.cmake
opencv-config.cmake
Add the installation prefix of "OpenCV" to CMAKE_PREFIX_PATH or set
"OpenCV_DIR" to a directory containing one of the above files. If "OpenCV"
provides a separate development package or SDK, be sure it has been
installed.
-- Configuring incomplete, errors occurred!
Ok hold on…
"Builds additional container with VSLAM packages,
including ORBSLAM2, RTABMAP, ZED, and Realsense.
This only applies to foxy and galactic and implies
--with-pytorch as these containers use PyTorch."
Ok that hangs when it starts building the slam bits. Luckily, someone’s raised the bug, and though it’s not fixed, Dusty does have a docker already compiled.
So, after some digging, I think we can solve the X problem (i.e. where are we going to see this alleged SLAMming occur?) with an RTSP server. Previously I used GStreamer to send RTP over UDP. But this makes more sense, to run a server on the Jetson. There’s a plugin for GStreamer, so I’m trying to get the ‘dev’ version, so I can compile the test-launch.c program.
apt-get install libgstrtspserver-1.0-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libgstrtspserver-1.0-dev is already the newest version (1.14.5-0ubuntu1~18.04.1).
ok... git clone https://github.com/GStreamer/gst-rtsp-server.git
root@jetson:/opt/gst-rtsp-server/examples# gcc test-launch.c -o test-launch $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-rtsp-server-1.0)
test-launch.c: In function ‘main’:
test-launch.c:77:3: warning: implicit declaration of function ‘gst_rtsp_media_factory_set_enable_rtcp’; did you mean ‘gst_rtsp_media_factory_set_latency’? [-Wimplicit-function-declaration]
gst_rtsp_media_factory_set_enable_rtcp (factory, !disable_rtcp);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gst_rtsp_media_factory_set_latency
/tmp/ccC1QgPA.o: In function `main':
test-launch.c:(.text+0x154): undefined reference to `gst_rtsp_media_factory_set_enable_rtcp'
collect2: error: ld returned 1 exit status
gst_rtsp_media_factory_set_enable_rtcp
Ok wait let’s reinstall gstreamer.
apt-get install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev libgstreamer-plugins-bad1.0-dev gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav gstreamer1.0-doc gstreamer1.0-tools gstreamer1.0-x gstreamer1.0-alsa gstreamer1.0-gl gstreamer1.0-gtk3 gstreamer1.0-qt5 gstreamer1.0-pulseaudio
error...
Unpacking libgstreamer-plugins-bad1.0-dev:arm64 (1.14.5-0ubuntu1~18.04.1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-Ec7eDq/62-libopencv-dev_3.2.0+dfsg-4ubuntu0.1_arm64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
Ok then leave out that one...
apt --fix-broken install
and that fails on
Errors were encountered while processing:
/var/cache/apt/archives/libopencv-dev_3.2.0+dfsg-4ubuntu0.1_arm64.deb
It’s like a sign of being a good programmer, to solve this stuff. But damn. Every time. Suggestions continue, in the forums of those who came before. Let’s reload the docker.
Ok I took a break and got lucky. The test-launch.c code is different from what the admin had.
Let’s diff it and see what changed… the differences are all the RTCP bits:

#define DEFAULT_DISABLE_RTCP FALSE

static gboolean disable_rtcp = DEFAULT_DISABLE_RTCP;

{"disable-rtcp", '\0', 0, G_OPTION_ARG_NONE, &disable_rtcp,
"Whether RTCP should be disabled (default false)", NULL},

gst_rtsp_media_factory_set_enable_rtcp (factory, !disable_rtcp);

so now this works (to compile):
gcc test.c -o test $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-rtsp-server-1.0)
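Then the server gets launched with a GStreamer pipeline as its argument; the stock example pipeline from the gst-rtsp-server docs is something like the following (the exact pipeline used for the camera/desktop feed isn’t recorded here):

./test "( videotestsrc ! x264enc ! rtph264pay name=pay0 pt=96 )"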
So apparently now I can run this in VLC… when I open
rtsp://<jetson-ip>:8554/test
Um is that meant to happen?…. Yes!
Ok next, we want to see SLAM stuff happening. So, ideally, a video feed of the desktop, or something like that.
So here are the links I have open. Maybe I’ll get back to them later. I need to get back to ORB-SLAM2 first, and see where we’re at, and what we need. Not quite /dev/video0 to PC client; more like ORB-SLAM2 to /dev/video0 to PC client. Or full-screen desktop. One way or another.
Previously: libgstrtspserver-1.0-dev is already the newest version (1.14.5-0ubuntu1~18.04.1).
Today we have:
E: Unable to locate package libgstrtspserver-1.0-dev
E: Couldn't find any package by glob 'libgstrtspserver-1.0-dev'
E: Couldn't find any package by regex 'libgstrtspserver-1.0-dev'
Did I maybe compile it outside of the docker? Hmm maybe. Why can’t I find it though? Let’s try the obvious… but also why does this take so long? Network is unreachable. Network is unreachable. Where have all the mirrors gone?
apt-get update
Ok, so, long story short: I made another Dockerfile to get gstreamer installed. It mostly required adding a key for the Kitware apt repo.
Since 1.14, the use of libv4l2 has been disabled due to major bugs in the emulation layer. To enable usage of this library, set the environment variable GST_V4L2_USE_LIBV4L2=1
but it doesn’t want to work anyway. Ok, RTSP is almost a dead end.
I might attach a CSI camera instead of the V4L2 (USB) camera; it seems less troublesome. But yeah, let’s take a break. Let’s get back to depthnet and ROS2 and ORB-SLAM2, etc.
depthnet: error while loading shared libraries: /usr/lib/aarch64-linux-gnu/libnvinfer.so.8: file too short
Ok, let’s try ROS2.
(Sorry, this was supposed to be about SLAM, right?)
As a follow-up for this post…
I asked on the github issues about mapping two argus (NVIDIA’s CSI camera driver) node topics, in order to fool their stereo_proc. No replies, because they probably want to sell expensive stereo cameras, and I am asking how to do it with $15 Chinese cameras.
I looked at DustyNV’s mono depth. Probably not going to work: it seems like you can get a good depth estimate for things in the scene, but everything around the edges reads as ‘close’. Not sure that’s practical enough for depth.
I looked at the NVIDIA DNN depth. Needs proper stereo cameras.
I looked at the NVIDIA VPI Stereo Disparity pipeline. It is the most promising yet, but the input either needs to come from calibrated cameras, or needs to be rectified on the fly using OpenCV. This seems like it might be possible in Python, but it is not obvious yet how to do it in C++, which the rest of the code is in.
I tried calibration.
I removed the USB cameras.
I attached two RPi 2.1 CSI cameras from older projects, and deep-dived into the ISAAC ROS suite. I left ROS2 alone for a bit because it was just getting in the way. One camera sensor occasionally had fuzzy lines going across horizontally, and the calibration results were poor and fuzzy. I decided I needed new cameras.
IMX219 sensors were used by the github author, and I even printed out half of the holder, to hold the cameras 8cm apart.
I tried calibration using the ROS2 cameracalibrator, which is a wrapper for an OpenCV call, after starting up the camera driver node inside the Isaac ROS docker.
(Because of a bug, you also sometimes need to remove --ros-args --remap.)
OpenCV was able to calibrate, via the ROS2 application, in both cases. So maybe I should just grab the outputs from that. We’ll do that again, now. But I think I need to print out a chessboard and just see how that goes first.
I couldn’t get more than a couple of matches using pictures of the chessboard on the screen, even with binary thresholding, in the author’s calibration notebooks.
Here’s what the NVIDIA VPI 1.2’s samples drew, for my chess boards:
Camera calibration seems to be a serious problem, in the IOT camera world. I want something approximating depth, and it is turning out that there’s some math involved.
Learning about epipolar geometry was not something I planned to do for this.
But this is like a major showstopper, so either, I must rectify, in real time, or I must calibrate.
“The reason for the noisy result is that the VPI algorithm expects the rectified image pairs as input. Please do the rectification first and then feed the rectified images into the stereo disparity estimator.”
So can we use this info? The NVIDIA post references some code (not reproduced here) as the solution, perhaps. Let’s run it on the chessboard?
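For orientation, the general OpenCV recipe for rectifying a calibrated stereo pair looks roughly like the sketch below. This is a generic illustration, not the code from the NVIDIA post, and all the calibration inputs (intrinsics, distortion, rotation, translation) are assumed to come from the calibration step above.

import cv2

def rectify_pair(left, right, K1, D1, K2, D2, R, T):
    """Rectify a calibrated stereo pair so it can be fed to a stereo disparity estimator."""
    h, w = left.shape[:2]
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
    left_rect = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)
    return left_rect, right_rect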
“They observed that many quadrupedal, mammalian animals feature a distinguished functional three-segment front leg and hind leg design, and proposed a “pantograph” leg abstraction for robotic research.”
1 DOF (degree of freedom). 1 motor. Miranda wants jointed legs, and I don’t want to work out inverse kinematics, so this looks ideal. Maybe a bit complicated still.
The simpler force diagram:
Compliance is a feature, made possible by springs typically.
A homemade attempt here with the Mojo robot of the Totally Not Evil Robot Army. Their robot only uses 9g servos, and can’t quite pick itself up.
I did an initial design with what I had around, and it turns out compliance is a delicate balance. Too much spring, and it just mangles itself up. Too little spring and it can’t lift off the ground.
Further iterations removed the springs, which were far too tight, and used cable ties to straighten the legs, but the weight of the robot is a little too much for the knee joints.
I will likely leave it until I have a 3D printer and some better springs, and will give it another try with more tools and materials available. Maybe even hydraulics, some day.