After changing the normalisation code (the resampled 16000 Hz audio was too soft), we have recordings from the microphone being classified by YAMNet on the Jetson.
Pretty cool. It's detecting chicken sounds. I had to renormalise the recording volume to between -1 and 1, as everything was originally detected as 'Silence'.
Currently I’m saving 5 second wav files and processing them in Jupyter. But it’s not really interactive in a real-time way, and it would need further training to detect distress, or other, more useful metrics.
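For reference, the processing boils down to something like the sketch below. It's a minimal version under my assumptions (a 5-second clip saved as 'clip.wav' is a placeholder name); the resample-to-16 kHz and peak-normalisation steps are the ones that made the difference between 'Silence' and real detections.

import csv
import numpy as np
import soundfile as sf
import librosa
import tensorflow_hub as hub

yamnet = hub.load('https://tfhub.dev/google/yamnet/1')

# The class map (index, mid, display_name) ships with the model.
with open(yamnet.class_map_path().numpy().decode('utf-8')) as f:
    class_names = [row['display_name'] for row in csv.DictReader(f)]

wav, sr = sf.read('clip.wav', dtype='float32')   # placeholder filename
if wav.ndim > 1:
    wav = wav.mean(axis=1)                       # downmix to mono
wav = librosa.resample(wav, orig_sr=sr, target_sr=16000)
wav = (wav / (np.max(np.abs(wav)) + 1e-9)).astype(np.float32)   # renormalise to [-1, 1]

scores, embeddings, spectrogram = yamnet(wav)    # scores: (frames, 521)
print(class_names[int(np.argmax(scores.numpy().mean(axis=0)))])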
We're unlikely to have time to implement transfer learning and continue the chicken stress vocalisation work within this project, but it definitely looks like the way to go about it.
There are also some papers that used the VGG-11 architecture for this purpose, chopping recordings into overlapping 1-second segments and matching them to a label (stressed / not stressed) for training. Note: if downloading the dataset, use the G-Drive link, not the figshare link, which is truncated.
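A minimal sketch of that segmentation step, assuming 16 kHz audio and a 50% overlap (the exact hop those papers used is an assumption here), with a placeholder filename:

import librosa

SR = 16000
wav, _ = librosa.load('recording.wav', sr=SR)      # placeholder filename

# Overlapping 1-second windows: shape (num_segments, 16000) after the transpose.
segments = librosa.util.frame(wav, frame_length=SR, hop_length=SR // 2).T
print(segments.shape)

# Each row would then be labelled (stressed / not stressed) and turned into
# a spectrogram for the VGG-11 style classifier.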
After following the installation procedure for my 'Respeaker 2-Mic hat', I've set up a Dockerfile with TF2 and the audio libraries, including librosa, in order to try out this real-time version. Getting this right was a real pain, because of breaking changes in the 'numba' package.
FROM nvcr.io/nvidia/l4t-tensorflow:r32.6.1-tf2.5-py3
RUN apt-get update && apt-get install -y curl build-essential
RUN apt-get update && apt-get install -y libffi6 libffi-dev
RUN pip3 install -U Cython
RUN pip3 install -U pillow
RUN pip3 install -U numpy
RUN pip3 install -U scipy
RUN pip3 install -U matplotlib
RUN pip3 install -U PyWavelets
RUN pip3 install -U kiwisolver
RUN apt-get update && \
apt-get install -y --no-install-recommends \
alsa-base \
libasound2-dev \
alsa-utils \
portaudio19-dev \
libsndfile1 \
unzip \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
RUN pip3 install soundfile pyaudio wave
RUN pip3 install tensorflow_hub
RUN pip3 install packaging
RUN pip3 install pyzmq==17.0.0
RUN pip3 install jupyterlab
RUN apt-get update && apt-get install -y libblas-dev \
liblapack-dev \
libatlas-base-dev \
gfortran \
protobuf-compiler \
libprotoc-dev \
llvm-9 \
llvm-9-dev
RUN export LLVM_CONFIG=/usr/lib/llvm-9/bin/llvm-config && pip3 install llvmlite==0.36.0
RUN pip3 install --upgrade pip
RUN python3 -m pip install --user -U numba==0.53.1
RUN python3 -m pip install --user -U librosa==0.9.2
#otherwise matplotlib can't draw to gui
RUN apt-get update && apt-get install -y python3-tk
RUN jupyter lab --generate-config
RUN python3 -c "from notebook.auth.security import set_password; set_password('nvidia', '/root/.jupyter/jupyter_notebook_config.json')"
EXPOSE 6006
EXPOSE 8888
CMD /bin/bash -c "jupyter lab --ip 0.0.0.0 --port 8888 --allow-root &> /var/log/jupyter.log" & \
echo "allow 10 sec for JupyterLab to start @ http://$(hostname -I | cut -d' ' -f1):8888 (password nvidia)" && \
echo "JupterLab logging location: /var/log/jupyter.log (inside the container)" && \
/bin/bash
I'm running it with
sudo docker run -it --rm --runtime nvidia --network host --privileged=true -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v /home/chicken/:/home/chicken nano_tf2_yamnet
NLP. Disambiguation: not neuro-linguistic programming, the discredited therapy whereby one reprograms oneself, in a sense, to change one's ill behaviours.
No, this one is about processing text. Sequence to sequence. Big models. Transformers. Just wondering now if we need it. I saw an ad of some sort for 'Tagalog' and thought it was something about the Filipino language, but it was a product for NLP shops, for stuff like OCR, labelling and annotating. For mining text, basically.
Natural language processing is fascinating stuff, but probably out of the scope of this project. GPT-3 is the big one, and I think Google just released a network an order of magnitude bigger?
“Google’s new trillion-parameter AI language model is almost 6 times bigger than GPT-3”, so GPT-3 is old news anyway. I watched some interviews with the AI, and it’s very plucky. I can see why Elon is worried.
We’ve gone a totally different way, but this is another interesting project from Erwin Coumans, on the Google Brain team, who did PyBullet. NeuralSim replaces parts of physics engines with neural networks.
I just found this github from ETH Z. Not surprising that they have some of the most relevant datasets I’ve seen, pertaining to making proprioceptive autonomous systems. I came across their Autonomous Systems Labs dataset site.
One of the projects, panoptic mapping, is pretty much the panoptic segmentation from earlier research, combined with volumetric point clouds. “A flexible submap-based framework towards spatio-temporally consistent volumetric mapping and scene understanding.”
This is turning into one of the most oddly complicated sub-tasks of making a robot.
Turns out you need an intuition for epipolar geometry, to understand what it’s trying to do.
I've been battling this for weeks or months, on and off, and I must get it working soon. This week. But I know a week won't be enough. This shit is hairy. Here are eleven tips on StackOverflow. And here are twelve tips from another guru.
The only articles that look like they got it working, have printed out these A2 sized chessboards. It’s ridiculous. Children in Africa don’t have A2 sized chessboard printouts!
Side notes: there also seem to be some interesting developments in the direction of not needing perfect, gigantic chessboards to calibrate your cameras. That in turn led down a rabbit hole, into a galaxy of projects packaged together with their own robotics software philosophies, and so on. Specifically, I found this OpenHSML project. They packaged their project using the PID framework, which apparently standardises the build process in some way. Clicking on the links to officially released frameworks using PID leads to RPC, ethercatcpp, hardio, RKCL, and a whole world of sensor fusion topics, including a recent IEEE conference on MultiSensor Fusion and Integration (MFI2021.org).
So, back to chessboards. It’s clearly more of an art than a science.
Let’s, for the sake of argument, use the code at this OpenCV URL on the topic.
Let's fire up the engines. I really want to close some R&D tabs, but we're down to about 30. I'll try to reduce that to 20 while we go through the journey together, dear internet reader, or future intelligence. Step one: install OpenCV. I'm using the ISAAC ROS common docker with various modifications (e.g. installing matplotlib and Jupyter Lab).
cd ~/your_ws/src/isaac_ros_common
./scripts/run_dev.sh ~/your_ws/
python3
>>> import cv2
>>> print(cv2.__version__)
4.5.0
So this version of the docs should work, surely. Let’s start up Jupyter because it looks like it wants it.
python3 example_1.py
example_1.py:61: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure. plt.show()
Fire up Jupyter inside the docker and check out http://chicken:8888/lab
admin@chicken:/workspaces$ jupyter lab --ip 0.0.0.0 --port 8888 --allow-root &
Let's run the code… and work out how to make the images bigger. Ok: plt.rcParams["figure.figsize"] = (30,20)
It’s definitely doing something. I see the epilines, they look the same. That looks great. Ok next tutorial.
So, StereoBM. “Depth Map from Stereo Images”
So, the top one is the disparity, viewed with StereoBM. Interesting. BM apparently means Block Matching. Interesting to see that the second run of the epiline code earlier is changing behaviour. Different parallax estimation?
Changing the max disparities, and block size changes the image, kinda like playing with a photoshop filter. Anyway we’re here to calibrate. What next?
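For the record, the disparity step in that tutorial is only a few lines. A minimal sketch with placeholder filenames, where numDisparities and blockSize are the two knobs I was fiddling with:

import cv2
from matplotlib import pyplot as plt

imgL = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)    # placeholder filenames
imgR = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

# numDisparities must be a multiple of 16; blockSize must be odd.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(imgL, imgR)

plt.imshow(disparity, 'gray')
plt.show()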
I think the winner for nicest website on the topic goes to Andreas Jakl, a professor, here.
“Here, we’ll use the traditional SIFT algorithm. Its patent expired in March 2020, and the algorithm got included in the main OpenCV implementation.”
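Roughly what that boils down to, following the general OpenCV epipolar-geometry tutorial pattern (filenames are placeholders): detect SIFT keypoints in both images, match them with FLANN plus Lowe's ratio test, and estimate the fundamental matrix from the surviving point pairs.

import cv2
import numpy as np

imgL = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)    # placeholder filenames
imgR = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kpL, desL = sift.detectAndCompute(imgL, None)
kpR, desR = sift.detectAndCompute(imgR, None)

# FLANN k-NN matching with Lowe's ratio test.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
ptsL, ptsR = [], []
for m, n in flann.knnMatch(desL, desR, k=2):
    if m.distance < 0.7 * n.distance:
        ptsL.append(kpL[m.queryIdx].pt)
        ptsR.append(kpR[m.trainIdx].pt)

# Fundamental matrix relating the two views; the epilines are drawn from this.
F, mask = cv2.findFundamentalMat(np.float32(ptsL), np.float32(ptsR), cv2.FM_LMEDS)
print(F)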
Ok first things first, I’m tightening the screws into the camera holder plastic. We’re only doing this a few times.
Let’s run the capture program one more time for fun.
I edited the code to make the images 1280×720, which was one of the outputs of the camera. It’s apparently 16:9.
I took a bunch of left and right images, ran them through the jetson-stereo-depth/calib/01_intrinsics_lens_dist.ipynb code, and the only chessboards it found were the smaller boards.
So let’s put back in the resizing. Yeah no. Ok. Got like 1 match out of 35.
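The detection step itself is just cv2.findChessboardCorners. A quick standalone check I could have run over the captures; the 9x6 inner-corner pattern size and the glob path are assumptions here:

import glob
import cv2

pattern_size = (9, 6)                       # inner corners of the printed board (assumed)
found = 0
for path in sorted(glob.glob('captures/left_*.png')):   # placeholder path pattern
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    ok, corners = cv2.findChessboardCorners(
        img, pattern_size,
        cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_NORMALIZE_IMAGE)
    found += int(ok)
    print(path, 'ok' if ok else 'no chessboard found')
print(f'{found} usable images')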
Ok I’m giving up for now. But we’re printing, boys and girls. We’ll try A4 first, and see how it goes. Nope. A3. Professional print.
Nope. Nicely printed doesn’t matter.
Ran another 4 rounds of my own calibration code. Hundreds of pictures. You only need one match.
Nope. You can read more about messing around with failed calibration attempts in this section.
Asked NVIDIA: is it possible to turn two monocular cameras into a stereo camera? No reply. Two months later, NVIDIA replied: "problem will be time synchronizing their frames. You'll also need to introduce the baseline into the camera_info of the right camera to make it work."
Time is running out. So I threw money at this problem.
Bought the Intel Realsense D455. At first glance, the programs that come with it are very cool, ‘realsense-viewer’. A bit freaky.
I think my CSC 369 class at Cal Poly was a good model for a type of autonomous course design. There were only 9 students, and there was a presentation of each person's project near the end of the course. What made it a bit more interesting is that the code was shared at the end, and some students incorporated other students' code into their programs.
There was some remixing, because we all worked on different things. (I made 'Gnapster', a P2P file sharing app with firewall circumvention, while others worked on MD5 encryption, caching and multiple-source downloading.)
My thought was that the development of the GGR robots has been towards something simple and accessible, in order to be sure we get *something* working, in time. I have prototypes, and a UI now, and I add features to the robot app, building it up into something workable. Making small improvements.
Despite a couple of years now of playing with AI and ML and CV and robots, etc., I'm still way behind the curve. The project budget is my salary for maybe three months of work in industry (software engineering). In terms of R&D, we're way past three months of work. But the scope and pace of the fields of AI, ML, CV and robotics are vast, and fast.
"In theory, theory and practice are the same. In practice, they are not." – Albert Einstein
Entire months down a rabbit hole can end with a negative result. Time flies. I felt I needed to start aiming lower, at some point, when the reality hit. Miranda and I are both good at time crunches. So I know we'll have something presentable as we get close to the deadline. But there's only so much one can get done in a certain amount of time, given the constraints of life: work, responsibility, friends, family, health, etc.
So my thought was more or less, platforming, open source style, for the sake of remixing. Course design based on remixing the contributions of all the participants, who start with the result of previous iterations.
A major issue with this idea, is version control, because if the remix is done like a software engineering project, merging contributions will bite students. There was always a rush, at some companies I worked at, to get your code in first, to avoid being the one who had to update and merge before committing their code.
There’s ways around this issue though. Mob programming with an expert at the helm is probably the best way to merge code.
So, idea was, like a course or workshop, where people learn to do something – and then they all work on different, individual improvements, on that something, and then the merged improvements become the new prototype, upon which future coursework might be built.
Sometimes, you might just want to scrap the janky mess it’s becoming, and use a different paradigm, entirely. But sometimes you don’t have time for a more elegant paradigm.
After reading up on IMUs: 3 axes gets you an accelerometer, 6 axes adds a gyroscope, 9 axes adds a magnetometer, and some fancy 10-axis ones add a thermometer to correct for inaccuracies, etc.
6 axes gives you relative orientation; 9 axes gives you absolute orientation, since the magnetometer provides a heading reference.
I happen to have a 6 axis one, from Aliexpress, from years ago. Never used it, but now I have a reason. It’s labelled GY-521. Here’s a video tutorial on putting it all together, with the tutorial link for reading.
“the 6-DoF, which was used to determine the linear velocity, angular velocity, position and orientation.” – this paper, “Pose Estimation of a Mobile Robot Based on Fusion of IMU Data and Vision Data Using an Extended Kalman Filter”
You need to take these folders from the github link and put them in your Arduino libs folder.
The github also has some code for Raspberry Pi, which I might get to next. Badabing badaboom, it actually worked first time. (I touched the USB cable, though, and needed to restart it, but that seems like something that can be prevented.)
Ok, so: accelerometer x/y/z, temperature (nice), gyroscope x/y/z.
I watched these numbers as I moved it around, and at 9600 baud, it’s really slow. It’s not going to help for real time decision making.
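For reference, watching those numbers from Python is a few lines with pyserial. The port name, and the exact line format printed by the Arduino sketch, are assumptions here:

import serial

# Port name is an assumption; the Arduino sketch prints one line per sample.
ser = serial.Serial('/dev/ttyUSB0', 9600, timeout=1)
while True:
    line = ser.readline().decode('ascii', errors='ignore').strip()
    if line:
        print(line)   # accelerometer x/y/z, temperature, gyroscope x/y/z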
Maybe we’ll come back to IMUs later. A bit complicated to visualise and make sense of the data, but with a visualisation, it would make odometry SLAM methods more robust.
We need the robot to be somewhat intelligent, and that means some rules, or processes based on sensor input. Originally I was doing everything in PyBullet, and that leads to techniques like PyBullet-Planning. But as we’re getting closer to a deadline, simpler ideas are winning out.
I came across this paper, and figures 1 and 2 give an idea.
WordPress apparently won’t show this image in original resolution, but you can look up close in the PDF.
Here’s a fuzzy controller: FUZZY LOGIC CONTROL FOR ROBOT MAZE TRAVERSAL: AN UNDERGRADUATE CASE STUDY
I like that idea too. Fuzzify the inputs into concepts, apply the rules, then defuzzify the outputs into concrete adjustments.
This is probably starting to get too many acronyms for me, but Goal Reasoning is a pretty fascinating topic. Now that I see it, of course NASA has been working on this stuff for ages.
Excerpted below (Doychev, 2021):
The ”C” Language Production System (CLIPS) is a portable, rule-based production system [Wyg89]. It was first developed for NASA as an expert system and uses forward chaining inference based on the Rete algorithm consisting of three building blocks[JCG]:
Fact List: The global memory of the agent. It is used as a container to store basic pieces of information about the world in the form of facts, which are usually of specific types. The fact list is constantly updated using the knowledge in the knowledge base.
Knowledge Base: It comprises heuristic knowledge in two forms:
• Procedural Knowledge: An operation that leads to a certain effect. These can, for example, modify the fact base. Functions carry procedural knowledge and can also be implemented in C++. They are mainly used for the utilization of the agent, such as communication to a behavior engine. An example for procedural knowledge would be a function that calls a robot-arm driver to grasp at a target location, or a fact base update reflecting a sensor reading.
• Rules: Rules play an important role in CLIPS. They can be compared to IF-THEN statements in procedural languages. They consist of several preconditions, that need to be satisfied by the current fact list for the rule to be activated, and effects in the form of procedural knowledge. When all its preconditions are satisfied, a rule is added to the agenda, which executes all the activated rules subsequently by firing the corresponding procedural knowledge.
Inference Engine: The main controlling block. It decides which rules should be executed and when. Based on the knowledge base and the fact base, it guides the execution of agenda and rules, and updates the fact base, if needed. This is performed until a stable state is reached, meaning, there are no more activated rules. The inference engine supports different conflict resolution strategies, as multiple rules can be active at a time. For example, rules are ordered by their salience, a numeric value where a higher value means higher priority. If rules with the same salience are active at a time, they are executed in the order of their activation.
CLIPS Executive The CLIPS Executive (CX) is a CLIPS-based production system which serves as a high-level controller, managing all the high-level decision making. Its main tasks involve goal formation, goal reasoning, on-demand planner invocation, goal execution and monitoring, world and agent memory (a shared database for multiple agents) information synchronization. In general, this is achieved by individual CLIPS structures (predefined facts, rules, etc.), that get added to the CLIPS environment.
It’s the Rete algorithm, so it’s a rule engine. It’s a cool algorithm. If you don’t know about rule engines, they are what you use when you start to have more than 50 ‘if’ statements.
Ok, that’s all I need to know. I’ve used KIE professionally. I don’t want to use Java in this project. There appear to be some simplified Python Rule Engines, and so I’ll check them out, when I have some sensor input.
I think I’m going to try this one. They snagged ‘rule-engine’ at PyPi, so they must be good.
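From its README, usage looks roughly like this; the sensor dictionary and threshold are hypothetical, just to show the shape of the API:

import rule_engine

# Hypothetical reading: forward / left / right ultrasonic distances in cm.
reading = {'F': 25, 'L': 80, 'R': 60}

too_close = rule_engine.Rule('F < 30 or L < 30 or R < 30')
print(too_close.matches(reading))   # True -> something is too close, back off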
Ok, I’ve set up an Arduino with three ultrasonic distance sensors, and it’s connected to the Raspberry Pi. I should do a write-up on that. So I can poll the Arduino and get ‘forward left’, ‘forward’ and ‘forward right’ ultrasonic sensor distance back, in JSON.
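Polling it from Python looks something like the sketch below; the port, baud rate, poll command and JSON key names are all assumptions for illustration.

import json
import serial

ser = serial.Serial('/dev/ttyACM0', 115200, timeout=1)   # assumed port and baud

def read_distances():
    ser.write(b'p')                          # hypothetical poll command
    line = ser.readline().decode().strip()
    return json.loads(line)                  # assumed keys, e.g. {"FL": 52, "F": 80, "FR": 47}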
I think for now, a good start would be having consequences of forward, backwards, left, right, and stand still.
These are categories of motions. Motions have names, so for now we'll just categorise a motion by whether its name contains one of these cardinal motions (forward or walk, back, left, right, still).
To keep things interesting, the robot can pick motions from these categories at random. I was thinking of making scripts, to join together motions, but I’m going to come back to that later. Scripts would just be sequences of motions, so it’s not strictly necessary, since we’re going to use a rule engine now.
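A sketch of what that category helper might look like; the motion list and the runMotion call are stand-ins (assumptions) for whatever the robot app already has:

import random

# Keywords that put a motion name into a category (matches the rules below).
CATEGORY_KEYWORDS = {
    'WALK':  ('forward', 'walk'),
    'BACK':  ('back',),
    'LEFT':  ('left',),
    'RIGHT': ('right',),
    'STILL': ('still',),
}

def runMotionFromCategory(category, motions):
    """Pick a random motion whose name contains one of the category's keywords."""
    keywords = CATEGORY_KEYWORDS[category]
    candidates = [m for m in motions if any(k in m.lower() for k in keywords)]
    if candidates:
        runMotion(random.choice(candidates))   # runMotion: assumed existing app function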
Ok… after thinking about it, screw the rule engine. We only need like 10 rules, at the moment, and I just need to prioritize the rules, and reevaluate often. I’ll just write them out, with a basic prioritisation.
I also see an interesting hybrid ML / Rule option from sci-kit learn.
Anyway, too complicated for now. So this would all be in a loop.
# F, L and R hold the forward, left and right ultrasonic distances (cm).
TOO_CLOSE = 30
priority = 0

# High priority: something is too close, back away from it.
if F < TOO_CLOSE:
    runMotionFromCategory("BACK")
    priority = 1
if L < TOO_CLOSE:
    runMotionFromCategory("BACK")
    runMotionFromCategory("RIGHT")
    priority = 1
if R < TOO_CLOSE:
    runMotionFromCategory("BACK")
    runMotionFromCategory("LEFT")
    priority = 1

# Medium priority (only if priority is still 0): turn away from the closer side.
if priority == 0 and L < R and L < F:
    runMotionFromCategory("RIGHT")
    priority = 2
if priority == 0 and R < L and R < F:
    runMotionFromCategory("LEFT")
    priority = 2

# Low priority (only if priority is still 0): the way ahead is the most open, so walk.
if priority == 0 and L < F and R < F:
    runMotionFromCategory("WALK")
    priority = 3
Good enough. So I still want this to be part of the UI though, and so the threading, and being able to exit the loop will be important.
Basically, the problem is how to implement a stop button in HTTP / Flask. We need a global variable, basically, which can be changed. But a global variable in a web app sounds like a bad idea. We have session variables, but the thread that's running the motion is in the past, and is probably evaluating a different session map. Maybe not, though. Will need to research and try a few things.
Yep… “Flask provides you with a special object that ensures it is only valid for the active request and that will return different values for each request.”
Ok, Flask has ‘g’ …?
from flask import g
user = getattr(g, 'user', None)
user = g.get('user', None)
Hmm, ok, that's not right: it's not shareable between requests, apparently. There's Flask-Cache… that might work? Ok, I think this is it, maybe.
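One simple option to try instead: a module-level threading.Event, shared by all requests, with start/stop routes. This is a minimal sketch, and the route names here are mine, not the app's:

import threading
from flask import Flask

app = Flask(__name__)
brain_stop = threading.Event()     # module-level flag, visible to every request

def brain_loop():
    while not brain_stop.is_set():
        # ... read sensors, evaluate the rules, run a motion ...
        brain_stop.wait(0.1)       # also acts as the loop delay

@app.route('/startBrain', methods=['POST'])
def start_brain():
    brain_stop.clear()
    threading.Thread(target=brain_loop, daemon=True).start()
    return 'started'

@app.route('/stopBrain', methods=['POST'])
def stop_brain():
    brain_stop.set()
    return 'stopped'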
Now, to run the brain. I don’t know how to reuse code with these inner functions. So it’s copy-paste time. The brain will need to be a single thread. So something like
from flask import copy_current_request_context   # needed for the decorators below

@app.route('/runBrain', methods=['POST', 'GET'])
def runBrain():
    @copy_current_request_context
    def runBrainTask(data):
        @copy_current_request_context
        def runMotionFromCategory(category):
            ...
        ...
        if x:
            runMotionFromCategory("BACK")
        if y:
            runMotionFromCategory("LEFT")
    # ... start runBrainTask in its own thread ...
Ok let’s try it…
Ok first need to fix the motions.
Alright. It’s working! Basic Planning working.
It can shuffle around now, without bumping into walls. Further testing will be needed. But a good basis for a robot.