Categories
AI/ML CNNs dev Locomotion OpenCV robots UI Vision

Realsense Depth and TensorRT object detection

A seemingly straightforward idea for robot control is to combine depth and object detection into a rough model of the environment.

After failed attempts to create our own stereo camera using two monocular cameras, we eventually decided to buy a commercial product instead: the Intel RealSense D455 depth camera.

After a first round of running a COCO-trained MobileNet-SSD v2 object detection network in TensorFlow 2 Lite on the colour images from the RealSense camera, on the Jetson Nano, the results were just barely acceptable (~2 FPS) for a localhost stream, and totally unacceptable (~0.25 FPS) when served as JPEG over HTTP to a browser on the network.

Looking at the options, one feasible fix was to redo the network using TensorRT, NVIDIA's quantised inference framework (16-bit on the Nano, 8-bit on the NX/AGX). The other was to investigate streaming options beyond simple JPEG compression over HTTP, such as RTSP and WebRTC.

The difficult part was setting up the environment. We used the NVIDIA detectnet code, adapted to take the RealSense camera images as input and to display the distance to each object. An outdated example was found on the CAVEDU robotics blog/GitHub; the fixed-up version is below.

#!/usr/bin/python3



import argparse
import sys

import cv2
import numpy as np
import pyrealsense2 as rs

import jetson_utils
from jetson_inference import detectNet
from jetson_utils import cudaFromNumpy, cudaAllocMapped, cudaConvertColor

parser = argparse.ArgumentParser(
    description="Locate objects in a live camera stream using an object detection DNN.",
    formatter_class=argparse.RawTextHelpFormatter, epilog=jetson_utils.logUsage())
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2",
                    help="pre-trained model to load (see below for options)")
parser.add_argument("--threshold", type=float, default=0.5,
                    help="minimum detection threshold to use")
parser.add_argument("--width", type=int, default=640,
                    help="set width for image")
parser.add_argument("--height", type=int, default=480,
                    help="set height for image")
opt = parser.parse_known_args()[0]

# load the object detection network
net = detectNet(opt.network, sys.argv, opt.threshold)

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, opt.width, opt.height, rs.format.z16, 30)
config.enable_stream(rs.stream.color, opt.width, opt.height, rs.format.bgr8, 30)
# Start streaming
pipeline.start(config)


press_key = 0
while (press_key==0):
	# Wait for a coherent pair of frames: depth and color
	frames = pipeline.wait_for_frames()
	depth_frame = frames.get_depth_frame()
	color_frame = frames.get_color_frame()
	if not depth_frame or not color_frame:
		continue
	# Convert images to numpy arrays
	depth_image = np.asanyarray(depth_frame.get_data())
	show_img = np.asanyarray(color_frame.get_data())
	
	# convert to CUDA (cv2 images are numpy arrays, in BGR format)
	bgr_img = cudaFromNumpy(show_img, isBGR=True)
	# convert from BGR -> RGB
	img = cudaAllocMapped(width=bgr_img.width,height=bgr_img.height,format='rgb8')
	cudaConvertColor(bgr_img, img)

	# detect objects in the image (with overlay)
	detections = net.Detect(img)

	for num in range(len(detections)) :
		score = round(detections[num].Confidence,2)
		box_top=int(detections[num].Top)
		box_left=int(detections[num].Left)
		box_bottom=int(detections[num].Bottom)
		box_right=int(detections[num].Right)
		box_center=detections[num].Center
		label_name = net.GetClassDesc(detections[num].ClassID)

		point_distance=0.0
		for i in range (10):
			point_distance = point_distance + depth_frame.get_distance(int(box_center[0]),int(box_center[1]))

		point_distance = np.round(point_distance / 10, 3)
		distance_text = str(point_distance) + 'm'
		cv2.rectangle(show_img,(box_left,box_top),(box_right,box_bottom),(255,0,0),2)
		cv2.line(show_img,
			(int(box_center[0])-10, int(box_center[1])),
			(int(box_center[0]+10), int(box_center[1])),
			(0, 255, 255), 3)
		cv2.line(show_img,
			(int(box_center[0]), int(box_center[1]-10)),
			(int(box_center[0]), int(box_center[1]+10)),
			(0, 255, 255), 3)
		cv2.putText(show_img,
			label_name + ' ' + distance_text,
			(box_left+5,box_top+20),cv2.FONT_HERSHEY_SIMPLEX,0.4,
			(0,255,255),1,cv2.LINE_AA)

	cv2.putText(show_img,
		"{:.0f} FPS".format(net.GetNetworkFPS()),
		(int(opt.width*0.8), int(opt.height*0.1)),
		cv2.FONT_HERSHEY_SIMPLEX,1,
		(0,255,255),2,cv2.LINE_AA)


	display = cv2.resize(show_img,(int(opt.width*1.5),int(opt.height*1.5)))
	cv2.imshow('Detecting...',display)
	keyValue=cv2.waitKey(1)
	if keyValue & 0xFF == ord('q'):
		press_key=1


cv2.destroyAllWindows()
pipeline.stop()
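Assuming the script above is saved as, say, detect_realsense.py (the name is arbitrary), it is run with the arguments defined in its parser, e.g.:

python3 detect_realsense.py --network ssd-mobilenet-v2 --threshold 0.5 --width 640 --height 480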

Assuming you have a suitable CMake version and CUDA is available (if nvcc doesn't work, you need to configure the linker paths, check this link)… note that CMake versions around ~3.22-3.24 cause problems, so you may need an older one. The prerequisite sudo apt-get install libssl-dev is also required.

The hard part was actually setting up the RealSense Python bindings.

Clone the repo…

git clone https://github.com/IntelRealSense/librealsense.git

The trick is to request the Python bindings and CUDA during the cmake phase. Note that, often, none of this works. Some tips include…

sudo apt-get install xorg-dev libglu1-mesa-dev

and changing the PYTHON prefix to Python in the CMake variables (compare the two commands below).

mkdir build
cd build
cmake ../ -DBUILD_PYTHON_BINDINGS:bool=true -DPYTHON_EXECUTABLE=/usr/bin/python3 -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DBUILD_WITH_CUDA:bool=true

The above worked on Jetpack 4.6.1, while the below worked on Jetpack 5.0.2

cmake ../ -DBUILD_PYTHON_BINDINGS:bool=true -DPython_EXECUTABLE=/usr/bin/python3.8 -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DBUILD_WITH_CUDA:bool=true -DPYTHON_INCLUDE_DIRS=/usr/include/python3.8 -DPython_LIBRARIES=/usr/lib/aarch64-linux-gnu/libpython3.8.so

Then run sudo make install.

Update the Python path (pointing at /usr/lib instead if the bindings were installed there, and at the matching folder for a specific Python version if you have more than one):

export PYTHONPATH=$PYTHONPATH:/usr/local/lib

Check that the folder is in the correct location (it isn't, after following the official instructions):

/usr/local/lib/python3.6/dist-packages/pyrealsense2/

Check that the shared object files (.so) are in the right place: 

chicken@chicken:/usr/local/lib$ ls
cmake       libjetson-inference.so  librealsense2-gl.so.2.50    librealsense2.so.2.50    pkgconfig
libfw.a     libjetson-utils.so      librealsense2-gl.so.2.50.0  librealsense2.so.2.50.0  python2.7
libglfw3.a  librealsense2-gl.so     librealsense2.so            librealsense-file.a      python3.6


If it can't find 'pipeline', it means you need to copy the missing __init__.py file.

sudo cp /home/chicken/librealsense/wrappers/python/pyrealsense2/__init__.py /usr/local/lib/python3.6/dist-packages/pyrealsense2/
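A quick way to confirm the bindings are importable before debugging further:

python3 -c "import pyrealsense2 as rs; print(rs.pipeline)"

If that prints the pipeline class rather than raising an ImportError or AttributeError, the .so files and __init__.py are where Python expects them.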

Some extra things to do: install the udev rules that ship with the librealsense repo.

sudo cp 99-realsense-libusb.rules  /etc/udev/rules.d/

Eventually, I was able to run the inference on the RealSense camera at an apparent 25 FPS on localhost, drawing to an OpenGL window.

I also developed a Dockerfile for the purpose, which benefits from an updated PyTorch version, but various issues were encountered, ultimately making a bare-metal install far simpler. Note that building jetson-inference and the RealSense SDK on the Nano requires increasing your swap size beyond the standard 2GB; otherwise the Jetson freezes once memory paging leads to swap death.
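For reference, one common way to add a swap file on the Nano (the 4GB size and /mnt path are just an example):

sudo fallocate -l 4G /mnt/4GB.swap
sudo chmod 600 /mnt/4GB.swap
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap
# to make it permanent, add a line like this to /etc/fstab:
# /mnt/4GB.swap  none  swap  sw  0  0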

Anyway, since the objective is remote human viewing (while providing depth information for the robot to use), the next step will require some more tests to find a suitable streaming option.

The main blocker is the power usage limitation on the Jetson Nano. I can't seem to run WiFi and the camera at the same time. According to the tegrastats utility, the POM_5V_IN usage goes over the provided 4A under basic usage. There are notes saying that 3A can be provided to two of the 5V GPIO pins to get 6A of total input; that might end up being necessary.

Initial investigation into serving RTSP resulted in inferior, visibly compressed results compared to a simple Python server streaming image by image. The next investigation will be into WebRTC options, which are supposedly the current state of the art for browser-based video streaming. I have tried aiortc and momo so far; both failed on the Nano.
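For reference, a minimal sketch of the image-by-image approach: a Motion-JPEG stream over HTTP. Flask, the /stream route and the cv2.VideoCapture source are stand-ins of mine, not the actual server used here; the real thing would yield the annotated RealSense frames instead.

import cv2
from flask import Flask, Response

app = Flask(__name__)
capture = cv2.VideoCapture(0)  # stand-in for the annotated RealSense frames

def mjpeg_frames():
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        ok, jpeg = cv2.imencode('.jpg', frame)
        if not ok:
            continue
        # multipart/x-mixed-replace lets the browser swap in each new JPEG as it arrives
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + jpeg.tobytes() + b'\r\n')

@app.route('/stream')
def stream():
    return Response(mjpeg_frames(),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)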

I’ve decided to try the Xavier NX too, just to replicate the experiment and see how things change. The Xavier has some higher-wattage power modes, and the WiFi is internal, so it’s worth a try. I also upgraded to Jetpack 5.0.2, which was a gamble. I thought surely it would be better than upgrading to the 5.0.1 developer preview, but none of their official products support 5.0.2 yet, so there will likely be much pain involved. On the plus side, Python 3.8 is standard, so some libraries are back on the menu.

On the Xavier, we’re getting 80 FPS, compared to 25 FPS on the Nano. Quite an upgrade. It is also able to run WiFi and the RealSense at the same time.

Looks like a success. Getting multiple frames per second with about a second of lag over the network.

Categories
AI/ML CNNs deep dev ears robots

YAMNet

After changing the normalisation code (because the resampled 16000 Hz audio was too soft), we have recordings from the microphone being classified by YAMNet on the Jetson.

Pretty cool. It’s detecting chicken sounds. I had to renormalise the recording volume to between -1 and 1, as everything was originally detected as ‘Silence’.

Currently I’m saving 5-second WAV files and processing them in Jupyter. But it’s not really interactive in a real-time way, and it would need further training to detect distress or other, more useful metrics.
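For reference, a minimal sketch of that classification step on one of the saved WAV files, assuming the TF Hub YAMNet model and librosa for resampling (the file name is a placeholder):

import csv
import numpy as np
import librosa
import tensorflow as tf
import tensorflow_hub as hub

# Load YAMNet from TF Hub (downloads on first use).
model = hub.load('https://tfhub.dev/google/yamnet/1')

# Load a recording and resample to the 16 kHz mono that YAMNet expects.
waveform, sr = librosa.load('recording.wav', sr=16000, mono=True)

# Peak-normalise to [-1, 1]; without this, quiet recordings come back as 'Silence'.
waveform = waveform / np.max(np.abs(waveform))

scores, embeddings, spectrogram = model(waveform)

# Map the highest-scoring class to its name using the bundled class map.
class_map_path = model.class_map_path().numpy().decode('utf-8')
class_names = [row['display_name']
               for row in csv.DictReader(tf.io.gfile.GFile(class_map_path))]
mean_scores = tf.reduce_mean(scores, axis=0)
print(class_names[int(tf.argmax(mean_scores))])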

There are training instructions; however, there also appears to be a tutorial.

We’re unlikely to have time to implement the transfer learning needed to continue with the chicken stress vocalisation work for this project, but it definitely looks like the way to go about it.

There are also some papers that used the VGG-11 architecture for this purpose, chopping recordings into overlapping 1-second segments for training and matching them to a label (stressed / not stressed). Note: if downloading the dataset, use the G-Drive link, not the figshare link, which is truncated.
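As a rough sketch of that segmentation step (the 16 kHz rate and 50% overlap are my assumptions, not necessarily what the papers used):

import numpy as np
import librosa

def one_second_segments(path, sr=16000, hop_seconds=0.5):
    """Chop a recording into overlapping 1-second segments for labelling/training."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    seg_len = sr                      # 1 second of samples
    hop = int(hop_seconds * sr)       # 50% overlap with the defaults above
    starts = range(0, max(len(y) - seg_len + 1, 0), hop)
    segments = [y[s:s + seg_len] for s in starts]
    return np.stack(segments) if segments else np.empty((0, seg_len))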

Let’s try to get YAMNet running in real time.

After following the installation procedure for my ReSpeaker 2-Mic HAT, I’ve set up a Dockerfile with TF2 and the audio libraries, including librosa, in order to try out this real-time version. Getting this right was a real pain, because of breaking changes in the ‘numba’ package.

FROM nvcr.io/nvidia/l4t-tensorflow:r32.6.1-tf2.5-py3

RUN apt-get update && apt-get install -y curl build-essential
RUN apt-get update && apt-get install -y libffi6 libffi-dev

RUN pip3 install -U Cython
RUN pip3 install -U pillow
RUN pip3 install -U numpy
RUN pip3 install -U scipy
RUN pip3 install -U matplotlib
RUN pip3 install -U PyWavelets
RUN pip3 install -U kiwisolver

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
            alsa-base \
            libasound2-dev \
            alsa-utils \
            portaudio19-dev \
            libsndfile1 \
            unzip \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean


RUN pip3 install soundfile pyaudio wave
RUN pip3 install tensorflow_hub
RUN pip3 install packaging
RUN pip3 install pyzmq==17.0.0
RUN pip3 install jupyterlab

RUN apt-get update && apt-get install -y libblas-dev \
				      liblapack-dev \
				      libatlas-base-dev \
				      gfortran \
				      protobuf-compiler \
				      libprotoc-dev \
		                      llvm-9 \
				      llvm-9-dev 

RUN export LLVM_CONFIG=/usr/lib/llvm-9/bin/llvm-config && pip3 install llvmlite==0.36.0
RUN pip3 install --upgrade pip
RUN python3 -m pip install --user -U numba==0.53.1
RUN python3 -m pip install --user -U librosa==0.9.2
#otherwise matplotlib can't draw to gui
RUN apt-get update && apt-get install -y python3-tk


RUN jupyter lab --generate-config

RUN python3 -c "from notebook.auth.security import set_password; set_password('nvidia', '/root/.jupyter/jupyter_notebook_config.json')"

EXPOSE 6006
EXPOSE 8888

CMD /bin/bash -c "jupyter lab --ip 0.0.0.0 --port 8888 --allow-root &> /var/log/jupyter.log" & \
        echo "allow 10 sec for JupyterLab to start @ http://$(hostname -I | cut -d' ' -f1):8888 (password nvidia)" && \
        echo "JupterLab logging location:  /var/log/jupyter.log  (inside the container)" && \
        /bin/bash


I'm running it with

sudo docker run -it --rm --runtime nvidia --network host --privileged=true -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v /home/chicken/:/home/chicken nano_tf2_yamnet

Categories
AI/ML deep highly_speculative

NLP: Natural Language Processing

NLP. Disambiguation: not neuro-linguistic programming, a discredited therapy whereby one supposedly reprograms oneself to change unwanted behaviours.

No, this one is about processing text. Sequence to sequence. Big models. Transformers. Just wondering now whether we need it. I saw an ad for ‘Tagalog‘ and thought it was something about the Filipino language, but it was a product for NLP shops: OCR, labelling, annotation, that sort of thing. Text mining, basically.

Natural language processing is fascinating stuff, but probably out of scope for this project. GPT-3 is the big one, and I think Google just released a network an order of magnitude bigger?

Here’s a bit of oddly dystopian exclusivity news: Microsoft has exclusively licensed the GPT-3 language model. Amazing.

“Google’s new trillion-parameter AI language model is almost 6 times bigger than GPT-3”, so GPT-3 is old news anyway. I watched some interviews with the AI, and it’s very plucky. I can see why Elon is worried.

I don’t think we’ll need it for chickens.

Categories
AI/ML deep institutes links neuro sim2real simulation

NeuralSim

We’ve gone a totally different way, but this is another interesting project from Erwin Coumans (of PyBullet fame) on the Google Brain team. NeuralSim replaces parts of physics engines with neural networks.

https://sites.google.com/usc.edu/neuralsim

https://github.com/google-research/tiny-differentiable-simulator

Categories
3D Research AI/ML deep envs institutes Vision

Panoptic Mapping

I just found this GitHub repo from ETH Zürich. It’s not surprising that they have some of the most relevant datasets I’ve seen pertaining to building proprioceptive autonomous systems. I came across their Autonomous Systems Lab dataset site.

One of the projects, panoptic mapping, is pretty much the panoptic segmentation from earlier research, combined with volumetric point clouds. “A flexible submap-based framework towards spatio-temporally consistent volumetric mapping and scene understanding.”

Categories
AI/ML Behaviour deep Vision

DeepLabCut

https://github.com/DeepLabCut

The NVIDIA Jetson instructions for a live application are here.

It looks like there are some very recent papers and articles using this code for pose estimation.

Across-Species Pose Estimation in Poultry Based on Images Using Deep Learning

Study on Poultry Pose Estimation Based on Multi-Parts Detection:

“The automatic estimation of poultry posture can help to analyze the movement, behavior, and even health of poultry”

This paper mentions a few other uses of deep learning for chickens.

Categories
AI/ML Behaviour bio chicken_research control deep dev ears evolution highly_speculative neuro UI

Hierarchical Temporal Memory

Here I’m continuing with the task of unsupervised detection of audio anomalies, hopefully for the purpose of detecting chicken stress vocalisations.

After much fussing around with the old Numenta NuPIC codebase, I’m porting the older nupic.audio and nupic.critic code over to the more recent htm.core.

These are the main parts:

  • Sparse Distributed Representation (SDR)
  • Encoders
  • Spatial Pooler (SP)
  • Temporal Memory (TM)

I’ve come across a very intricate implementation with documentation of the important parts of the HTM model, way deeper than I need (how did I get here?). I will try to implement the ‘critic’ code first; or rather, I’ll try to port it from nupic to htm. After further investigation, there are a few options, and I’m going to try editing the hotgym example and shoving WAV-file frequency-band scalars through it instead of power consumption data. I’m simplifying the investigation, but I need to make some progress.

I’m using this Docker image to get in, mapping my code and WAV folder into the container:

docker run -d -p 8888:8888 --name jupyter -v /media/chrx/0FEC49A4317DA4DA/sounds/:/home/jovyan/work 3rdman/htm.core-jupyter:latest



So I've got some code working that writes to 'nupic format' (.csv), and code that reads the amplitudes from the CSV file and then runs them through htm.core, roughly as sketched below.
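A minimal sketch of that htm.core loop, in the style of the hotgym example: an RDSE scalar encoder feeding a Spatial Pooler and a Temporal Memory, with the anomaly score read off each step. The parameters and the 'band_0.csv' file name are placeholder guesses, not tuned values.

import csv
from htm.bindings.sdr import SDR
from htm.encoders.rdse import RDSE, RDSE_Parameters
from htm.algorithms import SpatialPooler, TemporalMemory

# Scalar encoder for one frequency band's amplitude values.
enc_params = RDSE_Parameters()
enc_params.size = 1000
enc_params.sparsity = 0.02
enc_params.resolution = 0.1          # guess; depends on the amplitude scale
encoder = RDSE(enc_params)

sp = SpatialPooler(inputDimensions=(enc_params.size,),
                   columnDimensions=(1024,),
                   potentialRadius=enc_params.size,
                   globalInhibition=True,
                   localAreaDensity=0.02)
tm = TemporalMemory(columnDimensions=(1024,),
                    cellsPerColumn=8)

anomaly_scores = []
with open('band_0.csv') as f:        # one column of amplitudes, 'nupic format'
    for row in csv.reader(f):
        value = float(row[0])
        encoding = encoder.encode(value)
        active_columns = SDR(sp.getColumnDimensions())
        sp.compute(encoding, True, active_columns)
        tm.compute(active_columns, learn=True)
        anomaly_scores.append(tm.anomaly)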

So it takes a while, and it's just for 1 band (of 10 bands). I see it also uses the first 1/4 or so of the time to learn what it's dealing with. I probably need to run it through twice to get predictive results in the first 1/4.

Ok no, after a few weeks I've come back to this point, and realise that the top graph is the important one. Prediction is what's important. The bottom graphs are the anomaly scores, derived from the prediction.
Frequency Band 0

The idea in nupic.critic was to threshold changes in X bands. Let’s see the other graphs…

Frequency band 0: 0-480Hz ?
Frequency band 2: 960-1440Hz ?
Frequency band 3: 1440-1920Hz ?
Frequency band 4: 1920-2400Hz ?
Frequency band 5: 2400-2880Hz ?
Frequency band 6: 2880-3360Hz ?

Ok Frequency bands 7, 8, 9 were all zero amplitude. So that’s the highest the frequencies went. Just gotta check what those frequencies are, again…

Opening 307.wav
Sample width (bytes): 2
Frame rate (sampling frequency): 48000
Number of frames: 20771840
Signal length: 20771840
Seconds: 432
Dimensions of periodogram: 4801 x 2163

Ok with 10 buckets, 4801 would divide into 
Frequency band 0: 0-480Hz
Frequency band 1: 480-960Hz
Frequency band 2: 960-1440Hz
Frequency band 3: 1440-1920Hz
Frequency band 4: 1920-2400Hz
Frequency band 5: 2400-2880Hz
Frequency band 6: 2880-3360Hz

Ok, what else? We could try segmenting the audio by band, so we can narrow in on the relevant frequency range, and then maybe just focus on that smaller range again, in higher detail.

Learning features with some labelled data is probably the correct way to do chicken stress vocalisation detection.

Unsupervised anomaly detection might be totally off in terms of what an anomaly is. It is probably best to zoom in on the relevant bands, demonstrate a minimal example of what a stressed chicken sounds like versus a chilled chicken, and compare the spectrograms to see if there’s a tell-tale visualisable feature.

A score from 1 to 5, for example, is going to be anomalous in arbitrary ways without labelled data. Maybe the chickens are usually stressed, and the anomalies are when they are unstressed, for example.

A change in timing in music might be defined in some way, like 4 out of 7 bands exhibiting anomalous amplitudes, but that probably won’t help here. It’s probably just going to come down to a very narrow band of interest: possibly pointing it out on a spectrogram that’s zoomed in on the feature, and then feeding the HTM an encoding of that narrow band of relevant data.


I’ll continue here with some notes on filtering. After much fuss, the sox app (apt-get install sox) does it, sort of. I’m still working on a Python version.

$ sox 307_0_50.wav filtered_50_0.wav sinc -n 32767 0-480
$ sox 307_0_50.wav filtered_50_1.wav sinc -n 32767 480-960
$ sox 307_0_50.wav filtered_50_2.wav sinc -n 32767 960-1440
$ sox 307_0_50.wav filtered_50_3.wav sinc -n 32767 1440-1920
$ sox 307_0_50.wav filtered_50_4.wav sinc -n 32767 1920-2400
$ sox 307_0_50.wav filtered_50_5.wav sinc -n 32767 2400-2880
$ sox 307_0_50.wav filtered_50_6.wav sinc -n 32767 2880-3360


So, sox does seem to be working.  The mel spectrogram is logarithmic, which is why it looks like this.

Visually, it looks like I'm interested in 2048 to 4096 Hz.  That's where I can see the chirps.

Hmm. So I think the spectrogram is confusing everything.

So where does 4800 come from? 48 kHz. 48,000 Hz (48 kHz) is the sample rate “used for DVDs“.

Ah. Right. The periodogram bins are spaced 5 Hz apart, and the full range goes up to 24000…?

Sample width (bytes): 2
[    0.     5.    10.    15.  ...  23990. 23995. 24000.]

(the frequency bins run from 0 to 24000 Hz, in 5 Hz steps)

Ok. So 2 x 24000. Maybe 2 channels? If the full range really went up to 48000 Hz, the bands would be…

Frequency band 0: 0-4800Hz
Frequency band 1: 4800-9600Hz
Frequency band 2: 9600-14400Hz
Frequency band 3: 14400-19200Hz
Frequency band 4: 19200-24000Hz
Frequency band 5: 24000-28800Hz
Frequency band 6: 28800-33600Hz

Ok so no, it’s half of that: the periodogram only goes up to the Nyquist frequency, half the 48 kHz sample rate, i.e. 24000 Hz. So the bands are actually:

Frequency band 0: 0-2400Hz
Frequency band 1: 2400-4800Hz
Frequency band 2: 4800-7200Hz
Frequency band 3: 7200-9600Hz
Frequency band 4: 9600-12000Hz
Frequency band 5: 12000-14400Hz
Frequency band 6: 14400-16800Hz
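In other words, the band edges just come from dividing the Nyquist frequency by the number of buckets:

sample_rate = 48000
nyquist = sample_rate / 2          # 24000 Hz: highest frequency in the periodogram
n_bands = 10
band_width = nyquist / n_bands     # 2400 Hz per band
band_edges = [(i * band_width, (i + 1) * band_width) for i in range(n_bands)]
# [(0.0, 2400.0), (2400.0, 4800.0), ..., (21600.0, 24000.0)]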

So why is the spectrogram maxing at 8192Hz? Must be spectrogram sampling related.

[Figure: overlapping Hann windows ('ol_hann_win'), from a Berkeley document]

So the original signal is 0 to 24000 Hz, and the spectrogram must max out at 8192 Hz because of the way the spectrogram is made. I’ll try to get back to this when I understand it.

sox 307_0_50.wav filtered_50_0.wav sinc -n 32767 0-2400
sox 307_0_50.wav filtered_50_1.wav sinc -n 32767 2400-4800
sox 307_0_50.wav filtered_50_2.wav sinc -n 32767 4800-7200
sox 307_0_50.wav filtered_50_3.wav sinc -n 32767 7200-9600
sox 307_0_50.wav filtered_50_4.wav sinc -n 32767 9600-12000
sox 307_0_50.wav filtered_50_5.wav sinc -n 32767 12000-14400
sox 307_0_50.wav filtered_50_6.wav sinc -n 32767 14400-16800

Ok I get it now.

Ok, I don’t entirely understand the last two. But basically the mel spectrogram is logarithmic, so those high frequencies really don’t get much love on the mel spectrogram graph. Buggy, maybe.

But I can now estimate the chirp frequencies…

sox 307_0_50.wav filtered_bird.wav sinc -n 32767 1800-5200

Beautiful. So, now to ‘extract the features’…

So, the nupic.critic code with 1 bucket managed to get something resembling the spectrogram. Ignore the blue.

But it looks like maybe we can even just threshold and count peaks. That might be it.
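A sketch of that threshold-and-count idea, using an RMS envelope of the band-filtered audio and scipy's peak finder. The threshold, frame sizes and file name are arbitrary starting points of mine, not settled choices.

import numpy as np
import librosa
from scipy.signal import find_peaks

# A band-filtered file such as the sox output above.
y, sr = librosa.load('filtered_bird.wav', sr=None, mono=True)

# Short-time RMS envelope of the signal.
rms = librosa.feature.rms(y=y, frame_length=1024, hop_length=512)[0]

# Count peaks above a threshold; 'chirp density' = peaks per second.
threshold = rms.mean() + 2 * rms.std()
peaks, _ = find_peaks(rms, height=threshold)
duration = len(y) / sr
print(f'{len(peaks)} peaks in {duration:.1f}s -> {len(peaks) / duration:.2f} chirps/sec')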

sox 307.wav filtered_307.wav sinc -n 32767 1800-5200
sox 3072.wav filtered_3072.wav sinc -n 32767 1800-5200
sox 237.wav filtered_237.wav sinc -n 32767 1800-5200
sox 98.wav filtered_98.wav sinc -n 32767 1800-5200

Let’s do the big files…

Ok looks good enough.

So now I’m plotting the ‘chirp density’ (basically volume).

’98.wav’
‘237.wav’
‘307.wav’
‘3072.wav’

In this scheme, we simply use chirp volume density as a proxy variable for stress; we don’t know whether it is a true proxy. As you can see, some recordings have more variation than others.

Some heuristic could be decided upon for rating the stress from 1 to 5; it depends on how the program would be used. For example, if it were streaming audio for an alert system, it might alert after some duration of time spent above one standard deviation from the rolling mean. I’m not sure how the program would be used, though.
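A sketch of that alert heuristic, using pandas rolling statistics over a per-frame envelope or chirp-density series. The window lengths and the one-standard-deviation rule are just the example from above, not validated values.

import numpy as np
import pandas as pd

# 'envelope' stands in for the per-frame RMS / chirp-density series from earlier.
envelope = pd.Series(np.random.rand(1000))

window = 100                                     # rolling window, in frames
rolling_mean = envelope.rolling(window).mean()
rolling_std = envelope.rolling(window).std()

# Frames sitting above one standard deviation from the rolling mean.
above = envelope > (rolling_mean + rolling_std)

# Alert if that condition holds for a sustained stretch (e.g. 50 consecutive frames).
sustained = above.astype(int).rolling(50).sum() >= 50
if sustained.any():
    print('alert: sustained elevated chirp activity')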

If the goal were to differentiate stressed and not stressed vocalisations, that would require labelled audio data.   

(Also, basically didn’t end up using HTM, lol)

Categories
AI/ML deep dev envs meta

Codex

I applied for the OpenAI Codex beta, and tried it out. It’s very cool.

It’s like GPT trained on Github or something like that, so you can just give it a prompt, and it tries to carry on the sentence with what it expects next.

The easy way to use it, I’ve found, is to write a comment describing the code you want it to write for you. I switched the engine to ‘davinci-codex’ and cranked up the response length to 1600 tokens or so. Sometimes it just makes up more comments or does something silly, but now and then it hits a home run.

Here’s an example.

"""
Take an example wav file, found at file_name="/home/jovyan/work/307.wav", and plot a mel spectrogram of it. 
""" 

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

S = librosa.feature.melspectrogram(y, sr=sr, n_mels=128)
log_S = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize=(12, 4))
librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram')
plt.colorbar(format='%+02.0f dB')
plt.tight_layout()
plt.show()

I had to install librosa (and the snippet assumes numpy is already imported as np), but then it worked. It did more than I asked for.

Then I fed that whole thing, with another

"""
Plot the MFCC of file_name="/home/jovyan/work/307.wav"
"""

and it carried on like an example-giving champ:

"""
Take an example wav file, found at file_name="/home/jovyan/work/307.wav", and plot a mel spectrogram of it. 
""" 

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

S = librosa.feature.melspectrogram(y, sr=sr, n_mels=128)
log_S = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize=(12, 4))
librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram')
plt.colorbar(format='%+02.0f dB')
plt.tight_layout()
plt.show()




"""
Plot the MFCC of file_name="/home/jovyan/work/307.wav"
"""

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
plt.figure(figsize=(12, 4))
librosa.display.specshow(mfccs, x_axis='time')
plt.colorbar()
plt.title('MFCC')
plt.tight_layout()
plt.show()


"""
Plot the chromagram of file_name="/home/jovyan/work/307.wav"
"""

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

chroma = librosa.feature.chroma_stft(y=y, sr=sr)
plt.figure(figsize=(12, 4))
librosa.display.specshow(chroma, y_axis='chroma', x_axis='time')
plt.colorbar()
plt.title('Chromagram')
plt.tight_layout()
plt.show()


"""
Plot the chromagram of file_name="/home/jovyan/work/307.wav"
"""

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

chroma = librosa.feature.chroma_stft(y=y, sr=sr)
plt.figure(figsize=(12, 4))
librosa.display.specshow(chroma, y_axis='chroma', x_axis='time')
plt.colorbar()
plt.title('Chromagram')
plt.tight_layout()
plt.show()


"""
Plot the tonnetz of file_name="/home/jovyan/work/307.wav"
"""

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
plt.figure(figsize=(12, 4))
librosa.display.specshow(tonnetz, y_axis='tonnetz', x_axis='time')
plt.colorbar()
plt.title('Tonal Centroids (Tonnetz)')
plt.tight_layout()
plt.show()


"""
Plot the zero-crossing rate of file_name="/home/jovyan/work/307.wav"
"""

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

zrate = librosa.feature.zero_crossing_rate(y)
plt.figure(figsize=(12, 4))
librosa.display.specshow(zrate, x_axis='time')
plt.colorbar()
plt.title('Zero Crossing Rate')
plt.tight_layout()
plt.show()


"""
Plot the spectral centroid of file_name="/home/jovyan/work/307.wav"
"""

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

cent = librosa.feature.spectral_centroid(y=y, sr=sr)
plt.figure(figsize=(12, 4))
librosa.display.specshow(cent, x_axis='time')
plt.colorbar()
plt.title('Spectral Centroids')
plt.tight_layout()
plt.show()


"""
Plot the spectral bandwidth of file_name="/home/jovyan/work/307.wav"
"""

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
plt.figure(figsize=(12, 4))
librosa.display.specshow(spec_bw, x_axis='time')
plt.colorbar()
plt.title('Spectral Bandwidth')
plt.tight_layout()
plt.show()


"""
Plot the spectral rolloff of file_name="/home/jovyan/work/307.wav"
"""

import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 4))
librosa.display.waveplot(y, sr=sr)
plt.show()

spec_rolloff = lib

After waiting a while… we get:

Ok, Chromagram twice… bold move, Codex.
import librosa
import librosa.display
import matplotlib.pyplot as plt

file_name="/home/jovyan/work/307.wav"

y, sr = librosa.load(file_name)

plt.figure(figsize=(12, 8))
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
plt.subplot(4, 2, 1)
librosa.display.specshow(D, y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')

plt.subplot(4, 2, 2)
librosa.display.specshow(D, y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Log-frequency power spectrogram')

plt.subplot(4, 2, 3)
librosa.display.specshow(D, x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.title('Time-frequency power spectrogram')

plt.subplot(4, 2, 4)
librosa.display.specshow(D, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Log-Time-frequency power spectrogram')
Categories
AI/ML arxiv Behaviour control deep Locomotion simulation

PGA Map Elites

Though I’m generally using Stable Baselines algorithms for training locomotion tasks, I am sometimes drawn back to evolutionary algorithms, especially Map Elites, which has now been upgraded to incorporate a policy gradient.

The archiving of behaviours is what attracts me to Map Elites.

PGA Map Elites, built on top of QDGym, which tracks quality diversity, is probably worth a look.

Categories
3D AI/ML arxiv CNNs control deep envs sim2real simulation Vision

UnrealROX

A sim2real project with more or less photorealism, since Unreal Engine is so pretty. arXiv: https://arxiv.org/pdf/1810.06936.pdf

And the github writeup: https://sim2realai.github.io/UnrealROX/

Beautiful. Since we’re currently just using a 256×256 viewport in PyBullet, though, this is quite a bit more advanced than required. Learning game engines can also take a while: it took me about a month to learn Unity3D, with intermediate C# experience. Unreal Engine uses C++, so it’s a bit less accessible to beginners.