
Musing on audio/visual/motor brain

Just some notes to myself. We’re going to be building some advanced shitty robots here, with sim-to-real policy transfer.

ENSEMBLE NNs

I had a look at merging NNs and found this overview: https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/, which lists https://arxiv.org/abs/1803.05407 as one of the most recent articles. The recommendation is to use averages of multiple NNs.
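
A minimal sketch of the simplest version (averaging predictions rather than weights), assuming a few Keras models with the same output shape have already been trained; the file names are made up:

```python
import numpy as np
from tensorflow import keras

def ensemble_predict(models, x):
    # Run every model on the same batch and average their outputs.
    preds = np.stack([m.predict(x, verbose=0) for m in models])
    return preds.mean(axis=0)

# Hypothetical usage: three models trained separately on the same task.
# models = [keras.models.load_model(f"model_{i}.h5") for i in range(3)]
# avg = ensemble_predict(models, x_test)
# labels = avg.argmax(axis=1)
```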

AUDIO

For audio there’s https://github.com/google-coral/project-keyword-spotter, which uses a 72-euro TPU (https://coral.ai/products/accelerator/) for fast processing.

I’ve seen convolutional-network-style NNs used on spectrograms of audio (e.g. https://medium.com/gradientcrescent/urban-sound-classification-using-convolutional-neural-networks-with-keras-theory-and-486e92785df4). Anyway, audio is secondary. We can have it work with a mic and a simple volume threshold to start with, something like the sketch below.
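
A minimal sketch of the volume-threshold idea, assuming the `sounddevice` package and a mono mic; the threshold value is a placeholder to tune on the real hardware:

```python
import numpy as np
import sounddevice as sd  # pip install sounddevice

SAMPLE_RATE = 16000
BLOCK_SECONDS = 0.25
THRESHOLD = 0.02  # placeholder RMS level, tune on the actual mic

def wait_for_sound():
    # Keep grabbing short blocks from the mic until one is loud enough.
    while True:
        block = sd.rec(int(SAMPLE_RATE * BLOCK_SECONDS),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()
        rms = float(np.sqrt(np.mean(block ** 2)))
        if rms > THRESHOLD:
            return rms

if __name__ == "__main__":
    print("heard something, RMS =", wait_for_sound())
```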

MOTION

Various neural networks will be trained in simulation to perform different tasks, with egg-, chicken- and human-looking objects. Ideally we end up with a robot that can’t really fall over.

We need to decide whether we’re giving it spatial awareness in 3D, maybe using point clouds? Creating mental maps of the environment?

VISION

Convolutional networks are typical for vision tasks. We could, however, use HyperNEAT for visual discrimination, as in: https://github.com/PacktPublishing/Hands-on-Neuroevolution-with-Python/tree/master/Chapter7

But what will make sense is to have the RPi take pics, send them across to a server on a desktop computer, play around with the image in OpenCV first, and then feed the result to the neuro-evolution process.
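
A rough sketch of the desktop side, assuming the RPi POSTs raw JPEG bytes to a Flask endpoint; the `/frame` route and the 16x16 input size are made-up placeholders:

```python
import cv2
import numpy as np
from flask import Flask, request

app = Flask(__name__)

def preprocess(jpeg_bytes, size=(16, 16)):
    # Decode the JPEG, downscale to a tiny grayscale grid, and flatten it
    # to a [0, 1] vector small enough to feed a neuro-evolved network.
    img = cv2.imdecode(np.frombuffer(jpeg_bytes, np.uint8), cv2.IMREAD_GRAYSCALE)
    small = cv2.resize(img, size, interpolation=cv2.INTER_AREA)
    return (small.astype(np.float32) / 255.0).ravel()

@app.route("/frame", methods=["POST"])
def frame():
    vec = preprocess(request.data)
    # ...hand vec to the neuro-evolution process here...
    return "ok"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```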