Categories
AI/ML CNNs OpenCV Vision

Mask R-CNN

Paper: https://arxiv.org/pdf/1703.06870.pdf

FB really likes detecting things. The Matterport version didn’t work out of the box, so I went with FB’s PyTorch version (Detectron2) to try image segmentation.

Caffe2 version: https://github.com/facebookresearch/Detectron

PyTorch version: https://github.com/facebookresearch/Detectron2

Matterport’s version: https://github.com/matterport/Mask_RCNN

Deep Learning based Image Segmentation with OpenCV: https://www.pyimagesearch.com/2018/11/26/instance-segmentation-with-opencv/

https://engineering.matterport.com/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46
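
For reference, minimal inference with a pre-trained Mask R-CNN through Detectron2 looks roughly like this (a sketch based on their getting-started docs; the config name, score threshold and test image path are just examples):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5       # confidence threshold for detections

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("test.jpg"))       # BGR image, as read by OpenCV
print(outputs["instances"].pred_classes)          # COCO class ids
print(outputs["instances"].pred_masks.shape)      # one boolean mask per detected instance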

The Watershed algorithm is also available in OpenCV:

Watershed: http://www.cmm.mines-paristech.fr/~beucher/wtshed.html

Segmenting an image by the watershed transformation is therefore a two-step process:

* Finding the markers and the segmentation criterion (the criterion or function which will be used to split the regions – it is most often the contrast or gradient, but not necessarily).

* Performing a marker-controlled watershed with these two elements.

https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_watershed/py_watershed.html
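
As a rough sketch of those two steps with OpenCV (following the tutorial above; the input image and the threshold/distance-transform choice of markers are just one option):

import cv2
import numpy as np

# Step 1: find the markers and the segmentation criterion
img = cv2.imread("coins.jpg")                                  # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

kernel = np.ones((3, 3), np.uint8)
sure_bg = cv2.dilate(thresh, kernel, iterations=3)             # definitely background
dist = cv2.distanceTransform(thresh, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, 0)     # definitely foreground
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)                       # region the watershed will decide

_, markers = cv2.connectedComponents(sure_fg)                  # label each foreground blob
markers = markers + 1                                          # background becomes 1, not 0
markers[unknown == 255] = 0                                    # 0 = unknown, to be flooded

# Step 2: marker-controlled watershed
markers = cv2.watershed(img, markers)
img[markers == -1] = (0, 0, 255)                               # -1 marks the boundaries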

Categories
Vision

DSO: Direct Sparse Odometry

It seems I didn’t mention this algorithm before, but it’s a SLAM-like point cloud thing I must have played with, since I came across this screenshot.

Tips section from https://github.com/JakobEngel/dso

Accurate Geometric Calibration

  • Please have a look at Chapter 4.3 from the DSO paper, in particular Figure 20 (Geometric Noise). Direct approaches suffer a LOT from bad geometric calibrations: Geometric distortions of 1.5 pixel already reduce the accuracy by factor 10.
  • Do not use a rolling shutter camera; the geometric distortions from a rolling shutter are huge, even at high frame rates (over 60fps).
  • Note that the reprojection RMSE reported by most calibration tools is the reprojection RMSE on the “training data”, i.e., overfitted to the images you used for calibration. If it is low, that does not imply that your calibration is good; you may just have used insufficient images (a quick OpenCV check is sketched after this list).
  • Try different camera / distortion models; not all lenses can be modelled by all models.
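
On that reprojection RMSE point: the number OpenCV’s calibrateCamera returns is exactly that training-set error, so a low value alone doesn’t mean much. A minimal sketch of getting it (checkerboard size and image paths are placeholders):

import glob
import cv2
import numpy as np

pattern = (9, 6)                                      # inner corners of the checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for path in glob.glob("calib/*.png"):                 # placeholder path to calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)
print("RMS reprojection error on the calibration images:", rms)   # low != good calibration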

Photometric Calibration

Use a photometric calibration (e.g. using https://github.com/tum-vision/mono_dataset_code).

Translation vs. Rotation

DSO cannot do magic: if you rotate the camera too much without translation, it will fail. Since it is pure visual odometry, it cannot recover by re-localizing, or track through strong rotations by using previously triangulated geometry. Everything that leaves the field of view is marginalized immediately.

Categories
Vision

Early ConvNet visualisations

https://link.springer.com/article/10.1186/s40648-019-0141-2

https://imgur.com/a/Hqolp

AxCell: Automatic Extraction of Results from Machine Learning Papers

https://arxiv.org/abs/2004.14356

Categories
CNNs deep Vision

MeshCNN

Currently we have LSD-SLAM working, and that’s cool for us humans to see stuff, but having an object mesh to work with makes more sense. I don’t know if there’s really any difference, but at least in terms of simulator integration, it makes sense. There’s object detection, semantic segmentation, and so on, and in the end I want the robot to have a relative coordinate system, in a way. But robots will probably get by with just pixels and stochastic magic.

But the big idea for me here is to transform monocular camera images into mesh objects. Those .obj files, or whatever, could be imported into the physics engine for training in simulation.
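
As a sketch of that last step, assuming pybullet as the physics engine and a hypothetical mesh.obj produced upstream:

import pybullet as p

p.connect(p.DIRECT)                                   # headless physics server
# Load a reconstructed mesh as both the collision and the visual shape
col = p.createCollisionShape(p.GEOM_MESH, fileName="mesh.obj")
vis = p.createVisualShape(p.GEOM_MESH, fileName="mesh.obj")
body = p.createMultiBody(baseMass=1.0,
                         baseCollisionShapeIndex=col,
                         baseVisualShapeIndex=vis,
                         basePosition=[0, 0, 1])
for _ in range(240):                                  # step the simulation for one second at 240 Hz
    p.stepSimulation()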

arxiv: https://arxiv.org/pdf/1809.05910v2.pdf

github: https://ranahanocka.github.io/MeshCNN/

The PhD candidate: https://www.cs.tau.ac.il/~hanocka/ – In the Q&A at the end, she mentions AtlasNet https://arxiv.org/abs/1802.05384 as only being able to address local structures. Latest research looks interesting too https://arxiv.org/pdf/2003.13326.pdf

ShapeNet (https://arxiv.org/abs/1512.03012) seems to be a common resource, and https://arxiv.org/pdf/2004.15004v2.pdf and these .obj files might be interesting: https://www.dropbox.com/s/w16st84r6wc57u7/shrec_16.tar.gz

Categories
dev robots Vision

ROS Camera Topic

What is a ROS topic? http://wiki.ros.org/Topics
ROS can publish the webcam stream to a “topic”, and any part of the robot can subscribe to it, by name, if it is interested in that data. ROS is almost like a program where everything is a global variable.
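
A minimal subscriber sketch (assuming a /image_raw topic is being published and the cv_bridge package is installed):

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_image(msg):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")   # ROS Image -> OpenCV BGR array
    rospy.loginfo("got %dx%d frame", msg.width, msg.height)

rospy.init_node("camera_listener")
rospy.Subscriber("/image_raw", Image, on_image)
rospy.spin()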

https://answers.ros.org/question/218228/ros-example-program-doesnt-work-with-the-laptop-webcam/

I made this file for the laptop webcam, but then didn’t end up using it.

<launch>
  <group ns="camera">
    <node pkg="libuvc_camera" type="camera_node" name="mycam">
      <!-- Parameters used to find the camera -->
      <param name="vendor" value="0x2232"/>
      <param name="product" value="0x1082"/>
      <param name="serial" value=""/>
      <!-- If the above parameters aren't unique, choose the first match: -->
      <param name="index" value="0"/>

      <!-- Image size and type -->
      <param name="width" value="640"/>
      <param name="height" value="480"/>
      <!-- choose whichever uncompressed format the camera supports: -->
      <param name="video_mode" value="uncompressed"/> <!-- or yuyv/nv12/mjpeg -->
      <param name="frame_rate" value="15"/>

      <param name="timestamp_method" value="start"/> <!-- start of frame -->
      <param name="camera_info_url" value="file:///tmp/cam.yaml"/>

      <param name="auto_exposure" value="3"/> <!-- use aperture_priority auto exposure -->
      <param name="auto_white_balance" value="false"/>
    </node>
  </group>
</launch>

roscore

apt install ros-melodic-uvc-camera

rospack listnames

rosrun uvc_camera uvc_camera_node _device:=/dev/video0

rostopic list

(should show /image_raw now…)

rosrun dso_ros dso_live calib=/opt/catkin_ws/src/dso_ros/camera.txt image:=/image_raw/

Categories
AI/ML arxiv Vision

Instance Segmentation

https://arxiv.org/pdf/2003.10152.pdf – SOLOv2

https://arxiv.org/pdf/2003.06148.pdf – PointINS: Point-based Instance Segmentation

Cool site: paperswithcode.

https://paperswithcode.com/task/instance-segmentation?page=4

Categories
CNNs Vision

Visualize CNNs

https://github.com/fg91/visualizing-cnn-feature-maps

“There are two main ways to try to understand how a neural network recognizes a certain pattern. If you want to know what kind of pattern significantly activates a certain feature map you could 1) either try to find images in a dataset that result in a high average activation of this feature map or you could 2) try to generate such a pattern by optimizing the pixel values in a random image. The latter idea was proposed by Erhan et al. 2009.”

from: https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030
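
The snippet below depends on a few pieces that aren’t shown here: it’s written against the old fastai 0.7 helpers (vgg16, set_trainable, tfms_from_model, V) plus a SaveFeatures forward-hook class defined in the article. Roughly like this (the exact fastai import path and the hook class body are my assumptions, not copied from the post):

import numpy as np
import cv2
import torch
import matplotlib.pyplot as plt
from fastai.conv_learner import *   # old fastai 0.7: provides vgg16, V, tfms_from_model, set_trainable

class SaveFeatures():
    # Forward hook that keeps the feature maps of one layer around
    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        self.features = output      # keep the graph so the loss can backprop into the image
    def close(self):
        self.hook.remove()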


class FilterVisualizer():
    def __init__(self, size=56, upscaling_steps=12, upscaling_factor=1.2):
        self.size, self.upscaling_steps, self.upscaling_factor = size, upscaling_steps, upscaling_factor
        self.model = vgg16(pre=True).cuda().eval()
        set_trainable(self.model, False)

    def visualize(self, layer, filter, lr=0.1, opt_steps=20, blur=None):
        sz = self.size
        img = np.uint8(np.random.uniform(150, 180, (sz, sz, 3)))/255  # generate random image
        activations = SaveFeatures(list(self.model.children())[layer])  # register hook

        for _ in range(self.upscaling_steps):  # scale the image up upscaling_steps times
            train_tfms, val_tfms = tfms_from_model(vgg16, sz)
            img_var = V(val_tfms(img)[None], requires_grad=True)  # convert image to Variable that requires grad
            optimizer = torch.optim.Adam([img_var], lr=lr, weight_decay=1e-6)
            for n in range(opt_steps):  # optimize pixel values for opt_steps times
                optimizer.zero_grad()
                self.model(img_var)
                loss = -activations.features[0, filter].mean()
                loss.backward()
                optimizer.step()
            img = val_tfms.denorm(img_var.data.cpu().numpy()[0].transpose(1,2,0))
            self.output = img
            sz = int(self.upscaling_factor * sz)  # calculate new image size
            img = cv2.resize(img, (sz, sz), interpolation = cv2.INTER_CUBIC)  # scale image up
            if blur is not None: img = cv2.blur(img,(blur,blur))  # blur image to reduce high frequency patterns
        self.save(layer, filter)
        activations.close()
        
    def save(self, layer, filter):
        plt.imsave("layer_"+str(layer)+"_filter_"+str(filter)+".jpg", np.clip(self.output, 0, 1))

and use it like this:

layer = 40
filter = 265
FV = FilterVisualizer(size=56, upscaling_steps=12, upscaling_factor=1.2)
FV.visualize(layer, filter, blur=5)
Categories
Vision

Monocular SLAM

For drawing a map of a place (Simultaneous Localization and Mapping). Monocular means using a single camera.

https://vision.in.tum.de/research/vslam/lsdslam

https://ubilang.wordpress.com/2016/05/07/orb-slam-vs-lsd-slam/

and point clouds

http://pointclouds.org/blog/tocs/alexandrov/index.php

https://github.com/PointCloudLibrary/pcl

https://github.com/raulmur/ORB_SLAM2

https://github.com/tum-vision/lsd_slam

Categories
Vision

ImageHub and ImageNodes

I set up https://github.com/jeffbass/imagenode, https://github.com/jeffbass/imagehub, and https://github.com/jeffbass/imagezmq from earlier.

I needed these (the latest OpenCV has a bug):

pip3 install pyyaml numpy virtualenv zmq imutils psutil picamera
pip3 install opencv-contrib-python==4.1.0.25

On the imagehub side, it finds /root/imagenode.yaml and sets up a folder.

Then on the imagenode side, it looks for a directory structure with imagenode/, imagezmq/, and imagenode.yaml in the parent folder. You replace the contents with the YAML examples in the tests folder.

Then when it detects motion in the blue box, it takes pics that arrive in ~/imagehub_data/images/2020-04-12#
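
Underneath, imagezmq is doing the transport; a minimal send/receive pair looks roughly like this (the hub hostname and test image are placeholders):

# sender side (what imagenode does, roughly)
import socket
import cv2
import imagezmq

sender = imagezmq.ImageSender(connect_to="tcp://imagehub-host:5555")
frame = cv2.imread("test.jpg")
sender.send_image(socket.gethostname(), frame)   # blocks until the hub replies

# receiver side (what imagehub does, roughly)
import imagezmq

image_hub = imagezmq.ImageHub()
node_name, frame = image_hub.recv_image()
image_hub.send_reply(b"OK")                      # REQ/REP pattern: every image gets an ack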

So this is a good start for various applications.

Categories
AI/ML CNNs Vision

Self Attention

https://attentionagent.github.io/ – “there is no conscious perception of the visual world without attention to it”

http://papers.nips.cc/paper/8302-stand-alone-self-attention-in-vision-models

and on the difference between them and conv nets:

https://openreview.net/forum?id=HJlnC1rKPB

https://github.com/epfml/attention-cnn