Categories
AI/ML CNNs OpenCV Vision

Mask R-CNN

Paper: https://arxiv.org/pdf/1703.06870.pdf

FB really likes detecting things. The Matterport version didn’t work out of the box, so I went with FB’s PyTorch version (Detectron2) to try image segmentation.

Caffe2 version: https://github.com/facebookresearch/Detectron

PyTorch version: https://github.com/facebookresearch/Detectron2

Matterport’s version: https://github.com/matterport/Mask_RCNN

Deep Learning based Image Segmentation with OpenCV: https://www.pyimagesearch.com/2018/11/26/instance-segmentation-with-opencv/

https://engineering.matterport.com/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46
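
For reference, minimal inference with a pre-trained Mask R-CNN through Detectron2 looks roughly like this (a sketch based on their getting-started docs; the config name, score threshold and test image path are just examples):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5       # confidence threshold for detections

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("test.jpg"))       # BGR image, as read by OpenCV
print(outputs["instances"].pred_classes)          # COCO class ids
print(outputs["instances"].pred_masks.shape)      # one boolean mask per detected instance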

The Watershed algorithm is also available in OpenCV:

Watershed: http://www.cmm.mines-paristech.fr/~beucher/wtshed.html

Segmenting an image by the watershed transformation is therefore a two-step process:

* Finding the markers and the segmentation criterion (the criterion or function which will be used to split the regions – it is most often the contrast or gradient, but not necessarily).

* Performing a marker-controlled watershed with these two elements.

https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_watershed/py_watershed.html
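
As a rough sketch of those two steps with OpenCV (following the tutorial above; the input image and the threshold/distance-transform choice of markers are just one option):

import cv2
import numpy as np

# Step 1: find the markers and the segmentation criterion
img = cv2.imread("coins.jpg")                                  # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

kernel = np.ones((3, 3), np.uint8)
sure_bg = cv2.dilate(thresh, kernel, iterations=3)             # definitely background
dist = cv2.distanceTransform(thresh, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, 0)     # definitely foreground
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)                       # region the watershed will decide

_, markers = cv2.connectedComponents(sure_fg)                  # label each foreground blob
markers = markers + 1                                          # background becomes 1, not 0
markers[unknown == 255] = 0                                    # 0 = unknown, to be flooded

# Step 2: marker-controlled watershed
markers = cv2.watershed(img, markers)
img[markers == -1] = (0, 0, 255)                               # -1 marks the boundaries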

Categories
Vision

DSO: Direct Sparse Odometry

It seems I didn’t mention this algorithm before, but it’s a SLAM-like point cloud thing I must have played with, since I came across this screenshot.

Tips section from https://github.com/JakobEngel/dso

Accurate Geometric Calibration

  • Please have a look at Chapter 4.3 from the DSO paper, in particular Figure 20 (Geometric Noise). Direct approaches suffer a LOT from bad geometric calibrations: Geometric distortions of 1.5 pixel already reduce the accuracy by factor 10.
  • Do not use a rolling shutter camera; the geometric distortions from a rolling shutter are huge, even at high frame rates (over 60fps).
  • Note that the reprojection RMSE reported by most calibration tools is the reprojection RMSE on the “training data”, i.e., overfitted to the images you used for calibration. If it is low, that does not imply that your calibration is good; you may just have used insufficient images (a quick OpenCV check is sketched after this list).
  • Try different camera / distortion models; not all lenses can be modelled by all models.
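
On that reprojection RMSE point: the number OpenCV’s calibrateCamera returns is exactly that training-set error, so a low value alone doesn’t mean much. A minimal sketch of getting it (checkerboard size and image paths are placeholders):

import glob
import cv2
import numpy as np

pattern = (9, 6)                                      # inner corners of the checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for path in glob.glob("calib/*.png"):                 # placeholder path to calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)
print("RMS reprojection error on the calibration images:", rms)   # low != good calibration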

Photometric Calibration

Use a photometric calibration (e.g. using https://github.com/tum-vision/mono_dataset_code).

Translation vs. Rotation

DSO cannot do magic: if you rotate the camera too much without translation, it will fail. Since it is pure visual odometry, it cannot recover by re-localizing, or track through strong rotations by using previously triangulated geometry. Everything that leaves the field of view is marginalized immediately.

Categories
Vision

Early ConvNet visualisations

https://link.springer.com/article/10.1186/s40648-019-0141-2

https://imgur.com/a/Hqolp

AxCell: Automatic Extraction of Results from Machine Learning Papers

https://arxiv.org/abs/2004.14356

Categories
CNNs deep Vision

MeshCNN

Currently we have LSD-SLAM working, and that’s cool for us humans to see stuff, but having an object mesh to work with makes more sense. I don’t know if there’s really any difference, but at least in terms of simulator integration, it makes sense. There’s object detection, semantic segmentation, and so on, and in the end I want the robot to have a relative coordinate system, in a way. But robots will probably get by with just pixels and stochastic magic.

But the big idea for me here is to transform monocular camera images into mesh objects. Those .obj files, or whatever, could be imported into the physics engine for training in simulation.
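
As a sketch of that last step, assuming pybullet as the physics engine and a hypothetical mesh.obj produced upstream:

import pybullet as p

p.connect(p.DIRECT)                                   # headless physics server
# Load a reconstructed mesh as both the collision and the visual shape
col = p.createCollisionShape(p.GEOM_MESH, fileName="mesh.obj")
vis = p.createVisualShape(p.GEOM_MESH, fileName="mesh.obj")
body = p.createMultiBody(baseMass=1.0,
                         baseCollisionShapeIndex=col,
                         baseVisualShapeIndex=vis,
                         basePosition=[0, 0, 1])
for _ in range(240):                                  # step the simulation for one second at 240 Hz
    p.stepSimulation()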

arxiv: https://arxiv.org/pdf/1809.05910v2.pdf

github: https://ranahanocka.github.io/MeshCNN/

The PhD candidate: https://www.cs.tau.ac.il/~hanocka/ – In the Q&A at the end, she mentions AtlasNet https://arxiv.org/abs/1802.05384 as only being able to address local structures. Latest research looks interesting too https://arxiv.org/pdf/2003.13326.pdf

ShapeNet (https://arxiv.org/abs/1512.03012) seems to be a common resource, and https://arxiv.org/pdf/2004.15004v2.pdf and these .obj files might be interesting: https://www.dropbox.com/s/w16st84r6wc57u7/shrec_16.tar.gz

Categories
dev robots Vision

ROS Camera Topic

What is a ROS topic? http://wiki.ros.org/Topics
ROS can publish the webcam stream to a “topic”, and any part of the robot can subscribe to it, by name, if it is interested in that data. ROS is almost like a program where everything is a global variable.
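
A minimal subscriber sketch (assuming a /image_raw topic is being published and the cv_bridge package is installed):

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_image(msg):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")   # ROS Image -> OpenCV BGR array
    rospy.loginfo("got %dx%d frame", msg.width, msg.height)

rospy.init_node("camera_listener")
rospy.Subscriber("/image_raw", Image, on_image)
rospy.spin()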

https://answers.ros.org/question/218228/ros-example-program-doesnt-work-with-the-laptop-webcam/

I made this file for the laptop webcam, but then didn’t end up using it.

<launch>
  <group ns="camera">
    <node pkg="libuvc_camera" type="camera_node" name="mycam">
      <!-- Parameters used to find the camera -->
      <param name="vendor" value="0x2232"/>
      <param name="product" value="0x1082"/>
      <param name="serial" value=""/>
      <!-- If the above parameters aren't unique, choose the first match: -->
      <param name="index" value="0"/>

      <!-- Image size and type -->
      <param name="width" value="640"/>
      <param name="height" value="480"/>
      <!-- choose whichever uncompressed format the camera supports: -->
      <param name="video_mode" value="uncompressed"/> <!-- or yuyv/nv12/mjpeg -->
      <param name="frame_rate" value="15"/>

      <param name="timestamp_method" value="start"/> <!-- start of frame -->
      <param name="camera_info_url" value="file:///tmp/cam.yaml"/>

      <param name="auto_exposure" value="3"/> <!-- use aperture_priority auto exposure -->
      <param name="auto_white_balance" value="false"/>
    </node>
  </group>
</launch>

roscore

apt install ros-melodic-uvc-camera

rospack listnames

rosrun uvc_camera uvc_camera_node _device:=/dev/video0

rostopic list

(should show /image_raw now…)

rosrun dso_ros dso_live calib=/opt/catkin_ws/src/dso_ros/camera.txt image:=/image_raw/

Categories
AI/ML arxiv Vision

Instance Segmentation

https://arxiv.org/pdf/2003.10152.pdf – SOLOv2

https://arxiv.org/pdf/2003.06148.pdf – PointINS: Point-based Instance Segmentation

Cool site: paperswithcode.

https://paperswithcode.com/task/instance-segmentation?page=4

Categories
CNNs Vision

Visualize CNNs

https://github.com/fg91/visualizing-cnn-feature-maps

“There are two main ways to try to understand how a neural network recognizes a certain pattern. If you want to know what kind of pattern significantly activates a certain feature map you could 1) either try to find images in a dataset that result in a high average activation of this feature map or you could 2) try to generate such a pattern by optimizing the pixel values in a random image. The latter idea was proposed by Erhan et al. 2009.”

from: https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030
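
The snippet below depends on a few pieces that aren’t shown here: it’s written against the old fastai 0.7 helpers (vgg16, set_trainable, tfms_from_model, V) plus a SaveFeatures forward-hook class defined in the article. Roughly like this (the exact fastai import path and the hook class body are my assumptions, not copied from the post):

import numpy as np
import cv2
import torch
import matplotlib.pyplot as plt
from fastai.conv_learner import *   # old fastai 0.7: provides vgg16, V, tfms_from_model, set_trainable

class SaveFeatures():
    # Forward hook that keeps the feature maps of one layer around
    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        self.features = output      # keep the graph so the loss can backprop into the image
    def close(self):
        self.hook.remove()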


class FilterVisualizer():
    def __init__(self, size=56, upscaling_steps=12, upscaling_factor=1.2):
        self.size, self.upscaling_steps, self.upscaling_factor = size, upscaling_steps, upscaling_factor
        self.model = vgg16(pre=True).cuda().eval()
        set_trainable(self.model, False)

    def visualize(self, layer, filter, lr=0.1, opt_steps=20, blur=None):
        sz = self.size
        img = np.uint8(np.random.uniform(150, 180, (sz, sz, 3)))/255  # generate random image
        activations = SaveFeatures(list(self.model.children())[layer])  # register hook

        for _ in range(self.upscaling_steps):  # scale the image up upscaling_steps times
            train_tfms, val_tfms = tfms_from_model(vgg16, sz)
            img_var = V(val_tfms(img)[None], requires_grad=True)  # convert image to Variable that requires grad
            optimizer = torch.optim.Adam([img_var], lr=lr, weight_decay=1e-6)
            for n in range(opt_steps):  # optimize pixel values for opt_steps times
                optimizer.zero_grad()
                self.model(img_var)
                loss = -activations.features[0, filter].mean()
                loss.backward()
                optimizer.step()
            img = val_tfms.denorm(img_var.data.cpu().numpy()[0].transpose(1,2,0))
            self.output = img
            sz = int(self.upscaling_factor * sz)  # calculate new image size
            img = cv2.resize(img, (sz, sz), interpolation = cv2.INTER_CUBIC)  # scale image up
            if blur is not None: img = cv2.blur(img,(blur,blur))  # blur image to reduce high frequency patterns
        self.save(layer, filter)
        activations.close()
        
    def save(self, layer, filter):
        plt.imsave("layer_"+str(layer)+"_filter_"+str(filter)+".jpg", np.clip(self.output, 0, 1))

and use it like this:

layer = 40
filter = 265
FV = FilterVisualizer(size=56, upscaling_steps=12, upscaling_factor=1.2)
FV.visualize(layer, filter, blur=5)
Categories
Vision

Monocular SLAM

For drawing a map of a place (Simultaneous Localization and Mapping). Monocular means using a single camera.

https://vision.in.tum.de/research/vslam/lsdslam

https://ubilang.wordpress.com/2016/05/07/orb-slam-vs-lsd-slam/

and point clouds

http://pointclouds.org/blog/tocs/alexandrov/index.php

https://github.com/PointCloudLibrary/pcl

https://github.com/raulmur/ORB_SLAM2

https://github.com/tum-vision/lsd_slam

Categories
Vision

ImageHub and ImageNodes

I set up https://github.com/jeffbass/imagenode, https://github.com/jeffbass/imagehub, and https://github.com/jeffbass/imagezmq from earlier.

I needed these (the latest OpenCV has a bug):

pip3 install pyyaml numpy virtualenv zmq imutils psutil picamera
pip3 install opencv-contrib-python==4.1.0.25

On the imagehub side, it finds /root/imagenode.yaml and sets up a folder.

Then on the imagenode side, it looks for a directory structure with imagenode/, imagezmq/, and imagenode.yaml in the parent folder. You replace the contents with the YAML examples in the tests folder.

Then when it detects motion in the blue box, it takes pics that arrive in ~/imagehub_data/images/2020-04-12#
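
Underneath, imagezmq is doing the transport; a minimal send/receive pair looks roughly like this (the hub hostname and test image are placeholders):

# sender side (what imagenode does, roughly)
import socket
import cv2
import imagezmq

sender = imagezmq.ImageSender(connect_to="tcp://imagehub-host:5555")
frame = cv2.imread("test.jpg")
sender.send_image(socket.gethostname(), frame)   # blocks until the hub replies

# receiver side (what imagehub does, roughly)
import imagezmq

image_hub = imagezmq.ImageHub()
node_name, frame = image_hub.recv_image()
image_hub.send_reply(b"OK")                      # REQ/REP pattern: every image gets an ack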

So this is a good start for various applications.

Categories
AI/ML CNNs Vision

Self Attention

https://attentionagent.github.io/ – “there is no conscious perception of the visual world without attention to it”

http://papers.nips.cc/paper/8302-stand-alone-self-attention-in-vision-models

and on the difference between them and conv nets:

https://openreview.net/forum?id=HJlnC1rKPB

https://github.com/epfml/attention-cnn