Categories
AI/ML CNNs dev Vision

Panoptic segmentation

arXiv paper: https://arxiv.org/pdf/1801.00868.pdf

Panoptic segmentation combines an instance segmentation algorithm and a semantic segmentation algorithm.

Some notable papers are listed here, with benchmarks of the best related GitHub implementations: https://paperswithcode.com/task/panoptic-segmentation

For example,

  • Mask R-CNN algorithm for instance segmentation
  • DeepLabV2 algorithm for semantic segmentation
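The two outputs then get merged into a single panoptic map. Here's a toy numpy sketch of that merge step (my own illustration with a simple overwrite heuristic, not the paper's actual algorithm):

```python
import numpy as np

# Toy sketch of the panoptic merge: start from the semantic ('stuff') map,
# then paint each instance ('thing') mask over it with a unique segment id.
def merge_panoptic(semantic, instance_masks, first_instance_id=1000):
    panoptic = semantic.copy()
    for i, mask in enumerate(instance_masks):
        panoptic[mask] = first_instance_id + i
    return panoptic
```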

The architecture of the neural network has two pyramids: one for semantics (classes), and one to count the instances.

After much circular investigation, I arrived at the notion that transfer learning from a pre-trained network, where the ‘fine tuning’ means adding a new class, is the way to go.

But we’re still suffering from not having found an example that uses PNG mask files. I can convert to COCO, and that might be what I do yet, because the COCO dataset has its own panoptic segmentation challenge and format: https://cocodataset.org/#panoptic-eval They seem to be winning this race. We’ll do COCO.

It will mostly involve writing or exporting info into json format, and following some terse, ambiguous instructions.

Another thing is that COCO wants bounding boxes too, so this will be an exercise in config generation to satisfy the COCO format requirements. I have the data from Open Images, but COCO looks like the biggest game in town.

Then for the algorithm, there are numerous PyTorch libraries, including a very relevant one, YOLACT Edge, which can use a ‘Darknet’ architecture, an old “Open Source Neural Networks in C” project.

Hmm. It’s more instance segmentation than panoptic, but it looks like a good compromise to aim for.

https://github.com/haotian-liu/yolact_edge – It uses bounding boxes, so what will I do with all these chicken masks?
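One answer: derive the boxes from the masks themselves. A minimal numpy sketch, assuming boolean masks (True where the chicken is):

```python
import numpy as np

# Derive a COCO-style [x, y, width, height] bbox and pixel area
# from a boolean object mask.
def mask_to_bbox(mask):
    ys, xs = np.where(mask)          # row/col indices of object pixels
    x, y = int(xs.min()), int(ys.min())
    w = int(xs.max()) - x + 1
    h = int(ys.max()) - y + 1
    return [x, y, w, h], int(mask.sum())
```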

YOLACT Edge arXiv paper

Otherwise, the tensorflow object detection tutorials are here:

https://github.com/tensorflow/models/tree/master/research/object_detection/colab_tutorials

The eager_few_shot_od_training_tflite.ipynb notebook also looks like a winner for showing how to add a new Duck class to a MobileNet architecture. YOLACT Edge has a MobileNet model available too.

I am sitting with a thousand or so JPGs of chickens with corresponding PNG masks, sorted into train/val/test datasets. I was hoping for the Keras UNet segmentation demo to work, because I initially thought UNet would be ideal for the egg light camera, but now I’m back in the FAIR detectron2 woods, looking for a panoptic segmentation solution.

Let’s try the YOLACT Edge one, because it’s based on YOLO (‘You only look once’), a single-shot object detector algorithm, whose name is also more commonly known from ‘You only live once’, an affirmation of often reckless behaviour. YOLACT stands for You Only Look At CoefficienTs. In this case it looks like the state of the art, and it’s been used on the Jetson before, which is promising. At 30 frames per second on the Jetson AGX, we’ll probably get 20 or so on the Jetson NX. Since that’s using Torch to TensorRT to speed it up, it seems like we should try it. I was initially averse to using NVIDIA-specific software, but we should make the most of this hardware (if we can).

It’s not really panoptic segmentation. But it’s looking Good Enough™ like what we need, rather than what we thought we wanted.

Let’s try these instructions:

https://github.com/haotian-liu/yolact_edge/blob/master/INSTALL.md

We’ll try it on the NX. “Inside” the Docker. What’s our CUDA version?

nvcc --version

10.2

TensorRT should already be installed.

(On Nano, if nvcc not found, check out this link )

git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
sudo python3 setup.py install --plugins

Here’s from the COCO panoptic readme.

https://cocodataset.org/#format-data RELEVANT EXCERPT FOR….

Panoptic Segmentation

For the panoptic task, each annotation struct is a per-image annotation rather than a per-object annotation. Each per-image annotation has two parts: (1) a PNG that stores the class-agnostic image segmentation and (2) a JSON struct that stores the semantic information for each image segment. In more detail:

  1. To match an annotation with an image, use the image_id field (that is annotation.image_id==image.id).
  2. For each annotation, per-pixel segment ids are stored as a single PNG at annotation.file_name. The PNGs are in a folder with the same name as the JSON, i.e., annotations/name/ for annotations/name.json. Each segment (whether it’s a stuff or thing segment) is assigned a unique id. Unlabeled pixels (void) are assigned a value of 0. Note that when you load the PNG as an RGB image, you will need to compute the ids via ids=R+G*256+B*256^2.
  3. For each annotation, per-segment info is stored in annotation.segments_info. segment_info.id stores the unique id of the segment and is used to retrieve the corresponding mask from the PNG (ids==segment_info.id). category_id gives the semantic category and iscrowd indicates the segment encompasses a group of objects (relevant for thing categories only). The bbox and area fields provide additional info about the segment.
  4. The COCO panoptic task has the same thing categories as the detection task, whereas the stuff categories differ from those in the stuff task (for details see the panoptic evaluation page). Finally, each category struct has two additional fields: isthing that distinguishes stuff and thing categories and color that is useful for consistent visualization.
annotation{
    "image_id": int,
    "file_name": str,
    "segments_info": [segment_info],
}

segment_info{
    "id": int,
    "category_id": int,
    "area": int,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}

categories[{
    "id": int,
    "name": str,
    "supercategory": str,
    "isthing": 0 or 1,
    "color": [R,G,B],
}]

Ok, we can do this.
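As a concrete target, a single per-image annotation for one of our chicken images might look like this, following the structs above (the ids, file name, area, and bbox values here are all made up for illustration):

```python
import json

# Sketch of one panoptic annotation for a hypothetical chicken image.
annotation = {
    "image_id": 1,
    "file_name": "chicken_0001.png",   # RGB PNG encoding the segment ids
    "segments_info": [
        {
            "id": 1,                    # must match an id encoded in the PNG
            "category_id": 16,          # 'bird' in the COCO panoptic categories
            "area": 5000,
            "bbox": [10, 20, 100, 80],  # [x, y, width, height]
            "iscrowd": 0,
        }
    ],
}
print(json.dumps(annotation, indent=2))
```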

Right, so, if anything, we want to transfer learn from a trained neural network. There’s some interesting discussion about implementing your own transfer learning of a coco dataset, in keras-retinanet here, but we’re looking at using Yolact Edge, based on pytorch, so let’s not get distracted. We need to create the COCO dataset. I’ve put this off for so long.

We need the COCO categories that are already trained, and I see there is the 2018 api https://github.com/cocodataset/panopticapi which has the Panoptic challenge coco categories (panoptic_coco_categories.json) and ah ha this is what I have been searching for.

panopticapi/sample_data/panoptic_examples.json

After pretty printing with

python3 -m json.tool panoptic_examples.json

here’s the example, for this bit.

"images": [
{
"license": 2,
"file_name": "000000142238.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000142238.jpg",
"height": 427,
"width": 640,
"date_captured": "2013-11-20 16:47:35",
"flickr_url": "http://farm5.staticflickr.com/4028/5079131149_dde584ed79_z.jpg",
"id": 142238
},
{
"license": 1,
"file_name": "000000439180.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000439180.jpg",
"height": 360,
"width": 640,
"date_captured": "2013-11-19 01:25:39",
"flickr_url": "http://farm3.staticflickr.com/2831/9275116980_1d9b986e3b_z.jpg",
"id": 439180
}
]

and we’ve got some images

./input_images/000000439180.jpg
./input_images/000000142238.jpg

and their masks.

./panoptic_examples/000000439180.png
./panoptic_examples/000000142238.png
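Decoding one of these mask PNGs into per-pixel segment ids, per the format excerpt above (ids = R + G*256 + B*256^2), is a one-liner with numpy. The panopticapi repo ships a similar rgb2id helper; this is my own minimal version:

```python
import numpy as np

# Convert an (H, W, 3) RGB mask array into per-pixel segment ids,
# per the COCO panoptic format: id = R + G*256 + B*256**2.
def rgb2id(rgb):
    rgb = np.asarray(rgb, dtype=np.uint32)
    return rgb[..., 0] + 256 * rgb[..., 1] + 256 * 256 * rgb[..., 2]

# e.g. (path from above, PIL assumed):
# from PIL import Image
# ids = rgb2id(np.array(Image.open("./panoptic_examples/000000142238.png")))
```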

Ah here’s ‘bird’ category.

{
"supercategory": "animal",
"color": [
 165,
 42,
 42
],
"isthing": 1,
"id": 16,
"name": "bird"
},

“Let’s try to get some visualisation working”

Ok, hold on though. Let’s try to get some visualisation working, before anything else. This looks like the ticket. But it is a Python file running matplotlib, so ideally we’d transform it into a Jupyter notebook. Ok, just New Notebook, copy, paste. Run.

ModuleNotFoundError: No module named 'skimage'



[Big Detour and to the rescue, Datamachines]

Ok, can we install it with !pip3 install scikit-image? No, that fails… What did I do? Right, I need to ssh into the Jetson:

chrx@chrx:~$ ssh -L 8888:127.0.0.1:8888 -L 6006:127.0.0.1:6006 chicken@192.168.101.109

Then find the docker ID, docker exec -it 519ed46162ae bash into it, and goddamnit, what now? UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0xc3 in position 4029: ordinal not in range(128)

Ok, so someone’s already had this happen, and it’s because the locale’s preferred encoding needs to be UTF-8, but it’s some obscure ANSI one.

root@jetson:/# python -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968
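A common fix for this (an assumption based on similar L4T Docker reports, not a recipe I can vouch for on this exact image) is to force a UTF-8 locale inside the container before running anything:

```shell
# Force a UTF-8 locale for this shell session inside the container.
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
python3 -c 'import locale; print(locale.getpreferredencoding())'  # should now say UTF-8
```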

Someone posted a bunch of steps for the L4T docker folks. That would be us. Do we really need this library?

It’s to get this function.

from skimage.segmentation import find_boundaries

Yes, ok, it is quite hellish to install skimage. This was how to do it in Debian, for skimage up to v0.13.1-2:

apt install python3-skimage

But then it gets “ImportError: cannot import name ‘_validate_lengths’”, which is resolved in 1.14.2.

I’ve asked on the forum, and am hoping NVIDIA can solve this one. The skimage docs say:

  1. Linux on 64-bit ARM processors (Nvidia Jetson):

As per the latest comment (only 3 weeks ago; others were on the trail of similar tasks!), mmartial points to datamachines, which has some Dockerfiles for building OpenCV, TensorFlow, and YOLOv4.

Ok, let’s try what the instructions suggest:

“make tensorflow_opencv to build all the tensorflow_opencv container images”

I’ll try the CuDNN version next if this doesn’t work…

Ok…we’re on step 16 of 42… Ooh Python 3.8, that’s an upgrade. Build those wheels, pip3! Doh, Step 24 of 42.

bazel: Exec format error

The command returned a non-zero code: 2

*Whomp whomp* sound

ok let’s try

make cudnn_tensorflow_opencv, no…

I asked on the Issues, and they noticed those are the amd64 builds, not the aarch64 builds. I could use their DockerHub pre-built images for now.

So after a detour, I am using this Dockerfile successfully to run Jupyter on the NX. We got stuck because skimage was difficult to install, and now we’re back on track: annotating the COCO dataset, and so on.

chicken@jetson:~$ cat Dockerfile

FROM docker.io/datamachines/jetsonnano-cuda_tensorflow_opencv:10.2_2.3_4.5.1-20210218
RUN pip3 install jupyter jupyterlab --verbose
RUN jupyter lab --generate-config
RUN python3 -c "from notebook.auth.security import set_password; set_password('nvidia', '/root/.jupyter/jupyter_notebook_config.json')"
EXPOSE 6006
EXPOSE 8888
CMD /bin/bash -c "jupyter lab --ip 0.0.0.0 --port 8888 --allow-root &> /var/log/jupyter.log" & \
echo "allow 10 sec for JupyterLab to start @ http://$(hostname -I | cut -d' ' -f1):8888 (password nvidia)" && \
echo "JupyterLab logging location: /var/log/jupyter.log (inside the container)" && \
/bin/bash

chicken@jetson:~$ sudo docker build -t nx_setup .

chicken@jetson:~$ sudo docker run -it -p 8888:8888 -p 6006:6006 --rm --runtime nvidia --network host -v /home/chicken/:/dmc nx_setup

So, where were we?

Right. Panoptic API: we wanted to run visualize.py first, so we could check progress. But it needed skimage installed. Haha. Ok, one week later… let’s try to see the example.

Phew, ok. Getting back on track. So now we want to train it on the chickens.

So, COCO.

“Back to COCO”

As someone teaching myself about this, I know that what I ideally want is to transfer learn from a trained network, but it isn’t obvious how. Apparently I need to chop off the last layer of a trained network, freeze most of the network, and then retrain the last bit.
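At framework level, that idea looks something like this (a generic PyTorch sketch with made-up stand-in layers, not YOLACT-specific):

```python
import torch.nn as nn

# Stand-in for a pretrained backbone: freeze all of its parameters.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False           # frozen: no gradients accumulate here

# New last layer for our classes (sizes are illustrative only).
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, head) # only the head is trainable

trainable = [p for p in model.parameters() if p.requires_grad]
```

An optimizer would then be given only `trainable`, so training updates just the new head.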

Well, back to this soon…

So,

Here we have a suggestion from dbolya, the author of YOLACT and YOLACT++, the original.


try:
    self.load_state_dict(state_dict)
except RuntimeError as e:
    print('Ignoring "' + str(e) + '"')
and then resume training from yolact_im700_54_800000.pth:
python train.py --config=<your_config> --resume=weights/yolact_im700_54_800000.pth --start_iter=0
When there are size mismatches between tensors, Pytorch will spit out an error message but also keep on loading the rest of the tensors anyway. So here we just attempt to load a checkpoint with the wrong number of classes, eat the errors the Pytorch complains about, and then start training from iteration 0 with just those couple of tensors being untrained. You should see only the C (class) and S (semantic segmentation) losses reset.
You probably also want to modify the learning rate, decay schedule, and number of iterations in your config to account for fine-tuning.

And an allusion to an example of its use, perhaps. And more clues about how to fine-tune the ‘network head’.

You can do this by following the fine tuning procedure (#36) and then here:

yolact/yolact.py

Line 628 in f54b0a5

p = pred_layer(pred_x)

replace that with

p = pred_layer(pred_x.detach())

Ok… so here’s the YOLACT diagram:

A command to run the training.