https://www.streamlit.io/ “Streamlit is an open-source Python library that makes it easy to build beautiful custom web-apps for machine learning and data science.”
Looks like maybe a sort of Jupyter Notebook player with a better UI.
INTRODUCTION: THINGS AND STUFF

Ask someone what vision is for and you may get an answer about recognizing objects. Few people will tell you that vision is about recognizing materials. Yet materials are just as important as objects are. Our world involves steel and glass, paper and plastic, food and drink, leather and lace, ice and snow, not to mention blood sweat and tears. Nonetheless, if you peruse the scientific literature in human and machine vision, you will also find a great deal of attention paid to the problem of recognizing objects, and very little to the problem of recognizing materials. Why should this be?
Perhaps it is related to the general preference we have for talking about things rather than stuff.
In detectron2, the term “thing” is used for instance-level tasks, and “stuff” is used for semantic segmentation tasks. Both are used in panoptic segmentation.
Args:
    name (str): the name that identifies a dataset, e.g. "coco_2014_train".
    metadata (dict): extra metadata associated with this dataset. You can leave it as an empty dict.
    json_file (str): path to the json instance annotation file.
    image_root (str or path-like): directory which contains all the images.
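Those args match detectron2's register_coco_instances helper, so registering a custom COCO-format dataset looks roughly like this (a sketch; the dataset name and paths are hypothetical):

from detectron2.data.datasets import register_coco_instances

# name, metadata dict, annotation json, image directory
register_coco_instances("eggs_train", {}, "datasets/eggs/annotations_train.json", "datasets/eggs/images")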
Training Dataset: The sample of data used to fit the model.
Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
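As a toy example of carving out those three sets, two passes of scikit-learn's train_test_split work; the 70/15/15 ratios here are just an illustration:

import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(100).reshape(-1, 1)     # dummy samples
labels = np.random.randint(0, 2, 100)    # dummy labels
# split off 30%, then halve that 30% into validation and test
train_x, rest_x, train_y, rest_y = train_test_split(data, labels, test_size=0.3, random_state=42)
val_x, test_x, val_y, test_y = train_test_split(rest_x, rest_y, test_size=0.5, random_state=42)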
Debugging…
To show an image with OpenCV, you need to follow it with cv2.waitKey()
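i.e. something like:

import cv2

img = cv2.imread("some_image.jpg")   # hypothetical path
cv2.imshow("preview", img)
cv2.waitKey(0)                       # blocks until a key is pressed; without it the window never draws
cv2.destroyAllWindows()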
As I don’t have an NVIDIA card, I needed to set cfg.MODEL.DEVICE = 'cpu'
Got some “incompatible shapes” warnings – fair enough.
Since I’m running on CPU, I needed this environment variable setting to stop it from using too much memory:
LRU_CACHE_CAPACITY=1 python3 eggid.py
Got one “training diverged” with 0.02 learning rate. Changed to 0.001. It freezes a lot. Ubuntu freezes if you use too much memory.
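For reference, the config tweaks from the last few paragraphs amount to something like this (a sketch, not my exact training script):

from detectron2.config import get_cfg

cfg = get_cfg()
cfg.MODEL.DEVICE = "cpu"        # no NVIDIA card here
cfg.SOLVER.BASE_LR = 0.001      # 0.02 diverged, so drop the learning rate
cfg.SOLVER.IMS_PER_BATCH = 2    # assumption: a small batch to keep memory down on CPU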
Ok it kept freezing. Going to have to try on Google Colab maybe, or maybe limit python’s memory use. But that would presumably just result in “Memory Error” instead, only slightly less annoying than the computer freezing.
Some guy did object detection, with bounding boxes: https://colab.research.google.com/drive/1BRiFBC06OmWNkH4VpPl8Sf7IT21w7vXr https://www.mrdbourke.com/airbnb-amenity-detection/
Ok, I tried again with Roboflow, but it seems they only support bounding box training, and not the segmentation training I want.
Let’s try training bounding box object detection on the egg dataset…
[09/18 22:53:15 d2.evaluation.coco_evaluation]: Preparing results for COCO format …
[09/18 22:53:15 d2.evaluation.coco_evaluation]: Saving results to ./output/coco_instances_results.json
[09/18 22:53:15 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API…
Loading and preparing results…
DONE (t=0.00s)
creating index…
index created!
Running per image evaluation…
Evaluate annotation type bbox
COCOeval_opt.evaluate() finished in 0.00 seconds.
Accumulating evaluation results…
COCOeval_opt.accumulate() finished in 0.01 seconds.
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.595
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.857
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.528
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.501
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.340
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.559
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.469
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.642
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.642
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.362
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633
[09/18 22:53:15 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
So I think the training worked, perhaps, on the bounding boxes? Kinda hard to say without seeing it draw some boxes. Not entirely sure what all these APs mean, but they’re variants of “Average Precision” computed at different IoU thresholds and object sizes: https://cocodataset.org/#detection-eval
So now, let’s do Google Open Images based training instead. It has a ‘Chicken’ subset, so that’s ideal. So I downloaded https://pypi.org/project/openimages/ and ran some Python:
from openimages.download import download_dataset
download_dataset("/media/chrx/0FEC49A4317DA4DA/openimages", ["Chicken"], annotation_format="pascal")
Ack this is only bounding boxes too.
Looks like https://pypi.org/project/oidv6/ is another open images downloader script.
Detectron2 needs COCO format, so converting from Pascal VOC to COCO… ?
I looked at this, https://github.com/roboflow-ai/voc2coco – nope, that’s bounding boxes only.
This looks like it might be the biggest format conversion app I’ve found: the OpenVINO™ Toolkit.
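For plain bounding boxes, though, the conversion is small enough to sketch by hand; something like this (hypothetical paths, a single ‘Chicken’ category, boxes only, no segmentation):

import glob
import json
import xml.etree.ElementTree as ET

images, annotations = [], []
ann_id = 1
for img_id, xml_path in enumerate(sorted(glob.glob("voc_annotations/*.xml")), start=1):
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    images.append({
        "id": img_id,
        "file_name": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
    })
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, ymin = float(box.findtext("xmin")), float(box.findtext("ymin"))
        xmax, ymax = float(box.findtext("xmax")), float(box.findtext("ymax"))
        annotations.append({
            "id": ann_id,
            "image_id": img_id,
            "category_id": 1,
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],   # COCO boxes are [x, y, width, height]
            "area": (xmax - xmin) * (ymax - ymin),
            "iscrowd": 0,
        })
        ann_id += 1

with open("annotations_coco.json", "w") as f:
    json.dump({"images": images,
               "annotations": annotations,
               "categories": [{"id": 1, "name": "Chicken"}]}, f)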
We can find that the 'Chicken' category is represented by /m/09b5t:
wget https://storage.googleapis.com/openimages/v5/class-descriptions-boxable.csv
/m/09b5t,Chicken
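Or the same lookup in Python (the CSV has no header row, just the label ID and display name):

import pandas as pd

classes = pd.read_csv("class-descriptions-boxable.csv", header=None, names=["LabelName", "DisplayName"])
print(classes[classes.DisplayName == "Chicken"])   # -> /m/09b5t  Chicken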
I would prefer to get instance segmentation training working than bounding box training. But it looks like it’s gonna be a bit harder than anticipated.
At this point, we can download google open images, with some bounding box annotations in the OIDv6 format, and scale them down to 300×300 or similar. We can also get it in Pascal VOC format.
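The resize pass is simple, and since the Open Images box coordinates are normalised to 0–1 (see the CSV sample further down), the annotations survive scaling unchanged. A sketch with hypothetical paths:

import glob
import os
from PIL import Image

src, dst = "openimages/chicken/images", "openimages/chicken/images_300"
os.makedirs(dst, exist_ok=True)
for path in glob.glob(os.path.join(src, "*.jpg")):
    img = Image.open(path).convert("RGB")
    img.resize((300, 300)).save(os.path.join(dst, os.path.basename(path)))   # ignores aspect ratio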
I’ve just set up a user on a friend’s server, and I followed the @nicolas.windt article.
Do I
a) try to get Google TensorFlow’s object detection API working, as described in @nicolas.windt’s article?
Traceback (most recent call last):
File "/home/danielb/work/models/research/object_detection/dataset_tools/create_oid_tf_record.py", line 45, in
from object_detection.dataset_tools import oid_tfrecord_creation
ImportError: No module named object_detection.dataset_tools
pip install tensorflow-object-detection-api
File "/home/danielb/work/models/research/object_detection/dataset_tools/create_oid_tf_record.py", line 110, in main
image_annotations, label_map, encoded_image)
File "/root/anaconda3/envs/tfRecords/lib/python2.7/site-packages/object_detection/dataset_tools/oid_tfrecord_creation.py", line 43, in tf_example_from_annotations_data_frame
annotations_data_frame.LabelName.isin(label_map)]
File "/root/anaconda3/envs/tfRecords/lib/python2.7/site-packages/pandas/core/generic.py", line 3614, in getattr
return object.getattribute(self, name)
AttributeError: 'DataFrame' object has no attribute 'LabelName'
This has to do with pandas not finding the column (LabelName) that the script expects in the annotations DataFrame.
---
I'm trying with Python 3.8 now, and had to change as_matrix to to_numpy because it was deprecated, and had to change package names to tf.io.xxx.
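The flavour of those edits, for reference (tf.python_io is my guess at the old name that moved):

import pandas as pd
import tensorflow as tf

df = pd.DataFrame({"XMin": [0.1, 0.2]})
values = df.to_numpy()                                  # was df.as_matrix() in older pandas
writer = tf.io.TFRecordWriter("/tmp/example.tfrecord")  # was tf.python_io.TFRecordWriter in TF1
writer.close()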
Now
File "/root/anaconda3/lib/python3.8/site-packages/object_detection/dataset_tools/oid_tfrecord_creation.py", line 71, in tf_example_from_annotations_data_frame
dataset_util.bytes_feature('{}.jpg'.format(image_id)),
File "/root/anaconda3/lib/python3.8/site-packages/object_detection/utils/dataset_util.py", line 30, in bytes_feature
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
TypeError: '000411001ff7dd4f.jpg' has type str, but expected one of: bytes
So it needs like a to-bytes sort of thing. [b'a', b'b'] is what Stack Overflow came up with. So it needs [b'000411001ff7dd4f.jpg'] instead of ['000411001ff7dd4f.jpg'].
"Convert string to bytes"
looks like
b = mystring.encode()
So,
def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
"Python string encoding is different in Python 2.7 vs 3.6 and it break Tensorflow."
"Hi, where i use encode() ?"
(from https://github.com/tensorflow/models/issues/1597)
ok... it's failing here:
standard_fields.TfExampleFields.filename: dataset_util.bytes_feature('{}.jpg'.format(image_id)),
ok and if i use value=value.encode()
TypeError: 48 has type int, but expected one of: bytes
(Ah, ASCII 48 is '0' from '000411001ff7dd4f', so not that.)
and value=[value.encode()] gets
AttributeError: 'bytes' object has no attribute 'encode'
...
but without .encode(),
TypeError: '000411001ff7dd4f.jpg' has type str, but expected one of: bytes
and the data is
feature_map = {
    standard_fields.TfExampleFields.object_bbox_ymin:
        dataset_util.float_list_feature(
            filtered_data_frame_boxes.YMin.to_numpy()),
    standard_fields.TfExampleFields.object_bbox_xmin:
        dataset_util.float_list_feature(
            filtered_data_frame_boxes.XMin.to_numpy()),
    standard_fields.TfExampleFields.object_bbox_ymax:
        dataset_util.float_list_feature(
            filtered_data_frame_boxes.YMax.to_numpy()),
    standard_fields.TfExampleFields.object_bbox_xmax:
        dataset_util.float_list_feature(
            filtered_data_frame_boxes.XMax.to_numpy()),
    standard_fields.TfExampleFields.object_class_text:
        dataset_util.bytes_list_feature(
            filtered_data_frame_boxes.LabelName.to_numpy()),
    standard_fields.TfExampleFields.object_class_label:
        dataset_util.int64_list_feature(
            filtered_data_frame_boxes.LabelName.map(lambda x: label_map[x])
            .to_numpy()),
    standard_fields.TfExampleFields.filename:
        dataset_util.bytes_feature('{}.jpg'.format(image_id)),
    standard_fields.TfExampleFields.source_id:
        dataset_util.bytes_feature(image_id),
    standard_fields.TfExampleFields.image_encoded:
        dataset_util.bytes_feature(encoded_image),
}
and the input file looks like...
ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
00e71a70a2f669ff,xclick,/m/09b5t,1,0.18049793,0.95435685,0.056603774,0.9638365,0,1,0,0,0
01463f5494340d3d,xclick,/m/09b5t,1,0,0.59791666,0.2125,0.965625,0,0,0,0,0
ok screw it. stackoverflow time.
https://stackoverflow.com/questions/64072148/typeerror-has-type-str-but-expected-one-of-bytes
Looks like it's a current bug: https://github.com/tensorflow/models/issues/7997
ok turns out I actually worked it out yesterday with .encode('utf-8'), but it went on to the same bug on the next line.
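So the patched helper ends up looking something like this (a sketch of the fix; it handles both str and bytes, since encoded_image is already bytes):

import tensorflow as tf

def bytes_feature(value):
    # TFRecord features need bytes; encode str (Python 3) and leave bytes alone
    if isinstance(value, str):
        value = value.encode('utf-8')
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))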
Ok now it generated some TFRecords.
So now we can train it...
As explained here: https://towardsdatascience.com/custom-object-detection-using-tensorflow-from-scratch-e61da2e10087
The models directory came with a notebook file (.ipynb) that we can use to get inference with a few tweaks. It is located at models/research/object_detection/object_detection_tutorial.ipynb. Follow the steps below to tweak the notebook:
Comment out cell #5 completely (just below Download Model)
Since we’re only testing on one image, comment out PATH_TO_TEST_IMAGES_DIR and TEST_IMAGE_PATHS in cell #9 (just below Detection)
In cell #11 (the last cell), remove the for-loop, unindent its content, and add path to your test image:
imagepath = 'path/to/image_you_want_to_test.jpg'
After following through the steps, run the notebook and you should see the corgi in your test image highlighted by a bounding box!
or
b) Install PyTorch, detectron2 (I keep thinking deceptron2), convert OIDv6 or Pascal VOC formats to COCO format (or ssh rsync the egg data files over to the new machine), and train Mask-RCNN, like with the eggs dataset? (I am using my friend’s server because my laptop can’t handle the training. Keeps freezing.)
or
c) Get EfficientDet running: Strangely, https://github.com/google/automl only contains EfficientDet. Is that AutoML? EfficientDet? Surely not. Odd.
Ok…
At this point i’m ok with just trying to get anything working. Bounding boxes. Ok. After an hour of just looking at options, probably B.
Ended up doing A. Seems Google just got TensorFlow 2’s Object Detection API working recently: https://blog.tensorflow.org/2020/07/tensorflow-2-meets-object-detection-api.html
TF2 ships with Keras built in (as tf.keras). From what I can tell so far, the main difference between TF2 and PyTorch is that PyTorch builds the network dynamically at runtime (define-by-run), while TF2 leans on Keras, which has an elegant way to describe a neural network architecture in code.
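For example, a whole toy classifier architecture in Keras is just a few declarative lines (an illustration, not anything from this project):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),   # e.g. chicken vs egg
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()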
So, one thing to note is that when I decide to attempt object segmentation again, the process will probably follow @nicolas.windt’s tutorial but with this file instead (for train- and test- and validate-): https://storage.googleapis.com/openimages/v5/train-annotations-object-segmentation.csv
For now, got the images, and will try train with the TF2 OD-API, starting with one of the models in the zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
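If I understand the TF2 OD API workflow, training then boils down to pointing its main script at a pipeline config copied from the zoo model (the paths here are placeholders):

python model_main_tf2.py \
  --pipeline_config_path=path/to/pipeline.config \
  --model_dir=path/to/training_output \
  --alsologtostderr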
Roboflow seems to be a company with LabelImg and CVAT projects for annotation, but it sounds like you save as VOC and then run a script to make a COCO JSON out of the VOC XML.
Going to try to work around using Roboflow though, and save directly to COCO somehow. They have “pre-processing” (resize) and “augmentation” (flip the pictures around every which way to generate more data).
Ok, it turns out VIA (the VGG Image Annotator) version 2 is a lot better than version 3:
via-master/via-2.x.y/src/index.html
Working pretty well. Annotation is a bit confusing.
After watching a YouTube video, I found the trick is to name an attribute, like ‘type’, add options for ‘egg’ and ‘chicken’, and set the attribute’s input type to dropdown. Then you can set the type by clicking on a shape and selecting ‘egg’ or ‘chicken’.
Here’s a paper on making something like a Photoshop-style magic selector for human annotators: https://arxiv.org/pdf/1903.10830.pdf
Also found “OpenLabeling”, https://github.com/Cartucho/OpenLabeling, which looks pretty good.
AdelaiDet is an open source toolbox for multiple instance-level recognition tasks on top of Detectron2. All instance-level recognition works from our group are open-sourced here.
To date, AdelaiDet implements the following algorithms:
Elon Musk now has a robot that can do neocortex circuit implants with 1,000 electrodes, but Yukiyasu Kamitani has been trying to read the brain without invasive surgery for a while now.
“In their Sónar+D Talk, neuroscientist Yukiyasu Kamitani and multidisciplinary artist Daito Manabe explain the processes behind their groundbreaking collaborative show Dissonant Imaginary, whereby AI is used to decode brain visualisation processes.”
He shows pictures to people while they’re in an fMRI machine, measures the brain activity, stores something like a pixel map, and tries to recreate the images: https://www.youtube.com/watch?v=pV-PX1UNXmo
(Functional magnetic resonance imaging or functional MRI (fMRI) measures brain activity by detecting changes associated with blood flow. This technique relies on the fact that cerebral blood flow and neuronal activation are coupled.) – https://en.wikipedia.org/wiki/Functional_magnetic_resonance_imaging
Good wiki page. When reading AI/ML articles and papers, pseudocode and math are often written using Greek symbols, like Σ (sigma) for summation. θ (theta) shows up for angles in polar coordinates, or for the parameters of an agent’s ‘policy’. STEM students osmose the many meanings of these symbols over years of study.
When you learn calculus, you first learn about differentiation, and Newton’s method, where you repeatedly take the derivative and follow the tangent line towards a root: https://en.wikipedia.org/wiki/Newton%27s_method
SGD is kinda like that but with a bit more statistics involved. Adam additionally keeps running averages of the gradient and of its square (first and second moment estimates, rather than a true second derivative), making it a bit more sophisticated than plain SGD.
Finding where a derivative is zero is how you find local optima, and so SGD can lead you towards solutions. (Evolutionary algorithms offer a less direct path towards solutions, typically with a random element.) For some problems, lower learning rates help avoid getting trapped in a local optimum (like the robot falling over face first, because of a reward for going forward).
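A toy side-by-side, just to make the analogy concrete (nothing to do with the chicken data):

# Newton's method: follow the tangent line of f to a root (here, sqrt(2))
def f(x):
    return x ** 2 - 2

def df(x):
    return 2 * x

x = 1.0
for _ in range(5):
    x = x - f(x) / df(x)
print(x)   # -> about 1.41421

# Gradient descent: step downhill on a loss, scaled by a learning rate
def dloss(w):
    return 2 * (w - 3.0)   # derivative of the loss (w - 3)^2

w, lr = 0.0, 0.1
for _ in range(50):
    w = w - lr * dloss(w)
print(w)   # -> about 3.0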