Im2rec.py I surrender! Generated list is incorrect

gjjs · October 15, 2018, 6:48am

Hi all,

I’m using Amazon Sagemaker to build my first model using the single shot multibox detector. I’ve run through the example which uses the COCO dataset and everything worked fine. I’m now trying to train using my own images.

I have 600 images of dogs, and for my first ever model I just want to identify if there is a dog in the picture, that’s it. I am trying to use im2rec.py to create a recordio dataset from my 600 jpg images. All the jpg images are in a single directory. I run the following command to generate the list files:

python im2rec.py --num-thread 8 --list --recursive --test-ratio=0.3 --train-ratio=0.7 collie_lst ~/images/combined/

That generates 2 files, collie_lst_test.lst and collie_lst_train.lst.

When I look at the files however they do not look correct. Here are the first few lines:

571 0.000000 626.jpg
63 0.000000 158.jpg
288 0.000000 365.jpg
473 0.000000 533.jpg
614 0.000000 669.jpg
249 0.000000 329.jpg

My understanding is that there should be a lot more data for each image, but no matter what I try, this is all I get. I’ve been trying different permutations of the command for hours with no luck. If anyone can help me understand what I am doing wrong, I would be eternally grateful!

Thank you.

olivcruche · October 15, 2018, 4:49pm

Hi, this looks correct? col 1 is the image index, col 2 is the class (0 or 1), col 3 is picture path. What makes you think it’s wrong? note:

If you train an image classifier to detect the presence of dogs, that’s at least 2 classes (dog/not dog), so if your whole dataset is only dogs the model won’t be able to learn what non-dog objects look like.
If you want to do object detection, you need to add localization metadata (bounding boxes) to your training data.
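
To make the three-column layout concrete, here is a minimal sketch that splits one line of a list file into its parts. The helper name `parse_lst_line` is made up for illustration and is not part of im2rec.py; the sketch assumes tab-separated fields and falls back to any whitespace in case the tabs were lost when pasting.

```python
def parse_lst_line(line):
    """Split one .lst line into (index, labels, path).

    Hypothetical helper for illustration only; not part of im2rec.py.
    """
    line = line.rstrip("\n")
    # Assumption: fields are tab-separated. Fall back to any whitespace
    # when no tab is present (e.g. the line was copied from a web page).
    fields = line.split("\t") if "\t" in line else line.split()
    index = int(fields[0])                     # col 1: image index
    labels = [float(v) for v in fields[1:-1]]  # middle columns: label values
    path = fields[-1]                          # last col: image path
    return index, labels, path


index, labels, path = parse_lst_line("571 0.000000 626.jpg")
print(index, labels, path)  # 571 [0.0] 626.jpg
```

A single label value per line, as in the lines you posted, is what a classification list looks like.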

gjjs · October 15, 2018, 5:10pm

Thank you for the reply.

When I try to run training in Sagemaker I receive the following error:

“Not enough label packed in img_list or rec file”

When I search for the cause of the error message I get a link to a Sagemaker github issue where someone else hit the same problem:

github.com/aws/amazon-sagemaker-examples

Not enough label packed in img_list or rec file.

opened 05:13PM - 25 Jul 18 UTC

closed 08:07PM - 17 Sep 18 UTC

murtuza07

I'm getting this error while I'm trying to train using object_detection_recordio…_format.ipynb. Here's the error log:
Docker entrypoint called with argument(s): train
[07/25/2018 14:57:59 INFO 139826690209600] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/default-input.json: {u'lr_scheduler_step': u'', u'weight_decay': u'0.0005', u'optimizer': u'sgd', u'_tuning_objective_metric': u'', u'base_network': u'vgg-16', u'freeze_layer_pattern': u'', u'use_pretrained_model': u'0', u'_kvstore': u'device', u'label_width': u'350', u'kv_store': u'device', u'epochs': u'30', u'nms_threshold': u'0.45', u'momentum': u'0.9', u'overlap_threshold': u'0.5', u'lr_scheduler_factor': u'0.1', u'image_shape': u'300', u'_num_kv_servers': u'auto', u'mini_batch_size': u'32', u'learning_rate': u'0.001', u'num_classes': u'', u'num_training_samples': u''}
[07/25/2018 14:57:59 INFO 139826690209600] Reading provided configuration from /opt/ml/input/config/hyperparameters.json: {u'lr_scheduler_step': u'3,6', u'weight_decay': u'0.0005', u'mini_batch_size': u'32', u'optimizer': u'sgd', u'base_network': u'resnet-50', u'learning_rate': u'0.001', u'use_pretrained_model': u'0', u'label_width': u'350', u'epochs': u'20', u'overlap_threshold': u'0.5', u'num_training_samples': u'924', u'num_classes': u'10', u'nms_threshold': u'0.45', u'image_shape': u'224', u'momentum': u'0.9', u'lr_scheduler_factor': u'0.1'}
[07/25/2018 14:57:59 INFO 139826690209600] Final configuration: {u'label_width': u'350', u'epochs': u'20', u'overlap_threshold': u'0.5', u'lr_scheduler_factor': u'0.1', u'_num_kv_servers': u'auto', u'weight_decay': u'0.0005', u'mini_batch_size': u'32', u'use_pretrained_model': u'0', u'freeze_layer_pattern': u'', u'lr_scheduler_step': u'3,6', u'momentum': u'0.9', u'optimizer': u'sgd', u'_tuning_objective_metric': u'', u'learning_rate': u'0.001', u'kv_store': u'device', u'nms_threshold': u'0.45', u'num_classes': u'10', u'base_network': u'resnet-50', u'num_training_samples': u'924', u'_kvstore': u'device', u'image_shape': u'224'}
[07/25/2018 14:57:59 INFO 139826690209600] Using default worker.
[07/25/2018 14:57:59 INFO 139826690209600] Loaded iterator creator application/x-image for content type ('application/x-image', '1.0')
[07/25/2018 14:57:59 INFO 139826690209600] Loaded iterator creator application/x-recordio for content type ('application/x-recordio', '1.0')
[07/25/2018 14:57:59 INFO 139826690209600] Loaded iterator creator image/png for content type ('image/png', '1.0')
[07/25/2018 14:57:59 INFO 139826690209600] Loaded iterator creator image/jpeg for content type ('image/jpeg', '1.0')
[07/25/2018 14:57:59 WARNING 139826690209600] Training images are resized to image shape (3, 224, 224)
[14:57:59] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.1.x.200530.0/RHEL5_64/generic-flavor/src/src/io/iter_image_det_recordio.cc:281: ImageDetRecordIOParser: /opt/ml/input/data/train/mydata_train.rec, use 7 threads for decoding..
Algorithm Error: Internal Server Error
[14:57:59] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.1.x.200530.0/RHEL5_64/generic-flavor/src/src/io/iter_image_det_recordio.cc:315: Not enough label packed in img_list or rec file.
Stack trace returned 9 entries:
[bt] (0) /opt/amazon/lib/libaialgsdataiter.so(dmlc::StackTrace()+0x3d) [0x7f2bee78d46d]
[bt] (1) /opt/amazon/lib/libaialgsdataiter.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f2bee78d70a]
[bt] (2) /opt/amazon/lib/libmxnet.so(+0x17bba60) [0x7f2be37a5a60]
[bt] (3) /opt/amazon/lib/libiomp5.so(__kmp_invoke_microtask+0x93) [0x7f2bd819aac3]
[bt] (4) /opt/amazon/lib/libiomp5.so(+0x84257) [0x7f2bd8169257]
[bt] (5) /opt/amazon/lib/libiomp5.so(+0x838d5) [0x7f2bd81688d5]
[bt] (6) /opt/amazon/lib/libiomp5.so(+0xb5fa4) [0x7f2bd819afa4]
[bt] (7) /lib64/libpthread.so.0(+0x7dc5) [0x7f2beff08dc5]
[bt] (8) /lib64/libc.so.6(clone+0x6d) [0x7f2bef3056ed]

The reply on that issue explains the cause of the error:

According to the log you posted, the input RecordIO file does not contain enough annotations for training the object detection algorithm. Before you convert the images into RecordIO format, please make sure the .lst file you generated contains all the annotation information. The annotation information for each object is represented as [class_index, xmin, ymin, xmax, ymax].

So reading that I was assuming I need to have class_index, xmin, ymin, xmax, ymax for each image.

Does that make sense? Sorry, this is a pretty steep learning curve

olivcruche · October 15, 2018, 5:45pm

Yes, basically to train an object detection algorithm your data needs to contain both classification (which object?) and localisation (where?) information. If you are creating your own dataset you need to have bounding box information (xmin, ymin, xmax, ymax) to identify where your objects are.
This page may help:
https://gluon-cv.mxnet.io/build/examples_datasets/detection_custom.html
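
As a rough sketch of what one detection line could look like, the snippet below builds a tab-separated line with a two-value header followed by [class_index, xmin, ymin, xmax, ymax] for each object. The function name `detection_lst_line`, the minimal header (header width 2, label width 5), and the normalization of coordinates to the 0-1 range are my assumptions here, so please check them against the tutorial linked above and the SageMaker documentation before relying on them.

```python
def detection_lst_line(index, path, img_w, img_h, objects):
    """Build one tab-separated detection .lst line.

    Hypothetical helper for illustration only.
    objects: list of (class_index, xmin, ymin, xmax, ymax) in pixels.
    """
    # Assumption: minimal header = [header width, label width per object].
    header = [2, 5]
    labels = []
    for class_index, xmin, ymin, xmax, ymax in objects:
        # Assumption: box coordinates are normalized by the image size.
        labels += [class_index,
                   xmin / img_w, ymin / img_h,
                   xmax / img_w, ymax / img_h]
    fields = ([str(index)]
              + [str(h) for h in header]
              + [f"{v:.4f}" for v in labels]
              + [path])
    return "\t".join(fields)


# One dog (class 0) in a 200x100 image, box from (50, 25) to (150, 75).
print(detection_lst_line(0, "626.jpg", 200, 100, [(0, 50, 25, 150, 75)]))
```

Compared with the classification lines from the first post, each image now carries five extra values per object, which is the annotation data the error message is asking for.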

gjjs · October 15, 2018, 6:04pm

OK, great, thanks again!

I’ll read the article in full, but basically I need to go through all 600 images manually and add a bounding box? This could take a while
