Deploy Sagemaker Trained Model locally?


I have trained a model on Sagemaker using the Object Detection Algorithm -

This has given me 3 files:

I can deploy an inference endpoint for this on Sagemaker without any issues. But I am not able to deploy the same model locally.

I am following this blog -

But any value that I provide for the --label-name argument throws an error. How would one go about deploying a Sagemaker-trained Object Detection model locally? The final goal is to deploy the inference endpoint to a Raspberry Pi.

Any help is appreciated. Feel free to let me know if you need more details.

Vinay Nadig

Can you elaborate on “any value that I provide for --label-name argument throws out an error”? A code snippet from what you tried and the error message you got will help.

Hi Indu,

Here is the script I am using (the same one as in the blog) -
Here is my model_algo_1-symbol.json file -
Here is the command that I am trying to run:

python --label-name 'cls_prob' --img 'test.jpg' --prefix 'model_algo_1' --synset 'synset.txt' 

synset.txt contains a single line with one word, ‘car’, which is the only object the model is trained to detect.

The error I am getting is this:

[02:49:20] src/nnvm/ Loading symbol saved by previous version v1.2.1. Attempting to upgrade...
[02:49:20] src/nnvm/ Symbol successfully upgraded!
/usr/local/lib/python2.7/dist-packages/mxnet/module/ UserWarning: You created Module with Module(..., label_names=['cls_prob']) but input with name 'cls_prob' is not found in symbol.list_arguments(). Did you mean one of:
/usr/local/lib/python2.7/dist-packages/mxnet/module/ UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['cls_prob'])
Traceback (most recent call last):
  File "", line 105, in <module>
    mod = ImagenetModel(args.synset, args.prefix, label_names=[args.label_name], params_url=args.params_url, symbol_url=args.symbol_url, synset_url=args.synset_url)
  File "", line 45, in __init__
    self.mod.bind(for_training=False, data_shapes= input_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/", line 429, in bind
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/", line 279, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/", line 375, in bind_exec
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/", line 662, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/symbol/", line 1528, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 224, 224)
Error in operator multi_feat_5_conv_3x3_conv: [02:49:20] src/operator/nn/ Check failed: dilated_ksize_y <= AddPad(dshape[2], param_.pad[0]) (3 vs. 2) kernel size exceed input

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/ [0x7fc7321cefb4]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/ [0x7fc7321cf391]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/ [0x7fc7324b1d86]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/ [0x7fc7349c3fba]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/ [0x7fc7349c6964]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/ [0x7fc7349b2efa]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/ [0x7fc7349b3a34]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/ [0x7fc73490ee48]
[bt] (8) /usr/lib/x86_64-linux-gnu/ [0x7fc741fe1e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/ [0x7fc741fe18ab]

Let me know if you would like access to all 3 files (hyperparameters.json, model_algo_1-0000.params, model_algo_1-symbol.json) and I will share them with you.

@VishaalKapoor @thomelane

Hi @vinaynadig,

It looks like the symbol ‘upgrade’ is getting things in a muddle here! According to the log, the model was saved with MXNet v1.2.1, and you’re now loading it with a newer version of MXNet. The upgrade is required because the model serialisation format changed, but it seems to change the model architecture too (at least the naming of symbols).

You have a symbol called cls_prob in model_algo_1-symbol.json, but after the upgrade you have a symbol called label. I can think of a couple of things to try here:

  1. You could try changing to --label-name 'label'
  2. You could downgrade to v1.2.1 if you want to keep the symbol unchanged.
  3. Switch to using load and load_params instead of load_checkpoint.
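
To see which names the symbol actually exposes before picking a value for --label-name, the symbol file can be inspected directly. Below is a minimal sketch, assuming the usual MXNet symbol JSON layout (a "nodes" list plus "arg_nodes" and "heads" index lists). The helper name list_symbol_names is hypothetical and not part of MXNet, and the tiny graph is made up for illustration.

```python
import json


def list_symbol_names(symbol_json):
    """Return (argument_names, output_names) from an MXNet symbol JSON string.

    Assumes the usual layout: "nodes" is a list of dicts with "op" and "name",
    "arg_nodes" indexes the input/parameter nodes, and "heads" lists the
    output nodes as [node_index, output_index, version].
    """
    graph = json.loads(symbol_json)
    nodes = graph["nodes"]
    arg_names = [nodes[i]["name"] for i in graph.get("arg_nodes", [])]
    output_names = [nodes[head[0]]["name"] for head in graph.get("heads", [])]
    return arg_names, output_names


# Tiny made-up graph for illustration only; a real symbol file is much larger.
example = json.dumps({
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "null", "name": "label", "inputs": []},
        {"op": "Activation", "name": "cls_prob", "inputs": [[0, 0, 0]]},
    ],
    "arg_nodes": [0, 1],
    "heads": [[2, 0, 0]],
})
args, outputs = list_symbol_names(example)
print(args)     # names available as data/label inputs or parameters
print(outputs)  # names of the network outputs
```

For the real model, pass the contents of model_algo_1-symbol.json instead of the example string. If cls_prob only shows up among the outputs, it is probably an output rather than a label input, which would be consistent with the warning in the log.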

Hi Thom,

Thanks for the pointers! I was able to get it to work with the following changes:

  1. Strip the model of training layers as defined here -
  2. Downgrade mxnet to v1.2.1.
  3. Make sure that when running from the mxnet-ssd repo, these arguments match the hyperparameters that were used when training the model - ‘nms’
  4. In the script, set the input_shapes value to (3, 300, 300). This is because Sagemaker recommends either 300 or 512 as the value for the image_shape hyperparameter, and I had used 300 as the value during training.

With these changes, I am able to use the model generated by Sagemaker locally.
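
For anyone hitting the same "kernel size exceed input" error: the failed check compares the dilated kernel size with the padded input size, so a 224x224 input appears to leave one of the later layers with a feature map too small for its 3x3 kernel. Here is a rough sketch of that check in plain Python. The function names are my own, not MXNet's, the formula is the standard convolution size arithmetic, and I have not traced the exact feature map sizes for this network.

```python
def dilated_kernel_size(kernel, dilation=1):
    """Effective kernel extent once dilation is applied."""
    return 1 + (kernel - 1) * dilation


def conv_output_size(size, kernel, stride=1, pad=0, dilation=1):
    """Output length of a convolution along one axis.

    Raises ValueError when the dilated kernel is larger than the padded
    input, which mirrors the condition reported in the traceback.
    """
    k = dilated_kernel_size(kernel, dilation)
    padded = size + 2 * pad
    if k > padded:
        raise ValueError("kernel size exceed input (%d vs. %d)" % (k, padded))
    return (padded - k) // stride + 1


# A 3x3 kernel with no padding on a 2x2 feature map hits the "(3 vs. 2)" case.
try:
    conv_output_size(2, kernel=3)
except ValueError as err:
    print(err)

# The same kernel fits once the feature map is at least 3 pixels wide.
print(conv_output_size(3, kernel=3))
```

This is only the size arithmetic, but it shows why feeding the network at the resolution it was trained with (300 in my case) avoids the error.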

Glad you managed to get it working, and thanks for the follow up post!