MXNet model taking a long time to load

I am working with the ResNet-50 model from https://github.com/deepinsight/insightface and noticed that the following line takes almost 8-9 minutes to run on a GPU (CUDA 8.0, Python 2.7).

Command:

python test.py --model ../models/model-r50-am-lfw/model,0 --flip 1

Slow line:

model.bind(data_shapes=[('data', (1, 3, image_size[0], image_size[1]))])

Log output:

loading ../models/model-r50-am-lfw/model 0
[17:51:46] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.12.1. Attempting to upgrade...
[17:51:46] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
/usr/local/lib/python2.7/dist-packages/mxnet/model.py:928: DeprecationWarning: mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.
  **kwargs)
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!

@domarps can you point to the exact file and line please?

Thanks!
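
One way to pin down the slow call is to wrap each suspect step in a small timer. The sketch below is only an illustration: `timed` is a hypothetical helper, not part of MXNet or the insightface scripts, and it measures wall-clock time only.

```python
import sys
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=None):
    """Report the wall-clock time spent inside the `with` block."""
    start = time.time()
    try:
        yield
    finally:
        message = "%s took %.1f s" % (label, time.time() - start)
        if log is None:
            # Default: write to stderr so it shows up next to MXNet's own log lines.
            sys.stderr.write(message + "\n")
        else:
            log(message)
```

In test.py this would be used as `with timed("bind"):` around the `model.bind(...)` call, and similarly around the model loading step, to see which one accounts for the 8-9 minutes.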

Thanks @ThomasDelteil – I have updated the post. I suspect the issue is similar to the following issues:

  1. https://github.com/apache/incubator-mxnet/issues/1557
  2. https://github.com/apache/incubator-mxnet/issues/10016

I did not get a clear answer from either issue. My environment is an AWS p3.2x with Deep Learning Base AMI (Ubuntu) Version 6.0 (ami-ce3673b6).

Before running the model, I ran the command:

- pip install mxnet-cu80
- export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64

If you’re loading the image for the first time and nothing is restored, you’re always going to pay that expensive JIT cost. I’m not sure how to handle it on AWS, but you basically want to make sure your CUDA cache is retained between instance runs. I had a similar issue when running MXNet in Docker, which was resolved by making sure the cache carried across Docker runs, in my case by simply mapping the cache directory to a location on the host so that it was permanent. You can find that thread here:
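
Separately from that thread, here is a minimal sketch of the cache idea in Python. `CUDA_CACHE_PATH`, `CUDA_CACHE_MAXSIZE` and `CUDA_CACHE_DISABLE` are environment variables read by the CUDA driver; the helper name, the directory and the 1 GiB limit are assumptions for illustration, not values taken from this thread.

```python
import os


def use_persistent_cuda_cache(cache_dir, max_bytes=1024 ** 3):
    """Point the CUDA JIT cache at `cache_dir` and return its absolute path.

    This has to run before the first CUDA call, so in practice
    before `import mxnet`.
    """
    cache_dir = os.path.abspath(os.path.expanduser(cache_dir))
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    os.environ["CUDA_CACHE_PATH"] = cache_dir          # where compiled kernels are kept
    os.environ["CUDA_CACHE_MAXSIZE"] = str(max_bytes)  # size limit in bytes (assumed value)
    os.environ["CUDA_CACHE_DISABLE"] = "0"             # make sure caching is switched on
    return cache_dir
```

The directory only helps if it survives restarts: on an instance that means a volume that is not wiped between runs, and in Docker a directory mapped from the host. When nothing is configured, the cache on Linux lives under `~/.nv/ComputeCache`.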

To rule out CUDA caching as the source of the problem, two quick questions. (1) Does this happen repeatedly on the same machine (or in the same container if you’re using Docker)? (2) Would you be able to run cuobjdump on the libmxnet.so file that is installed by pip install and then link to the output somehow? (Maybe post it on gist.github.com.)
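
For question (2), a rough sketch of collecting that information from Python. It assumes `cuobjdump` from the CUDA toolkit is on the PATH; `gpu_targets` and `list_targets` are hypothetical helper names. The regular expression only looks for `sm_NN` tokens, so treat the result as a summary rather than a faithful parse of cuobjdump's output.

```python
import re
import subprocess


def gpu_targets(listing):
    """Return the distinct sm_NN architectures mentioned in a cuobjdump listing."""
    numbers = set(int(n) for n in re.findall(r"sm_(\d+)", listing))
    return ["sm_%d" % n for n in sorted(numbers)]


def list_targets(lib_path, kind="elf"):
    """Run cuobjdump on `lib_path` and summarise the embedded targets.

    kind="elf" lists precompiled binaries, kind="ptx" lists embedded PTX.
    Raises CalledProcessError if cuobjdump exits with an error.
    """
    flag = {"elf": "--list-elf", "ptx": "--list-ptx"}[kind]
    output = subprocess.check_output(["cuobjdump", flag, lib_path])
    return gpu_targets(output.decode("utf-8", "replace"))
```

Calling `list_targets(path)` and `list_targets(path, kind="ptx")` on the installed libmxnet.so (probably somewhere under `/usr/local/lib/python2.7/dist-packages/mxnet/`, though that exact location is an assumption) shows which GPU generations have precompiled code. If the GPU in use is not in the ELF list, for example the V100 in a p3.2xlarge, which is compute capability 7.0 (`sm_70`), the driver has to JIT-compile from PTX on first use, and that is where the cache matters.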