I was working with the ResNet-50 model from https://github.com/deepinsight/insightface and noticed that the following step takes almost 8-9 minutes to load on a GPU (CUDA 8.0, Python 2.7):
loading ../models/model-r50-am-lfw/model 0
[17:51:46] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.12.1. Attempting to upgrade...
[17:51:46] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
/usr/local/lib/python2.7/dist-packages/mxnet/model.py:928: DeprecationWarning: mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.
**kwargs)
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
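To put a number on the delay, here is a small sketch that extracts the [HH:MM:SS] stamps from the log and finds the largest jump between consecutive lines. It shows that the stall falls after the first (v0.12.1) symbol has been upgraded and before the next (v0.8.0) one is loaded. The `largest_gap` helper is hypothetical, and it assumes the log does not cross midnight.

```python
import re
from datetime import datetime

# The first four lines of the log above, copied verbatim.
LOG = """\
[17:51:46] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.12.1. Attempting to upgrade...
[17:51:46] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
"""

# Matches the "[HH:MM:SS]" prefix at the start of each log line.
STAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]")


def largest_gap(log_text):
    """Return (seconds, line_before, line_after) for the biggest jump between
    consecutive timestamped lines. Assumes the log does not cross midnight."""
    stamped = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            stamped.append((datetime.strptime(match.group(1), "%H:%M:%S"), line))
    best = (0.0, None, None)
    for (t_prev, prev_line), (t_next, next_line) in zip(stamped, stamped[1:]):
        gap = (t_next - t_prev).total_seconds()
        if gap > best[0]:
            best = (gap, prev_line, next_line)
    return best


gap, before, after = largest_gap(LOG)
print("largest gap: %d s (%d min %d s)" % (gap, gap // 60, gap % 60))
# → largest gap: 532 s (8 min 52 s)
print("between:\n  " + before + "\n  " + after)
```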
If you’re loading the image for the first time and nothing is restored from the cache, you’re always going to pay that expensive JIT (just-in-time compilation) cost. I’m not sure how to handle it on AWS, but you basically want to make sure your CUDA cache is retained between instance runs. I had a similar issue when using Docker with MXNet, which was resolved by making sure the cache carried across Docker runs; in my case, I simply mapped the cache directory to a location on the host so it was permanent. You can find that thread here:
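One way to keep the CUDA JIT cache across runs without touching the default location is to point the driver at a persistent directory through the documented `CUDA_CACHE_PATH`, `CUDA_CACHE_MAXSIZE` (bytes) and `CUDA_CACHE_DISABLE` environment variables. Below is a minimal sketch, assuming a hypothetical directory `~/persistent/nv-compute-cache` that survives restarts (an attached volume, or a host directory bind-mounted into the container). The `use_persistent_cuda_cache` helper is illustrative, not part of MXNet.

```python
import os

# Hypothetical directory on storage that survives restarts (an attached
# volume on the instance, or a host directory bind-mounted into the
# container with `docker run -v`). Adjust to your setup.
PERSISTENT_CACHE = os.path.expanduser("~/persistent/nv-compute-cache")


def use_persistent_cuda_cache(path, max_bytes=None):
    """Point the CUDA driver's JIT cache at `path`.

    Call this before the process initialises CUDA; the simplest safe place
    is before `import mxnet`.
    """
    if not os.path.isdir(path):
        os.makedirs(path)  # Python 2.7 has no exist_ok, hence the check
    os.environ["CUDA_CACHE_PATH"] = path
    # CUDA_CACHE_DISABLE=1 turns the cache off entirely; make sure it is unset.
    os.environ.pop("CUDA_CACHE_DISABLE", None)
    if max_bytes is not None:
        # Cache size limit in bytes; raising it may help if the JIT output of
        # a large library does not fit and keeps getting evicted.
        os.environ["CUDA_CACHE_MAXSIZE"] = str(int(max_bytes))


# Usage:
#   use_persistent_cuda_cache(PERSISTENT_CACHE, max_bytes=2 * 1024 ** 3)
#   import mxnet as mx
```

The Docker route described above does the same thing from outside the process: on Linux the default cache lives in ~/.nv/ComputeCache, so, assuming the container runs as root, mounting a host directory with something like `docker run -v /path/on/host/nv:/root/.nv ...` keeps it between runs.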
To rule out CUDA caching as the source of the problem, I have two quick questions. (1) Does this happen repeatedly on the same machine (or in the same container, if you’re using Docker)? (2) Could you run cuobjdump on the libmxnet.so file that pip install puts in place and share the output somehow (maybe post it on gist.github.com)?
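The cuobjdump output matters because, if libmxnet.so ships no compiled SASS (cubin) that the GPU can run, the driver has to JIT-compile the embedded PTX at start-up, which is the cost discussed above. Here is a rough sketch for checking that from a `cuobjdump --list-elf` / `--list-ptx` listing. It encodes the standard CUDA binary-compatibility rule (a cubin for sm_XY runs only on devices with the same major version X and a minor version of at least Y); `embedded_archs`, `list_archs`, `needs_jit` and the sample listing are all hypothetical illustrations.

```python
import re
import subprocess

# Matches architecture tags such as sm_52 or compute_52 in cuobjdump output.
ARCH = re.compile(r"\b(?:sm|compute)_(\d+)")


def embedded_archs(listing):
    """Compute capabilities (e.g. 52 for sm_52) named in a cuobjdump listing."""
    return sorted(set(int(cc) for cc in ARCH.findall(listing)))


def list_archs(lib_path, flag="--list-elf"):
    """Run cuobjdump on lib_path; '--list-elf' lists SASS, '--list-ptx' lists PTX."""
    out = subprocess.check_output(["cuobjdump", flag, lib_path])
    return embedded_archs(out.decode("utf-8", "replace"))


def needs_jit(device_cc, sass_archs):
    """True if none of the embedded cubins can run on a GPU of compute
    capability device_cc (61 means 6.1). A cubin for sm_XY only runs on
    devices with the same major version X and a minor version >= Y."""
    major, minor = divmod(device_cc, 10)
    return not any(a // 10 == major and a % 10 <= minor for a in sass_archs)


# Illustrative listing only; the exact text cuobjdump prints may differ.
SAMPLE_ELF = """\
ELF file    1: libmxnet.1.sm_30.cubin
ELF file    2: libmxnet.2.sm_35.cubin
ELF file    3: libmxnet.3.sm_50.cubin
"""

archs = embedded_archs(SAMPLE_ELF)
print(archs)                 # [30, 35, 50]
print(needs_jit(52, archs))  # False: the sm_50 cubin runs on a 5.2 device
print(needs_jit(61, archs))  # True: no cubin for major version 6, so PTX (if any) must be JIT-compiled
```

On a real install you would call `list_archs` with both flags on the actual libmxnet.so path (pip usually places it inside the installed mxnet package directory) and compare the result against your GPU's compute capability.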