When using .asscalar(), an error occurred: mxnet_generic_kernel_ex ErrStr: invalid resource handle

There’s one line of code that tries to convert an NDArray to a NumPy scalar, something like:

cumulative_loss += nd.mean(L).asscalar()

L is the loss, an NDArray of shape (1,); its value is 0.40386838 when the error is raised.

error info:

Traceback (most recent call last):
  File "train.py", line 232, in <module>
    best_val_recall = train(int(cfg['train']['epochs']), context)
  File "train.py", line 201, in train
    cumulative_loss += nd.mean(L).asscalar()
  File "/home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2014, in asscalar
    return self.asnumpy()[0]
  File "/home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1996, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [11:02:29] /home/travis/build/dmlc/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (33 vs. 0) : Name: MapPlanKernel ErrStr:invalid resource handle
Stack trace:
  [bt] (0) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4d81bb) [0x7fc44ea0d1bb]
  [bt] (1) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x33caa71) [0x7fc4518ffa71]
  [bt] (2) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x360401e) [0x7fc451b3901e]
  [bt] (3) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3607846) [0x7fc451b3c846]
  [bt] (4) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x372) [0x7fc450f7bb42]
  [bt] (5) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2998179) [0x7fc450ecd179]
  [bt] (6) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x29a4ab1) [0x7fc450ed9ab1]
  [bt] (7) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x29a7f90) [0x7fc450edcf90]
  [bt] (8) /home/huzhihao/anaconda3/envs/soutu/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x29a8226) [0x7fc450edd226]

Since I cannot reproduce this problem by simply creating an NDArray and converting it to a scalar, I have to ask: what kind of error generally causes this?
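For context on why the traceback points at `.asscalar()`: MXNet's engine executes operations asynchronously, so GPU kernels are queued and return immediately, and `.asscalar()` (which calls `asnumpy()`) is the first *blocking* read. An error from some earlier queued operation is therefore often reported at the read site rather than where it actually happened. Running with `MXNET_ENGINE_TYPE=NaiveEngine` (synchronous execution) can help localize the real failing op. The sketch below is plain Python simulating this deferral, not MXNet internals:

```python
from collections import deque

class LazyEngine:
    """Toy model of an asynchronous execution engine: ops are queued at
    call time and only run when a blocking read forces synchronization."""
    def __init__(self):
        self.queue = deque()

    def push(self, op):
        # Like launching a GPU kernel: returns immediately, no error yet.
        self.queue.append(op)

    def wait_to_read(self):
        # Like asscalar()/asnumpy(): drains the queue, so a failure in
        # any earlier queued op surfaces here, at the read site.
        while self.queue:
            self.queue.popleft()()

def bad_kernel():
    # Stands in for the CUDA op that actually failed upstream.
    raise RuntimeError("invalid resource handle")

engine = LazyEngine()
engine.push(lambda: None)   # a healthy op
engine.push(bad_kernel)     # no exception raised at push time

try:
    engine.wait_to_read()   # the error is reported only here
except RuntimeError as e:
    print("raised during sync:", e)
```

In other words, the line `cumulative_loss += nd.mean(L).asscalar()` is likely just the messenger; the invalid-resource-handle error came from somewhere earlier in the queued GPU work.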

System info:
MXNet version: mxnet-cu90 1.5.0b20190723
OS: Ubuntu 16.04
GPU: GTX 1080 Ti

Ahhh, I switched to another server and the error no longer occurs… Maybe it was a driver problem.
Problem solved.

Hi @coderjustin, it could indeed be a CUDA issue. Thanks for posting your solution.