An illegal memory access

I have been using MXNet (1.6.0) for face recognition, but it unexpectedly reports an error after 2 epochs of otherwise normal training:

Traceback (most recent call last):
File "", line 455, in <module>
 File "", line 451, in main
 File "", line 445, in train_net
 File "/home/user1/recognition/", line 573, in fit
 File "/home/user1/recognition/", line 406, in update
 File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ndarray/", line 200, in waitall
 File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/", line 255, in check_call
   raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [03:32:38] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered
Stack trace:
 [bt] (0) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ [0x7f76131a51eb]
 [bt] (1) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ [0x7f76162a3742]
 [bt] (2) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ [0x7f76162d4515]
 [bt] (3) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ [0x7f76162b06d1]
 [bt] (4) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ [0x7f76162b3c10]
 [bt] (5) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ [0x7f76162b3ea6]
 [bt] (6) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ [0x7f76162aee84]
 [bt] (7) /home/user1/miniconda3/bin/../lib/ [0x7f76aca9d421]
 [bt] (8) /lib/x86_64-linux-gnu/ [0x7f76bb1f0609]

I haven't found any clue to this error after googling. The only thing I've done is decrease my batch_size from 400 to 360, and I'm not sure whether the error will come back. I'm still worried about it.
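Note that the traceback raises the error from `waitall`. MXNet executes operators asynchronously, so that call is only where the engine synchronized, not necessarily where the faulting kernel ran. One way to narrow it down is to rerun with synchronous execution: `MXNET_ENGINE_TYPE=NaiveEngine` makes MXNet run operators one at a time, and `CUDA_LAUNCH_BLOCKING=1` makes CUDA kernel launches block. The sketch below is a minimal helper under those assumptions. The helper names (`debug_env`, `rerun_with_debug`) and the script path you pass in are hypothetical, not part of MXNet or the original training code.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of an environment with synchronous execution enabled.

    MXNET_ENGINE_TYPE=NaiveEngine: MXNet runs each operator synchronously.
    CUDA_LAUNCH_BLOCKING=1: CUDA kernel launches block until they finish.
    Together they should make the Python traceback point closer to the
    operator that actually faulted. The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_with_debug(script, *args):
    """Re-launch a (hypothetical) training script under the debug environment.

    Returns the child process's exit code instead of raising on failure.
    """
    cmd = [sys.executable, script, *args]
    return subprocess.run(cmd, env=debug_env(), check=False).returncode
```

For example, `rerun_with_debug("train.py", "--batch-size", "400")` (where `train.py` and its flag are placeholders for your own script) would rerun training synchronously. Expect it to be much slower, so it is only useful for locating the failing operator, not for regular training.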

@Karl Do you have a repro script?