Check failed: error == cudaSuccess problem (proposal.cu)

Hi,
I have a problem. Can anyone help me, please?
I want to use some specific API functions from https://github.com/hpi-xnor/BMXNet
I followed the commands below to build mxnet from source, and I can run those API functions correctly.
git clone --recursive https://github.com/hpi-xnor/mxnet.git # remember to include the --recursive
mkdir build/Release && cd build/Release
cmake ../../
make -j8
export LD_LIBRARY_PATH=/build/Release
export PYTHONPATH=/python

Next, I want to combine this with a detection model, so I followed some more commands, and I can run demo.py correctly on the CPU. However, if I run python demo.py --gpu 0, I get the error below.
Any help would be appreciated! Thanks a lot!

Best Regards,
PengWei

python demo.py --prefix final --epoch 0 --image myimage.jpg --gpu 0 --vis

Error Message:
python demo.py --prefix final --epoch 0 --image myimage.jpg --gpu 2 --vis
[20:24:19] /home/jacky4323/BMXNet_v1/mxnet/src/operator/././cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while… (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[20:24:32] /home/jacky4323/BMXNet_v1/mxnet/dmlc-core/include/dmlc/logging.h:308: [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch

Stack trace returned 10 entries:
[bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c]
[bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZN5mxnet2op13ProposalGPUOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x12b9) [0x7fcda3ee92c9]
[bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x36d) [0x7fcda13564ed]
[bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet4exec23StatefulComputeExecutor3RunENS_10RunContextEb+0x69) [0x7fcda125de69]
[bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(+0x992210) [0x7fcda1222210]
[bt] (5) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7fcda1119a83]
[bt] (6) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b]
[bt] (7) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3]
[bt] (8) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fce1fc54c80]

[20:24:32] /home/jacky4323/BMXNet_v1/mxnet/dmlc-core/include/dmlc/logging.h:308: [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/engine/./threaded_engine.h:359: [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch

Stack trace returned 10 entries:
[bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c]
[bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZN5mxnet2op13ProposalGPUOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x12b9) [0x7fcda3ee92c9]
[bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x36d) [0x7fcda13564ed]
[bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet4exec23StatefulComputeExecutor3RunENS_10RunContextEb+0x69) [0x7fcda125de69]
[bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(+0x992210) [0x7fcda1222210]
[bt] (5) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7fcda1119a83]
[bt] (6) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b]
[bt] (7) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3]
[bt] (8) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fce1fc54c80]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c]
[bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x36b) [0x7fcda1119d5b]
[bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b]
[bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3]
[bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fce1fc54c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fce2d40a6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fce2d1403dd]

terminate called after throwing an instance of 'dmlc::Error'
what(): [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/engine/./threaded_engine.h:359: [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch

Stack trace returned 10 entries:
[bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c]
[bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZN5mxnet2op13ProposalGPUOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x12b9) [0x7fcda3ee92c9]
[bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x36d) [0x7fcda13564ed]
[bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet4exec23StatefulComputeExecutor3RunENS_10RunContextEb+0x69) [0x7fcda125de69]
[bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(+0x992210) [0x7fcda1222210]
[bt] (5) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7fcda1119a83]
[bt] (6) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b]
[bt] (7) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3]
[bt] (8) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fce1fc54c80]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c]
[bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x36b) [0x7fcda1119d5b]
[bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b]
[bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3]
[bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/…/…/build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fce1fc54c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fce2d40a6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fce2d1403dd]
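
The log above suggests setting MXNET_ENGINE_TYPE to NaiveEngine so that operators run synchronously, and it also names MXNET_CUDNN_AUTOTUNE_DEFAULT. A minimal sketch of preparing such an environment from Python follows; the helper name debug_env is hypothetical, and the command line is the one from the post.

```python
import os

def debug_env(base_env=None):
    """Return a copy of the environment with MXNet forced into synchronous mode."""
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"    # named in the log message above
    env["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"   # also named in the log; skips the cuDNN autotune pass
    return env

# Command line copied from the post.
cmd = ["python", "demo.py", "--prefix", "final", "--epoch", "0",
       "--image", "myimage.jpg", "--gpu", "0", "--vis"]

print(debug_env({})["MXNET_ENGINE_TYPE"])
```

To launch the demo with these settings, pass the result to the standard library, for example subprocess.run(cmd, env=debug_env(), check=True). Because the variables are set only for the child process, there is nothing to reset afterwards.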

I see that this is a forked repo. Are you also able to reproduce this issue with the MXNet repo?

Looks like the kernel in the proposal operator failed because the workspace size is too large. Maybe your input or batch size is too big?
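
For context, "too many resources requested for launch" is the message for cudaErrorLaunchOutOfResources, which CUDA reports when a single kernel launch asks for more per-block resources (threads, registers, or shared memory) than the device provides. The sketch below checks a launch configuration against the per-block limits of compute capability 5.2. The register counts in the example are hypothetical and are not measurements from proposal.cu, and the register check is simplified.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockLimits:
    max_threads: int        # threads per block
    max_registers: int      # 32-bit registers per block
    max_shared_bytes: int   # shared memory per block

# Per-block limits for compute capability 5.2 (Maxwell, e.g. GTX TITAN X).
SM_52 = BlockLimits(max_threads=1024, max_registers=65536, max_shared_bytes=48 * 1024)

def launch_problems(threads, regs_per_thread, shared_bytes, limits=SM_52):
    """List the per-block limits that a launch configuration would exceed."""
    problems = []
    if threads > limits.max_threads:
        problems.append("threads per block")
    # Simplified: real hardware allocates registers per warp, with rounding.
    if threads * regs_per_thread > limits.max_registers:
        problems.append("registers per block")
    if shared_bytes > limits.max_shared_bytes:
        problems.append("shared memory per block")
    return problems

# Hypothetical kernels, not measurements from proposal.cu.
print(launch_problems(threads=1024, regs_per_thread=64, shared_bytes=0))
print(launch_problems(threads=1024, regs_per_thread=80, shared_bytes=0))
```

The first configuration fits exactly (1024 x 64 = 65536 registers), while the second exceeds the register budget. The actual per-thread register count of a kernel can be printed at compile time by passing -Xptxas -v to nvcc.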

What GPU are you using?

Also, please paste in the nvcc command for one of the build compiles, showing the arg list. My thought is to have you rebuild mxnet with the Makefile approach, ensuring your GPU's arch is listed in KNOWN_CUDA_ARCHS.

We use GeForce GTX TITAN X

Sorry, I'm not familiar with these things (nvcc, CUDA) and I'm a newbie to deep learning, so I just followed the commands and used cmake to build mxnet. The commands related to nvcc that I found in the CMakeLists file are shown below.

if(USE_CUDA)
  if(FIRST_CUDA)
    mshadow_select_nvcc_arch_flags(NVCC_FLAGS_ARCH)
    string(REPLACE ";" " " NVCC_FLAGS_ARCH "${NVCC_FLAGS_ARCH}")
    set(CMAKE_CUDA_FLAGS "${NVCC_FLAGS_ARCH}")
    set(CMAKE_CUDA_FLAGS_RELEASE "${NVCC_FLAGS_ARCH} -use_fast_math")
    list(APPEND mxnet_LINKER_LIBS nvrtc cuda cublas cufft cusolver curand)
    list(APPEND SOURCE ${CUDA})
    add_definitions(-DMXNET_USE_CUDA=1)

So the GTX Titan X is a Maxwell-generation GPU, arch=52. Before we go any further, we should make sure that when you build mxnet, the NVIDIA CUDA compiler (nvcc) is invoked with the args "-gencode arch=compute_52,code=sm_52". There's a lot going on in the FirstClassLangCuda.cmake file, so the most trustworthy thing is to look at the build log for invocations of nvcc. Can you grab one nvcc command from that log and paste it here? For example, my Makefile build invokes nvcc like this:

/usr/local/cuda/bin/nvcc -Werror cross-execution-space-call -std=c++11 -Xcompiler -D_FORCE_INLINES -O3 -ccbin g++ -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 --fatbin-options -compress-all -Xcompiler "-DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -g -Werror -O3 -DNDEBUG=1 -I/home/dcarter/mxnet_dev/dgx/mxnet/mshadow/ -I/home/dcarter/mxnet_dev/dgx/mxnet/dmlc-core/include -fPIC -I/home/dcarter/mxnet_dev/dgx/mxnet/nnvm/include -I/home/dcarter/mxnet_dev/dgx/mxnet/dlpack/include -Iinclude -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_PROFILER=1 -DMXNET_USE_OPENCV=1 -I/usr/include/opencv -fopenmp -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_LAPACK -DMSHADOW_USE_CUDNN=1 -I/home/dcarter/mxnet_dev/dgx/mxnet/3rdparty/cub -DMXNET_ENABLE_CUDA_RTC=1 -DMXNET_USE_NCCL=0 -DMXNET_USE_NVTX=0 -DMXNET_USE_LIBJPEG_TURBO=0" -M -MT build/src/operator/nn/batch_norm_gpu.o src/operator/nn/batch_norm.cu >build/src/operator/nn/batch_norm_gpu.d
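
Once verbose build output has been saved to a file, the -gencode pairs can be pulled out with a short script rather than read by eye. This is a rough sketch: the function names are hypothetical, and the pattern only covers the simple arch=compute_XX,code=sm_XX form seen in the command above, not bracketed or quoted code lists.

```python
import re

# Matches "-gencode arch=compute_61,code=sm_61" and the "=" joined form.
GENCODE_RE = re.compile(r"-gencode[ =]arch=compute_(\d+),code=(sm|compute)_(\d+)")

def gencode_targets(build_log_text):
    """Return the sorted, de-duplicated (virtual arch, code kind, code arch) targets."""
    found = GENCODE_RE.findall(build_log_text)
    return sorted({(int(arch), kind, int(code)) for arch, kind, code in found})

def builds_for(build_log_text, arch):
    """True if some -gencode pair emits machine code (sm_XX) for the given arch."""
    return any(kind == "sm" and code == arch
               for _, kind, code in gencode_targets(build_log_text))

# Shortened copy of the nvcc command quoted above.
sample = ("nvcc -O3 -ccbin g++ -gencode arch=compute_61,code=sm_61 "
          "-gencode arch=compute_70,code=sm_70 --fatbin-options -compress-all")

print(gencode_targets(sample))
print(builds_for(sample, 52))
```

For the sample command this reports targets 61 and 70 only, so a library built that way would contain no sm_52 machine code for a GTX Titan X. To check a real build, read the saved log file into a string and pass it to builds_for with 52.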

Hi,

Thanks for the help!
I build with cmake rather than the Makefile, so I can't see the information you mentioned.
I got some messages like the ones below.

I don't use cmake, but I think it's easy to get the full command output. Try searching for "cmake verbose output".

Hi,

Thanks a lot!
I can print the output log now.
It's the same as you said:
gencode_arch=52,code=sm_52