Segmentation fault on process exit with CUDA 11.0.3

I compiled MXNet from the 1.6.x branch against a non-standard CUDA version (11.0.3).

I had to add

#define THRUST_IGNORE_CUB_VERSION_CHECK 1

to several files under the src/ directory to silence Thrust errors (caused by a version mismatch with CUDA).

The Python library now builds successfully, and training works fine. When I load a model from disk, I run

model.bind(...)
model.set_params(arg_params, aux_params)
...
model.predict(...)

and inference works fine as well. But when the process exits, I get this stack trace:

Segmentation fault: 11


Segmentation fault: 11

Stack trace:
  [bt] (0) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(+0x18c3f39) [0x7fabd6dacf39]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fac02bff210]
  [bt] (2) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15a7541) [0x7faaf9c89541]
  [bt] (3) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15c710f) [0x7faaf9ca910f]
  [bt] (4) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(cudnnDestroy+0x8f) [0x7faaf88de72f]
  [bt] (5) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*)+0x116) [0x7fabd6cb1c56]
  [bt] (6) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x287) [0x7fabd6ccb007]
  [bt] (7) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x44) [0x7fabd6ccb3c4]
  [bt] (8) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run()+0x45) [0x7fabd6cc6095]
Stack trace:
  [bt] (0) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(+0x18c3f39) [0x7fabd6dacf39]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fac02bff210]
  [bt] (2) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15a7541) [0x7faaf9c89541]
  [bt] (3) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15c710f) [0x7faaf9ca910f]
  [bt] (4) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(cudnnDestroy+0x8f) [0x7faaf88de72f]
  [bt] (5) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*)+0x116) [0x7fabd6cb1c56]
  [bt] (6) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x287) [0x7fabd6ccb007]
  [bt] (7) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x44) [0x7fabd6ccb3c4]
  [bt] (8) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run()+0x45) [0x7fabd6cc6095]
Segmentation fault (core dumped)
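
For reference, the load-and-predict sequence described above can be sketched roughly as follows. This is only an illustration, not my exact script: the helper name `load_and_predict`, the checkpoint prefix and epoch, the input name "data", and the choice of `mx.gpu(0)` are all assumptions, and it presumes the model was saved as a standard checkpoint and has a single output.

```python
def load_and_predict(prefix, epoch, data, data_shape, gpu_id=0):
    """Load a checkpoint, bind it for inference on one GPU, and run predict.

    Hypothetical helper: `prefix`, `epoch`, and the input name "data" are
    placeholders for whatever the real model uses.
    """
    import mxnet as mx  # imported here so the file itself loads without MXNet

    # Reads <prefix>-symbol.json and <prefix>-<epoch as 4 digits>.params
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)

    # label_names=None because no labels are supplied at inference time
    model = mx.mod.Module(symbol=sym, context=mx.gpu(gpu_id), label_names=None)
    model.bind(for_training=False, data_shapes=[("data", data_shape)])
    model.set_params(arg_params, aux_params)

    # `data` is expected to be an NDArray, a NumPy array, or a DataIter
    return model.predict(data)
```

A call such as `load_and_predict("mymodel", 10, batch, (1, 3, 224, 224))` returns normally; the crash only appears afterwards, when the interpreter shuts down.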