Invoking CUDA code from MXNet without modifying the source code

I have a shared object with a bunch of CUDA functions that operate on plain contiguous arrays. I would like to use this code as is to introduce a set of custom operators in MXNet. The guide on adding new operators (https://mxnet.incubator.apache.org/faq/new_op.html) offers three choices:

  • Plain python custom op
  • NVRTC-based custom op
  • Modifying the MXNet source code to add a new operator

None of the options above works for me. I am looking for a way to obtain a pointer to the NDArray data on the GPU and invoke my shared object, passing this pointer as a parameter.

  • Is this possible?
  • If it is not, what is the recommended method to achieve what I need?

Thanks in advance!

Here is the link to the documentation for the run-time compilation API
https://mxnet.incubator.apache.org/api/python/rtc/rtc.html

“The RTC package contains tools for compiling and running CUDA code from python frontend. The compiled kernels can be used stand-alone or combined with autograd.Function or operator.CustomOpProp to support differentiation.”
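
For reference, here is a minimal sketch of the workflow that page describes, adapted from its axpy example. It assumes a CUDA-enabled MXNet build and a GPU at index 0; the wrapper name `run_axpy` and the constant `AXPY_SOURCE` are only illustrative. Note that the kernel source has to be self-contained, since it is compiled at run time.

```python
# Kernel source handed to the run-time compiler. It must be self-contained.
AXPY_SOURCE = r'''
extern "C" __global__ void axpy(const float *x, float *y, float alpha) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    y[i] += alpha * x[i];
}
'''


def run_axpy(n=10, alpha=3.0):
    """Compile the kernel above and apply it to two small GPU arrays.

    Illustrative helper: needs a CUDA-enabled MXNet build and a GPU.
    """
    import mxnet as mx  # imported lazily so the module loads without MXNet

    ctx = mx.gpu(0)
    module = mx.rtc.CudaModule(AXPY_SOURCE)
    kernel = module.get_kernel("axpy", "const float *x, float *y, float alpha")

    x = mx.nd.ones((n,), ctx=ctx)
    y = mx.nd.zeros((n,), ctx=ctx)

    # One block of n threads, so n must not exceed the block size limit.
    kernel.launch([x, y, alpha], ctx, (1, 1, 1), (n, 1, 1))
    return y.asnumpy()
```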

Thanks, I looked into it before asking here. Unfortunately, my CUDA code has many dependencies and therefore carries many include statements that RTC doesn’t like. Is there another way?
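
One possible direction, offered as an untested sketch rather than a recommended method: the MXNet C API exposes `MXNDArrayGetData`, which returns the raw data pointer of an NDArray, and the Python package reaches the C library through `mxnet.base._LIB` (a private attribute, so this may break between versions). The pointer can then be passed to a prebuilt shared object through `ctypes`. Everything about the shared object here is an assumption: the library path, the exported symbol `scale_in_place`, and its signature `void scale_in_place(float *data, size_t n)` with `extern "C"` linkage are hypothetical. The sketch also assumes the function synchronizes the device before returning, because MXNet's execution engine knows nothing about this call.

```python
import ctypes


def ndarray_data_pointer(arr):
    """Return the raw data pointer of an MXNet NDArray as ctypes.c_void_p."""
    from mxnet.base import _LIB, check_call  # private API, lazily imported

    arr.wait_to_read()  # let pending writes to this array finish first
    ptr = ctypes.c_void_p()
    check_call(_LIB.MXNDArrayGetData(arr.handle, ctypes.byref(ptr)))
    return ptr


def bind_kernel(lib, symbol):
    """Declare the assumed C signature: void f(float *data, size_t n)."""
    fn = getattr(lib, symbol)
    fn.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    fn.restype = None
    return fn


def run_in_place(lib_path, symbol, arr):
    """Call a function from a prebuilt shared object on arr's device buffer."""
    import mxnet as mx

    mx.nd.waitall()  # nothing else should be touching the buffer
    fn = bind_kernel(ctypes.CDLL(lib_path), symbol)
    # The callee is assumed to launch its kernel and synchronize itself.
    fn(ndarray_data_pointer(arr), arr.size)
    return arr
```

Usage would look like `run_in_place("./libmykernels.so", "scale_in_place", mx.nd.ones((1024,), ctx=mx.gpu(0)))`. A call like this is invisible to autograd, so for differentiation it would still need to be wrapped in a Python custom op with a hand-written backward.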