Problem compiling MXNet with Warp-CTC

Hi, I am trying to use MXNet’s deepspeech code to train my own model. According to the README, I need to build MXNet from source with warp-ctc.
I followed the instructions and ran into a problem when building the project with ninja. Here is my system info:
OS: Ubuntu 16.04
CUDA: 10.0
GPU: Tesla P100
C++: 5.4.0
After configuring the project and running ninja, I get a link error. Can anybody give me some advice? Thank you very much!

[2/2] Linking CXX shared library libmxnet.so
FAILED: : && /usr/bin/c++  -fPIC -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -std=c++11 -mf16c -fopenmp -std=c++0x -mcmodel=large   -shared -Wl,-soname,libmxnet.so -o libmxnet.so CMakeFiles/mxnet.dir/dummy.c.o -L/usr/local/cuda-9.2/lib64 -Wl,--whole-archive libmxnet.a -Wl,--no-whole-archive libmxnet.a /usr/local/cuda-9.2/lib64/libnvToolsExt.so -lopenblas /usr/local/cuda-9.2/lib64/libcudart.so /usr/local/cuda-9.2/lib64/libcurand.so /usr/local/cuda-9.2/lib64/libcublas.so /usr/local/cuda-9.2/lib64/libcudart.so /usr/local/cuda-9.2/lib64/libcurand.so /usr/local/cuda-9.2/lib64/libcublas.so /usr/local/cuda-9.2/lib64/libcudnn.so -lrt /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9 /usr/lib/x86_64-linux-gnu/libopencv_imgproc.so.2.4.9 3rdparty/openmp/runtime/src/libomp.so -lpthread -llapack /usr/local/cuda-9.2/lib64/libcudnn.so -lcufft -lcusolver -lnvrtc -lcuda 3rdparty/dmlc-core/libdmlc.a -lrt -lpthread -llapack -lcufft -lcusolver -lnvrtc -lcuda /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4.9 -lpthread -ldl -lrt -Wl,-rpath,/usr/local/cuda-9.2/lib64:/home/v-hohua/incubator-mxnet/build/3rdparty/openmp/runtime/src: && :
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libmxnet.so
crtstuff.c:(.text+0x1e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libmxnet.so
crtstuff.c:(.text+0x6b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9c): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
crtstuff.c:(.text+0xaa): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
crtstuff.c:(.text+0xbb): additional relocation overflows omitted from the output
libmxnet.so: PC-relative offset overflow in PLT entry for `_ZN5mxnet2op8mxnet_op6KernelINS0_9pick_gradILi3ELb0EEEN7mshadow3gpuEE6LaunchIJPdS9_PfiiNS5_5ShapeILi3EEESC_EEEvPNS5_6StreamIS6_EEiDpT_'
collect2: error: ld returned 1 exit status

I searched for related information, and the error seems to be caused by the size of the program. However, I have already added the flag “-mcmodel=large” and still get the same problem.
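To sanity-check the size theory, here is a minimal sketch of a helper that compares a file's size against the 2 GiB reach of a 32-bit PC-relative relocation such as R_X86_64_PC32. The function names are my own and not part of MXNet, and the file to pass in (for example libmxnet.a in the build directory) is an assumption. An archive's size on disk is only a rough indicator of the linked image size, so treat the result as a hint rather than proof.

```python
import os
import sys

# R_X86_64_PC32 is a signed 32-bit PC-relative relocation, so it can only
# reach targets within +/- 2 GiB of the instruction that uses it.
PC32_REACH_BYTES = 2**31


def exceeds_pc32_reach(size_bytes):
    """Return True if an image of this size could overflow a PC32 relocation."""
    return size_bytes >= PC32_REACH_BYTES


def report(path):
    """Describe a file's size relative to the 2 GiB PC-relative limit."""
    size = os.path.getsize(path)
    verdict = "exceeds" if exceeds_pc32_reach(size) else "is within"
    return "{}: {:.2f} GiB, {} the 2 GiB PC32 reach".format(
        path, size / 2**30, verdict
    )


if __name__ == "__main__":
    # Hypothetical usage: python check_size.py build/libmxnet.a
    for p in sys.argv[1:]:
        print(report(p))
```

If the archive turns out to be near or above 2 GiB, that would be consistent with the "relocation truncated to fit" messages in the log above.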

Did you follow these instructions, @hhm?