MXNet build source code error: Leaving directory: mxnet/3rdparty/ps-lite

Thanks, I am doing it now. I am looking forward to the results.

It is the same problem. When the build reaches “Leaving directory: /3rd/part/ps-lite”, it stops with no further information and no errors.
The following is the log information:
https://drive.google.com/file/d/1H-HJfngypW2Ex-hcjMoFKsNMhbyjisPy/view?usp=sharing

Thanks.

Confirming

  1. You’re compiling a clean MXNet, that is no modifications made to it (you mentioned you’re working on some C++ code)
  2. After compilation, are there files in lib/* that have been generated? Like libmxnet.so

If the answers are “yes” and then “no”, then something is wrong with the build under your configuration, and we can create an issue. It would be helpful to have the make -j1 log output. The reason for -j1 is that -j8 runs 8 jobs in parallel, so the error may be much further up the log. If you run with -j1, the error will be right before the point where the build stops. Apologies for asking for another build output, but this is the last time :wink:
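
As a rough sketch of how to capture that log and jump to the first failure: the helper name first_error and the file name build.log are made up for illustration, and the search pattern is only a guess at what a failing line looks like (it will also match harmless lines such as compiler flags containing "error").

```shell
# Hypothetical helper: print the first line of a build log that looks like an
# error, prefixed with its line number. The pattern is an assumption; adjust it
# for your toolchain.
first_error() {
    grep -n -i -E 'error|no such file' "$1" | head -n 1
}

# Typical use (not run here):
#   make -j1 2>&1 | tee build.log   # single job, stdout and stderr in one file
#   first_error build.log
```

With a single job the lines just above the reported one usually show the command that failed.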

Kind regards,
Vishaal

No problem, I am willing to try to find problems, thank you.

  1. There are some files in the ‘lib/’ folder, but there is no libmxnet.so; the files in the /lib/ folder are the following:
    engines-1.1 libffi.a libgomp.so liblzma.so libncurses.so.6.1 libprotobuf-lite.a libquadmath.so libstdc++.so.6 libtsan.so.0 tclooConfig.sh
    itcl4.1.1 libffi.la libgomp.so.1 liblzma.so.5 libncurses++w.a libprotobuf-lite.so libquadmath.so.0 libstdc++.so.6.0.25 libtsan.so.0.0.0 tdbc1.0.6
    libasan.so libffi.so libgomp.so.1.0.0 liblzma.so.5.2.4 libncursesw.a libprotobuf-lite.so.17 libquadmath.so.0.0.0 libtcl8.6.so libubsan.so tdbcmysql1.0.6
    libasan.so.5 libffi.so.6 libhistory.a libmenu.a libncursesw.so libprotobuf-lite.so.17.0.0 libreadline.a libtclstub8.6.a libubsan.so.1 tdbcodbc1.0.6
    libasan.so.5.0.0 libffi.so.6.0.4 libhistory.so libmenu.so libncursesw.so.6 libprotobuf.so libreadline.so libtinfo.a libubsan.so.1.0.0 tdbcpostgres1.0.6
    libatomic.so libform.a libhistory.so.7 libmenu.so.6 libncursesw.so.6.1 libprotobuf.so.17 libreadline.so.7 libtinfo.so libz.a terminfo
    libatomic.so.1 libform.so libhistory.so.7.0 libmenu.so.6.1 libpanel.a libprotobuf.so.17.0.0 libreadline.so.7.0 libtinfo.so.6 libz.so thread2.8.2
    libatomic.so.1.2.0 libform.so.6 libitm.so libmenuw.a libpanel.so libprotoc.a libsqlite3.a libtinfo.so.6.1 libz.so.1 tk8.6
    libcrypto.a libform.so.6.1 libitm.so.1 libmenuw.so libpanel.so.6 libprotoc.so libsqlite3.so libtinfow.a libz.so.1.2.11 tkConfig.sh
    libcrypto.so libformw.a libitm.so.1.0.0 libmenuw.so.6 libpanel.so.6.1 libprotoc.so.17 libsqlite3.so.0 libtinfow.so pkgconfig
    libcrypto.so.1.1 libformw.so liblsan.so libmenuw.so.6.1 libpanelw.a libprotoc.so.17.0.0 libsqlite3.so.0.8.6 libtinfow.so.6 python3.7
    libedit.a libformw.so.6 liblsan.so.0 libncurses.a libpanelw.so libpython3.7m.a libssl.a libtinfow.so.6.1 sqlite3.21.0
    libedit.so libformw.so.6.1 liblsan.so.0.0.0 libncurses++.a libpanelw.so.6 libpython3.7m.so libssl.so libtk8.6.so tcl8
    libedit.so.0 libgcc_s.so liblzma.a libncurses.so libpanelw.so.6.1 libpython3.7m.so.1 libssl.so.1.1 libtkstub8.6.a tcl8.6
    libedit.so.0.0.59 libgcc_s.so.1 liblzma.la libncurses.so.6 libprotobuf.a libpython3.7m.so.1.0 libstdc++.so libtsan.so tclConfig.sh

  2. I put the customized C++ layers in the src/operator and contrib/ folders

  3. I am compiling source code using make -j1

Let’s see what happened, thank you very much!

Thanks,

make -j1 will help you more easily determine what the error is in your case, but if you do want to do a smoke test, be sure that your build works without any modifications (a purely clean branch). :slight_smile:
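
For the smoke test, here is a minimal sketch of the two checks discussed in this thread: that the checkout has no local modifications, and that lib/libmxnet.so exists after the build. The function names are hypothetical, and the second helper assumes the source tree is a git checkout.

```shell
# Hypothetical helper: succeed only if the build produced lib/libmxnet.so
# under the given MXNet source root (defaults to the current directory).
has_libmxnet() {
    root="${1:-.}"
    [ -f "$root/lib/libmxnet.so" ]
}

# Hypothetical helper: list modified and untracked files in a git checkout.
# Empty output means the tree matches the checked-out commit.
list_local_changes() {
    git -C "${1:-.}" status --porcelain
}
```

Run list_local_changes from the MXNet root before building; custom operators copied into src/operator would show up there as untracked files.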

Vishaal

I get your point, I will do a smoke test.
Let’s see what happens this time.
Thank you very much.

I used ‘make -j1’, and it shows the following error:
/bin/sh: /usr/local/cuda/bin/nvcc: No such file or directory
make: *** [build/src/operator/nn/cudnn/cudnn_batch_norm_gpu.o] Error 127

Before compiling, I loaded some modules using the following commands:
module add Anaconda3/python-3.6
module add CUDA/9.1.85
module add cuDNN/7.0.5-CUDA-9.1.85
module add OpenBLAS/0.2.19-GCC-5.4.0-2.26-LAPACK-3.7.0

For the config.mk:
USE_OPENCV = 0
USE_BLAS = openblas
USE_CUDA = 1
USE_CUDA_PATH = /usr/local/cuda
USE_CUDNN = 1
USE_NCCL = 0
USE_DIST_KVSTORE = 1

Any advice about that?
Thanks

Great! nvcc should be found :slight_smile:

nvcc should be included with CUDA, so I would recommend debugging your CUDA installation. You’ve installed 9.1, right? Is it installed in /usr/local/cuda or elsewhere? Are there files in /usr/local/cuda/bin/* ? Is nvcc in there? Can you run it if you type /usr/local/cuda/bin/nvcc? Maybe the CUDA installation is in a different directory?
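
These checks can be sketched as a small shell function. The name cuda_root is hypothetical, and the fallback assumes that whatever put CUDA on your PATH (for example the module system) exposes nvcc as <root>/bin/nvcc.

```shell
# Hypothetical helper: print a CUDA root directory suitable for USE_CUDA_PATH.
# Checks the given directory first, then falls back to whichever nvcc is on PATH.
cuda_root() {
    candidate="${1:-/usr/local/cuda}"
    if [ -x "$candidate/bin/nvcc" ]; then
        printf '%s\n' "$candidate"
        return 0
    fi
    # No nvcc under the candidate directory: look it up on PATH instead.
    nvcc_path=$(command -v nvcc) || return 1
    # nvcc lives in <root>/bin/nvcc, so strip two path components.
    dirname "$(dirname "$nvcc_path")"
}
```

If this prints something other than /usr/local/cuda, that directory is the value to try for USE_CUDA_PATH in config.mk. If it prints nothing, nvcc is not reachable from your shell at all.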

Vishaal

I am checking the CUDA and nvcc settings.
Actually, this cluster is managed by other people and I do not have sudo rights to check the CUDA installation, so let me email the cluster manager first.
Waiting for news. Thanks for your help.

My pleasure!

Vishaal

The problem has been solved, thank you very much!

Hey hdjsjyl,
How did you solve the problem? I’m having the same error here.
Thanks

Hi gabrielkoyama,
Please use “make -j1” to check what the problem is. I don’t remember the problem clearly. Thanks

Thanks for your reply!

I’m trying to install MXNet to use with FCIS, so I’m following these instructions:

git clone --recursive https://github.com/dmlc/mxnet.git
git checkout 998378a
git submodule init
git submodule update

cp -r FCIS_ROOT/fcis/operator_cxx/channel_operator* MXNET_ROOT/src/operator/contrib/

And then,

make -j1 with

USE_OPENCV=1
USE_BLAS=openblas
USE_CUDA=1
USE_CUDA_PATH=/usr/local/cuda
USE_CUDNN=1

Output:

src/operator/./cudnn_rnn-inl.h(435): error: argument of type “cudnnRNNDescriptor_t” is incompatible with parameter of type “cudnnHandle_t”
detected during:
instantiation of “void mxnet::op::CuDNNRNNOp::Forward(const mxnet::OpContext &, const std::vector<mxnet::TBlob, std::allocator< mxnet::TBlob>> &, const std::vector<mxnet::OpReqType, std::allocator< mxnet::OpReqType>> &, const std::vector<mxnet::TBlob, std::allocator< mxnet::TBlob>> &, const std::vector<mxnet::TBlob, std::allocator< mxnet::TBlob>> &) [with DType=float]”
(54): here
instantiation of “mxnet::op::CuDNNRNNOp::CuDNNRNNOp(mxnet::op::RNNParam) [with DType=float]”
src/operator/rnn.cu(20): here

8 errors detected in the compilation of “/tmp/tmpxft_00003b43_00000000-11_rnn.compute_61.cpp1.ii”.
Makefile:274: recipe for target ‘build/src/operator/rnn_gpu.o’ failed
make: *** [build/src/operator/rnn_gpu.o] Error 1

I’m using CUDA 10.2 and cuDNN 7.6.5. Could this be a version error?
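
One way to confirm which cuDNN version the build actually picks up is to read the version macros from the header under USE_CUDA_PATH. This is only a sketch: the function name cudnn_version is made up, the default header path is an assumption, and it relies on cuDNN 7.x defining CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH read from a cuDNN header.
# The default header location is an assumption; pass the real path if it differs.
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            # Fail if any of the three macros was missing from the header.
            if (major == "" || minor == "" || patch == "") exit 1
            print major "." minor "." patch
        }
    ' "$header"
}
```

If the printed version differs from what the MXNet commit was written against, a mismatch in the cuDNN function signatures, like the one in the error above, is a plausible cause.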

Thank you.

this made a lot of sense thanks