Dear all,
Sometimes I want to do inference on my laptop, but the GPU cannot handle the memory load of the models trained on HPC clusters. In that case I use ctx = mx.cpu(), but with this configuration I do not see all threads being used. Is there something I can do so that mxnet takes full advantage of all 8 threads on my laptop?
Thank you very much for your time.
Hey @feevos,
The first thing you want to do is to make sure you are using mxnet-mkl so that you are taking advantage of the parallelization offered by mkldnn.
pip install mxnet-mkl
You can read more on this medium post: https://medium.com/apache-mxnet/accelerating-deep-learning-on-cpu-with-intel-mkl-dnn-a9b294fb0b9
From the article they suggest setting these env variables to get the maximum performance:
export KMP_AFFINITY=granularity=fine,compact,1,0
export vCPUs=`cat /proc/cpuinfo | grep processor | wc -l`
export OMP_NUM_THREADS=$((vCPUs / 2))
If the problem persists, try with:
export OMP_NUM_THREADS=`cat /proc/cpuinfo | grep processor | wc -l`
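If you prefer to set this from Python rather than the shell, here is a minimal stdlib-only sketch that mirrors the same logic (os.cpu_count() in place of counting "processor" lines in /proc/cpuinfo). Note this is an assumption on my part that you want it in-script; the key point is that the variables must be set before mxnet is imported, or it will not pick them up:

```python
import os

# Number of logical CPUs - equivalent to counting the "processor"
# entries in /proc/cpuinfo.
vcpus = os.cpu_count()

# Half the logical CPUs, as the article suggests: on a hyper-threaded
# machine this roughly maps one OMP thread per physical core.
omp_threads = max(1, vcpus // 2)

# Set the environment *before* importing mxnet.
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
os.environ["OMP_NUM_THREADS"] = str(omp_threads)

# import mxnet as mx  # only import after the variables are in place

print(omp_threads)
```

If performance is still poor, you can drop the // 2 and hand OMP all logical CPUs, as in the second export above.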
Thank you very much @ThomasDelteil - indeed, by changing to mxnet-cu90mkl I got a boost, and now I see 4 threads being used, just as the authors suggest for my 8-thread laptop.