It's strange: prediction in C++ is much slower than prediction in Python.

Hi @DarkWings,
I know you posted this a while ago.
How are you measuring the runtime in C++? What batch size do you use, and are you running on CPU or GPU?
For me, the C++ and Python runtimes have been similar so far using MXNet with Intel MKL on CPU, with batch size 1.
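
In case it helps with the comparison, here is a minimal sketch of one way to time the C++ side. It is not taken from your code: `median_latency_ms` and `run_once` are hypothetical names, and `run_once` stands in for whatever performs a single forward pass in your program. The sketch assumes the first few calls are slower (memory allocation, library initialization), so it discards some warm-up runs and reports the median of the rest.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the median latency of `run_once` in milliseconds.
// `run_once` is a placeholder for one forward pass. It should only return once
// the result is actually available, otherwise only the call overhead is timed.
double median_latency_ms(const std::function<void()>& run_once,
                         std::size_t warmup_runs,
                         std::size_t timed_runs) {
    // Warm-up calls are executed but not measured.
    for (std::size_t i = 0; i < warmup_runs; ++i) {
        run_once();
    }

    std::vector<double> samples;
    samples.reserve(timed_runs);
    for (std::size_t i = 0; i < timed_runs; ++i) {
        // steady_clock is monotonic, so it is not affected by system clock changes.
        const auto start = std::chrono::steady_clock::now();
        run_once();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    if (samples.empty()) {
        return 0.0;  // Nothing was timed.
    }
    // For an even number of samples this picks the upper of the two middle values.
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

You would call it with a lambda wrapping your prediction call, for example `median_latency_ms([&] { /* one forward pass */ }, 10, 100)`. As far as I know MXNet executes operations asynchronously, so the lambda should also wait for the output to be ready before returning, and the Python measurement should do the same (for example with `mx.nd.waitall()`); otherwise the two numbers are not comparable.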

Related topic: