What is the state of quantization support in MXNet? I am trying to find a better framework than TensorFlow Lite because its support is abysmal.
Are the quantization methods available here (https://github.com/apache/incubator-mxnet/tree/master/example/quantization) only for research and experimentation purposes, or will they lead to an actual speed increase when deployed on mobile? There is no timing comparison, so I presume they are just for research and experimentation?
Pruning will lead to performance improvements on both PC and mobile. Quantization support for mobile performance improvement (i.e. on non-Intel CPUs and non-NVIDIA GPUs) is not available at the moment, as far as I know. However, I would recommend looking at a deep learning compiler like TVM (https://tvm.ai/about) for compiling your model to your specific platform (an ARM CPU, for example).
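For intuition on what int8 quantization actually does to your weights and activations, here is a minimal NumPy sketch of symmetric linear quantization. This is an illustration of the general technique, not MXNet's actual implementation; the function names are my own:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric linear quantization: map the range [-max|x|, max|x|]
    # onto the int8 range [-127, 127] with a single scale factor.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float32 values.
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 0.9, -1.2], dtype=np.float32)
q, scale = quantize_int8(x)
x_approx = dequantize(q, scale)
# Per-element reconstruction error is bounded by scale / 2.
```

The speedup on mobile comes from doing the matrix multiplies in int8 (4x smaller than float32, and vectorizable with instructions like ARM NEON), which is exactly the kernel support that has to exist on the target hardware for quantization to pay off.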
@ThomasDelteil Thanks. I am actually only concerned with the performance on mobile. There is no point in optimizing performance for PC because compute is easily available.