Huge performance decrease from quantization

I used the code from https://github.com/apache/incubator-mxnet/pull/13715, and I'm seeing a huge performance decrease after quantizing my model.

Tested on Windows 10 with CUDA 10 and cuDNN 7 on a Titan X (Pascal), using the pre-release pip build mxnet-cu100.

I think we need to do more testing on quantization, or maybe I just misunderstood the documentation.

BTW, it can be reproduced with the mxnet Python package, and you might need to run it multiple times to get a stable measurement.
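
For context, a minimal sketch of the kind of quantization call involved, roughly along the lines of the official example. The checkpoint names, context, and calibration settings here are placeholders, not the exact ones from my model:

```python
import logging
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

# Load the trained fp32 model (prefix and epoch are placeholders).
sym, arg_params, aux_params = mx.model.load_checkpoint('model-fp32', 0)

# Quantize to int8. calib_mode='none' skips calibration for brevity;
# the official example uses 'naive' or 'entropy' with a calibration dataset.
qsym, qarg_params, aux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.gpu(0),
    excluded_sym_names=None,
    calib_mode='none',
    quantized_dtype='int8',
    logger=logging)

# Save the quantized model for later benchmarking.
mx.model.save_checkpoint('model-int8', 0, qsym, qarg_params, aux_params)
```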

Hi @kice,

Many thanks for raising this issue. Could you provide a few more details about how you added quantisation? And to confirm, you're seeing inference time for a single sample double when you add quantisation? What changed from when you had a 2x speedup with quantisation? Or does it have very high variance?

Cheers,

Thom

I did the quantisation following the official example: https://github.com/apache/incubator-mxnet/tree/master/example/quantization.

Yes, I got a 2x speedup on a single run, but that might just be because on the first run none of the resources were loaded yet, while the quantized run had everything ready to go.

In a more recent test, the int8 quantized model took twice the run time of the fp32 model. If you need a model for testing, I can upload one for comparison, including the original fp32 model and the int8 quantized version.
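
To rule out the first-run warm-up effect, here is a rough timing sketch, assuming models saved as above; checkpoint prefixes, input shape, and context are placeholders. A few warm-up iterations are discarded so one-off loading costs don't skew the average:

```python
import time
import mxnet as mx

def bench(prefix, shape=(1, 3, 224, 224), ctx=mx.gpu(0), warmup=10, runs=100):
    """Average forward latency of a saved model, excluding warm-up runs."""
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, 0)
    mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
    mod.bind(for_training=False, data_shapes=[('data', shape)])
    mod.set_params(arg_params, aux_params)
    batch = mx.io.DataBatch([mx.nd.random.uniform(shape=shape, ctx=ctx)], [])

    # Warm-up iterations: not timed.
    for _ in range(warmup):
        mod.forward(batch, is_train=False)
        mod.get_outputs()[0].wait_to_read()

    start = time.time()
    for _ in range(runs):
        mod.forward(batch, is_train=False)
        mod.get_outputs()[0].wait_to_read()
    return (time.time() - start) / runs

print('fp32:', bench('model-fp32'))
print('int8:', bench('model-int8'))
```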

Can you please share a quantized model for testing?
For reasons unknown to me, the quantized MobileNet model takes 4 times longer for prediction than the standard model.

Most likely your hardware does not support INT8 computation. You need at least a Skylake CPU.
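
If you are unsure whether your MXNet build can even use int8 kernels on CPU, something like the following can help check. This assumes a recent MXNet release that ships mxnet.runtime.Features; int8 inference on CPU relies on the MKL-DNN backend:

```python
import mxnet as mx
from mxnet.runtime import Features

# Print which backends this MXNet build was compiled with.
features = Features()
print('MXNet version :', mx.__version__)
print('MKLDNN enabled:', features.is_enabled('MKLDNN'))
print('CUDA enabled  :', features.is_enabled('CUDA'))
print('CUDNN enabled :', features.is_enabled('CUDNN'))
```

Whether the CPU itself has fast int8 instructions (e.g. AVX-512/VNNI) still depends on the processor generation, which this check does not cover.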

There is some data on this in the blog post.