The speedup from gradient compression does not seem significant for ResNet training

Hi, I have implemented a similar 1-bit gradient compression algorithm in MXNet. However, when I train ResNet-110 on CIFAR-10 to compare my implementation with the built-in 2-bit compression and with no compression, I find that the speedup from gradient quantization for ResNet training does not seem significant. The training command and logs are shown below. I deployed the training job across four nodes (each equipped with four K80 GPUs): one parameter server and three workers. Is there anything incorrect in my training setup?
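
To clarify what I mean by "1bit": the algorithm I implemented is essentially sign-based quantization with an error-feedback residual. The NumPy sketch below is only an illustration of that idea (it is not my actual MXNet operator; `quantize_1bit` and the variable names are made up for this example):

```python
import numpy as np

def quantize_1bit(grad, residual, threshold=1.0):
    """Sign-based 1-bit quantization with error feedback (illustration only).

    Each element of (grad + residual) is sent as +threshold or -threshold,
    i.e. one bit per element plus a shared scale; the quantization error
    stays on the worker in `residual` and is added back next iteration.
    """
    corrected = grad + residual
    compressed = np.where(corrected >= 0, threshold, -threshold)
    residual[:] = corrected - compressed  # keep the error for the next step
    return compressed

# toy usage
grad = np.array([0.3, -0.7, 0.05, -0.01])
residual = np.zeros_like(grad)
print(quantize_1bit(grad, residual))  # [ 1. -1.  1. -1.]
print(residual)                       # [-0.7   0.3  -0.95  0.99]
```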

PS: I use the example code in the example/image-classification directory.
MXNet version: 1.4.0
CUDA: 8.0

training command:

python ../../tools/launch.py --launcher ssh -H hosts -s 1 -n 3 python train_cifar10.py --gc-type 2bit --gc-threshold 1 --kv-store dist_sync --num-epochs 200 --batch-size 128 --lr-step-epochs 100,150 --wd 0.0001 --lr 0.1 --lr-factor 0.1 --network resnet --gpus 0,1,2,3
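
For context, my understanding (recalled from memory, not copied from the script) is that the example wires --gc-type / --gc-threshold into the KVStore via set_gradient_compression, roughly like this:

```python
import mxnet as mx

# Simplified sketch of how I understand the gradient-compression flags reach
# the KVStore in the example (the real wiring is in common/fit.py; the values
# below mirror my command line).
kv = mx.kvstore.create('dist_sync')   # requires the env vars set by launch.py
gc_type = '2bit'                      # --gc-type ('1bit' selects my custom implementation)
gc_threshold = 1.0                    # --gc-threshold
if gc_type != 'none':
    kv.set_gradient_compression({'type': gc_type, 'threshold': gc_threshold})
```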

Training Results

100th epoch:

|                     | No Quantization | 2bit Quantization | 1bit Quantization |
|---------------------|-----------------|-------------------|-------------------|
| time cost (seconds) | 19.27           | 19.777            | 18.545            |
| validation accuracy | 0.89122         | 0.887921          | 0.885871          |

150th epoch:

|                     | No Quantization | 2bit Quantization | 1bit Quantization |
|---------------------|-----------------|-------------------|-------------------|
| time cost (seconds) | 18.73           | 22.357            | 20.339            |
| validation accuracy | 0.92758         | 0.929688          | 0.929109          |

200th epoch:

|                     | No Quantization | 2bit Quantization | 1bit Quantization |
|---------------------|-----------------|-------------------|-------------------|
| time cost (seconds) | 19.048          | 18.846            | 19.649            |
| validation accuracy | 0.929988        | 0.935397          | 0.937500          |
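
To quantify "not significant": taking the per-epoch times above, the speedup (no-compression time divided by compressed time) ranges roughly from 0.84x to 1.04x, i.e. the compressed runs are sometimes even slower than the baseline:

```python
# Per-epoch time (seconds) at the 100th/150th/200th epoch, copied from the tables above.
baseline = [19.27, 18.73, 19.048]
two_bit  = [19.777, 22.357, 18.846]
one_bit  = [18.545, 20.339, 19.649]

for name, compressed in [("2bit", two_bit), ("1bit", one_bit)]:
    ratios = [b / c for b, c in zip(baseline, compressed)]
    print(name, ["%.2f" % r for r in ratios])
# 2bit ['0.97', '0.84', '1.01']
# 1bit ['1.04', '0.92', '0.97']
```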