Here I changed the example's kv_store type to "device", but running eight GPUs is not nearly twice as fast as running four GPUs. I wonder why?
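For reference, this is the kind of change I made (a rough sketch from memory, not the exact line in the script):

import mxnet as mx

# Create the kvstore as "device" so gradient aggregation happens
# on the GPUs instead of on the CPU.
store = mx.kv.create("device")
# ...and pass it to the trainer, e.g.:
# trainer = mx.gluon.Trainer(net.collect_params(), "sgd",
#                            {"learning_rate": 0.1}, kvstore=store)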
4 GPU run time
Epoch 0: Test_acc 0.528500 time 24.068163
Epoch 1: Test_acc 0.600200 time 18.431711
Epoch 2: Test_acc 0.622900 time 19.140585
Epoch 3: Test_acc 0.637000 time 18.778502
Epoch 4: Test_acc 0.670600 time 18.383272
8 GPU run time
Epoch 0: Test_acc 0.515800 time 22.551225
Epoch 1: Test_acc 0.574200 time 19.231086
Epoch 2: Test_acc 0.603300 time 16.836740
Epoch 3: Test_acc 0.557800 time 18.368619
Epoch 4: Test_acc 0.629300 time 17.656158
PS:
I now want to modify the all_reduce part of MXNet to run some experiments.
Could we distribute the data evenly across the 8 GPUs in advance, and then at each iteration have each GPU randomly fetch a batch from its own shard?
How should I implement the data sampler for this?
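Something like this is what I have in mind (a rough, untested sketch; the shard layout and batch_size_per_gpu are placeholders I made up):

import mxnet as mx
from mxnet.gluon.data import DataLoader, SimpleDataset
from mxnet.gluon.data.vision import CIFAR10

num_gpus = 8
batch_size_per_gpu = 64

train_data = CIFAR10(train=True)
shard_size = len(train_data) // num_gpus

# One fixed shard and one shuffling DataLoader per GPU: each GPU only
# ever sees its own slice of the data, resampled every epoch.
# (This materializes each shard in memory, which is fine for CIFAR-10.)
loaders = []
for g in range(num_gpus):
    shard = SimpleDataset([train_data[i]
                           for i in range(g * shard_size, (g + 1) * shard_size)])
    loaders.append(DataLoader(shard, batch_size=batch_size_per_gpu, shuffle=True))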
If I want to modify the model parameters directly, like below, will this implementation be slow? Is there a more elegant way to do it in MXNet?
for ctx_param in param.list_data():  # one NDArray copy per device
    ctx_param[:] = ctx_param / self.worker_num  # scale in place on each GPU
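One alternative I considered (a sketch, assuming param is a gluon.Parameter and self.worker_num is the device count as above) is to scale a single copy and let set_data broadcast it back to every device, though since NDArray operations are queued asynchronously by the engine, I am not sure the explicit loop above is actually a bottleneck:

# Sketch: scale the copy held on the first device, then let
# set_data() copy the result to every context the parameter lives on.
scaled = param.list_data()[0] / self.worker_num
param.set_data(scaled)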
You need to scale the batch size with the number of GPUs you are using: the batch size is the total across all your GPUs, not per GPU. Hence the lack of speedup (actually a small slowdown): you are still doing the same number of iterations in your 8-GPU run as in your 4-GPU run, just with less work per GPU. For example, to compare head to head with your previous run, try --batch-size 512 and let me know how that goes. (PS: if you are doing this for actual training, don't forget to increase your learning rate as well. Each update aggregates gradients from more examples and is therefore more confident, so you can raise the learning rate to learn faster.)
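To make the rule concrete, here is a hedged sketch of the linear scaling I mean (the base values are made-up single-GPU references, not from your script):

# Scale total batch size and learning rate together with the GPU
# count. base_batch_per_gpu and base_lr are hypothetical values.
num_gpus = 8
base_batch_per_gpu = 64
base_lr = 0.05

total_batch_size = base_batch_per_gpu * num_gpus  # 512 for 8 GPUs
lr = base_lr * num_gpus                           # linear LR scaling heuristic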
Thanks.
Increasing the batch_size speeds up the second example, but the first one shows no effect even though I already increased its batch_size. Something is wrong, but I don't know what.
Sorry.
To clarify: increasing the batch_size speeds up the second example, train_cifar10.py (from /example/image-classification),
but has no effect on the first example, cifar10_dist.py (from /example/distributed_training).