Question about memory usage when using multiple GPUs

Hello,

This is more of a question about Gluon's behavior when using multiple GPUs. I tried running one of the sample codes (http://gluon.mxnet.io/chapter07_distributed-learning/multiple-gpus-gluon.html) from the Gluon notebooks on a p2.16xlarge box. Here's the output of nvidia-smi with GPU_COUNT set to 8:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     60550    C   python3                                       391MiB  |
|    1     60550    C   python3                                       262MiB  |
|    2     60550    C   python3                                       262MiB  |
|    3     60550    C   python3                                       261MiB  |
|    4     60550    C   python3                                       261MiB  |
|    5     60550    C   python3                                       261MiB  |
|    6     60550    C   python3                                       261MiB  |
|    7     60550    C   python3                                       261MiB  |
+-----------------------------------------------------------------------------+

I don't expect the memory utilization to be exactly equal across all the GPUs, but the difference grows if I increase GPU_COUNT to 16. I know the batch size is also multiplied by GPU_COUNT, but the initial batch is stored on the CPU. Are there any additional changes that need to be made to the code to ensure a more even memory utilization?

I am actually trying to run a simple FC network to compute knowledge base embeddings using multiple GPUs. On the same box, the maximum GPU count I can set is 4; anything above that fails with an OOM error.

Thanks,
Rahul

gpu(0) is used by default by the trainer to aggregate the gradients from all the devices and to perform the parameter updates. That's why you see a higher memory footprint on that GPU.
The default kvstore used when instantiating a gluon.Trainer is 'device', which corresponds to what I just described. The alternative is 'local', which causes both the aggregation and the updates to happen on the CPU, freeing some of that GPU memory at the expense of copying data from GPU to CPU (slower than copying between GPUs).
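To make that concrete, here is a minimal sketch of how the kvstore argument is passed when creating the Trainer. The toy Dense net and the hyperparameters are just placeholders, not the model from the notebook:

import mxnet as mx
from mxnet import gluon

# Toy network for illustration only; substitute your own model.
net = gluon.nn.Dense(10)
net.initialize(ctx=[mx.gpu(i) for i in range(8)])

# Default: kvstore='device' -> gradients are aggregated and parameters updated
# on gpu(0), which is why that GPU shows a larger memory footprint.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1},
                        kvstore='device')

# Alternative: kvstore='local' -> aggregation and updates happen on the CPU,
# freeing GPU memory at the cost of GPU<->CPU copies.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1},
                        kvstore='local')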

With the current model I am working on, I see gpu(0) getting ~2.5x more memory allocated than the rest, probably because I use RMSProp, which requires storing a running average of the previous squared gradients. I assume you are using sgd or nesterov, which keep less optimizer state, so in your case gpu(0)'s footprint is only slightly larger than that of the other GPUs.
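As a rough illustration (reusing the toy net from the sketch above), the extra state is determined by the optimizer you pass to the Trainer:

# RMSProp keeps a running average of squared gradients as extra per-parameter
# state, which adds to the footprint on gpu(0) when kvstore='device'.
trainer = gluon.Trainer(net.collect_params(), 'rmsprop',
                        {'learning_rate': 0.001})

# Plain SGD keeps no extra per-parameter state (momentum/Nesterov variants keep
# one buffer), so the overhead on gpu(0) is smaller.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1})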

There is some more info here: https://mxnet.incubator.apache.org/api/python/kvstore/kvstore.html#mxnet.kvstore.create

Thanks for the explanation @bejjani!