Hello,
This is more of a question about Gluon’s behavior when using multiple GPUs. I tried running one of the sample notebooks (http://gluon.mxnet.io/chapter07_distributed-learning/multiple-gpus-gluon.html) from the Gluon book on a p2.16xlarge box. Here’s the output of nvidia-smi with GPU_COUNT set to 8:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     60550     C  python3                                        391MiB |
|    1     60550     C  python3                                        262MiB |
|    2     60550     C  python3                                        262MiB |
|    3     60550     C  python3                                        261MiB |
|    4     60550     C  python3                                        261MiB |
|    5     60550     C  python3                                        261MiB |
|    6     60550     C  python3                                        261MiB |
|    7     60550     C  python3                                        261MiB |
+-----------------------------------------------------------------------------+
```
I don’t expect the memory utilization to be exactly equal across all the GPUs, but the disparity grows if I increase GPU_COUNT to 16. I know the effective batch size is also multiplied by GPU_COUNT, and that each batch is first loaded on the CPU before being split across the devices. Are there any additional changes to the code that need to be made to ensure more even memory utilization?
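For reference, the splitting pattern from the tutorial looks roughly like this, as I understand it (a minimal sketch; the batch and feature sizes are placeholders I picked for illustration):

```python
import mxnet as mx
from mxnet import gluon

GPU_COUNT = 8
ctx = [mx.gpu(i) for i in range(GPU_COUNT)]

# The batch starts out on the CPU (mx.cpu() is the default context)...
batch = mx.nd.random.uniform(shape=(64 * GPU_COUNT, 784))

# ...and split_and_load slices it evenly across the GPU contexts,
# so each device should end up holding 1/GPU_COUNT of the batch.
parts = gluon.utils.split_and_load(batch, ctx)
print([p.context for p in parts])
```

Given that each slice is the same size, I would have expected the per-GPU footprints above to be closer together.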
I am actually trying to run a simple FC network to compute knowledge base embeddings using multiple GPUs. On the same box, the maximum GPU_COUNT I can set is 4; anything above that fails with an OOM error.
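Simplified, my training step looks something like the following (the layer sizes and the random data here are placeholders standing in for the real KB embedding setup):

```python
import mxnet as mx
from mxnet import gluon, autograd

GPU_COUNT = 4  # anything above 4 OOMs for me
ctx = [mx.gpu(i) for i in range(GPU_COUNT)]

# Simple FC network; real dimensions differ, these are illustrative.
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(512, activation='relu'))
    net.add(gluon.nn.Dense(128))  # embedding dimension (placeholder)
net.initialize(mx.init.Xavier(), ctx=ctx)

trainer = gluon.Trainer(net.collect_params(), 'adam',
                        {'learning_rate': 1e-3})
loss_fn = gluon.loss.L2Loss()

# One illustrative step with random data standing in for KB triples.
data = mx.nd.random.uniform(shape=(64 * GPU_COUNT, 1000))
label = mx.nd.random.uniform(shape=(64 * GPU_COUNT, 128))
data_parts = gluon.utils.split_and_load(data, ctx)
label_parts = gluon.utils.split_and_load(label, ctx)

with autograd.record():
    losses = [loss_fn(net(x), y)
              for x, y in zip(data_parts, label_parts)]
for l in losses:
    l.backward()
trainer.step(data.shape[0])
```

Since each GPU should only hold its slice of the batch plus a copy of the parameters, I would expect per-device memory to shrink as GPU_COUNT grows, not to run out. Am I missing something in this pattern?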
Thanks,
Rahul