GPU memory garbage collection

alinagithub · January 3, 2019, 3:57am

I train different networks sequentially and would like to rely on the GPU free memory value to dynamically compute some heuristics, such as the batch size. However, the GPU memory is not released (even after delete, gc.collect(), nvidia-smi, pycuda.tools.clear_context_caches()…) and is re-used by MXNet for efficiency, which prevents measuring the real free memory.

Is there a known way to explicitly reclaim unused GPU memory?
Or any alternate idea?

This need came up a few times some time ago, but I haven’t found any solution since (https://github.com/apache/incubator-mxnet/issues/1946, https://github.com/apache/incubator-mxnet/issues/2827).

Many thanks,
AL

sad · January 3, 2019, 10:22pm

Hi AL, I’m not really sure how you would go about doing this because, like you said, MXNet GPU memory deallocation is asynchronous. It looks like this merged PR, https://github.com/apache/incubator-mxnet/pull/2927/files, attempted to address some of those issues. You can try playing around with some of the environment variables listed at https://github.com/apache/incubator-mxnet/blob/master/docs/faq/env_var.md#memory-options, particularly MXNET_GPU_MEM_POOL_RESERVE.
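
For anyone who wants to try this, here is a rough sketch of what it might look like. It assumes an MXNet release that provides mx.context.gpu_memory_info, and the value 20 for MXNET_GPU_MEM_POOL_RESERVE is only a placeholder to experiment with. The helper names free_gpu_bytes and pick_batch_size, and the bytes_per_sample estimate, are hypothetical and not part of MXNet. Note that the reported free memory is the device-level figure, so memory still held in MXNet's pool will most likely count as used.

```python
import os

# Assumption: the variable is read when MXNet starts, so set it before the import.
# "20" is a placeholder percentage, not a recommended value.
os.environ.setdefault("MXNET_GPU_MEM_POOL_RESERVE", "20")


def free_gpu_bytes(device_id=0):
    """Return the free memory (in bytes) reported for one GPU."""
    import mxnet as mx  # imported lazily so the variable above is already set

    free, total = mx.context.gpu_memory_info(device_id)
    return free


def pick_batch_size(free_bytes, bytes_per_sample, max_batch=512, safety=0.8):
    """Hypothetical heuristic: largest power of two whose estimated
    footprint fits into a fraction (safety) of the free memory."""
    budget = int(free_bytes * safety)
    batch = 1
    while batch * 2 <= max_batch and batch * 2 * bytes_per_sample <= budget:
        batch *= 2
    return batch
```

On a machine with a GPU, pick_batch_size(free_gpu_bytes(0), bytes_per_sample) would give a starting point, where bytes_per_sample is your own estimate of the per-sample footprint of the network.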
