What is the correct way to get the gradient calculation time inside the training loop?
If I do this,
begin = time.time()
self.forward_backward(data_batch)
time_spent = time.time() - begin
The time_spent value is very small. I think the calculation is carried out asynchronously, outside the Python code.
I also tried to access the gradients to force the calculation, in the following way, but I'm not sure it is correct. I also think there is a GPU memory leak, because I run out of GPU memory even with a small batch size.
begin = time.time()
self.forward_backward(data_batch)
# access the gradients so the pending computations are forced to execute
for index, grad_list in enumerate(self._exec_group.grad_arrays):
    if len(grad_list) > 0:
        grad_np = grad_list[0].asnumpy()
time_spent = time.time() - begin
You're correct that the execution is asynchronous. You can use loss.wait_to_read(), which has the same effect as .asnumpy() but without the additional device-to-host copy. It blocks until every computation that loss depends on is complete. That's an alternative to looping over the gradient arrays as you do above.
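To see why the naive timing under-reports, here is a minimal, framework-free sketch of the same pitfall. The fake_forward_backward function and the thread pool are hypothetical stand-ins for the engine's asynchronous execution; in real MXNet code the synchronization point would be loss.wait_to_read() (or mx.nd.waitall()) instead of future.result().

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an asynchronous engine call such as forward_backward:
# the work runs on another thread and the Python call returns immediately.
pool = ThreadPoolExecutor(max_workers=1)

def fake_forward_backward():
    time.sleep(0.2)  # pretend this is the gradient computation
    return "grads"

# Naive timing: measures only the time to *enqueue* the work.
begin = time.time()
future = pool.submit(fake_forward_backward)
naive = time.time() - begin

# Timing with a synchronization point (analogous to wait_to_read()):
# block until the computation has actually finished.
begin = time.time()
future.result()
synced = time.time() - begin

print(f"naive: {naive:.3f}s, synced: {synced:.3f}s")
pool.shutdown()
```

The naive measurement comes back in microseconds while the synchronized one reflects the real 0.2 s of work, which is exactly the discrepancy you are seeing in the training loop.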
Take a look at this similar Discuss question, which sounds like your memory leak: