What is the correct way to get the gradient calculation time inside the training loop?

If I do this,

time_spent = time.time() - begin

The time_spent value is very small. I think the calculation is carried out asynchronously, outside the Python code.

I also tried accessing the gradients to force the calculation, as shown below, but I’m not sure this is correct. I also think there is a GPU memory leak, because I run out of GPU memory even with a small batch size.

# try to access gradients so the real calculations are going to be executed
for index, grad_list in enumerate(self._exec_group.grad_arrays):
   if len(grad_list) > 0:
      grad_np = grad_list[0].asnumpy()
time_spent = time.time() - begin

Hi Dong,

You’re correct in that the execution is asynchronous. You can use loss.wait_to_read() which has the same effect as .asnumpy() without performing the additional copy. It will block until all computations that loss depends on are complete. That’s an alternative to running through a loop as you have above.
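To make the timing pattern concrete, here is a minimal sketch using only the Python standard library. The background thread is a stand-in for the asynchronous engine, and the names `timed` and `fake_async_backward` are made up for illustration; they are not part of MXNet. The idea is simply to block on a synchronization call before stopping the clock.

```python
import threading
import time


def timed(launch, sync):
    """Run launch(), block on sync(result), and return (elapsed_seconds, result)."""
    begin = time.perf_counter()
    result = launch()   # with an asynchronous engine this returns almost immediately
    sync(result)        # block until the queued work has actually finished
    return time.perf_counter() - begin, result


def fake_async_backward(seconds=0.2):
    """Hypothetical stand-in for an async backend: the 'work' runs on another thread."""
    worker = threading.Thread(target=time.sleep, args=(seconds,))
    worker.start()
    return worker


# Without synchronization only the launch overhead is measured.
unsynced, worker = timed(fake_async_backward, sync=lambda w: None)
worker.join()  # let the leftover work finish before the next measurement

# With synchronization the real duration of the work is measured.
synced, _ = timed(fake_async_backward, sync=lambda w: w.join())

print(f"without sync: {unsynced:.4f}s, with sync: {synced:.4f}s")

# In MXNet the sync step would be a blocking call instead of join(), for example
# (not run here; `loss` is assumed to be an NDArray from your own training loop):
#   elapsed, _ = timed(lambda: loss.backward(), sync=lambda _: mx.nd.waitall())
```

Note that loss.wait_to_read() only waits for the work that loss itself depends on. If the goal is to time the backward pass, my understanding is that you would call wait_to_read() on one of the gradient arrays, or call mx.nd.waitall() to block until everything queued so far is done.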

Take a look at this similar Discuss question, which sounds like your memory leak:

and this GitHub issue: