What is the correct way to get the gradient calculation time inside the training loop?
If I do this,
begin = time.time()
self.forward_backward(data_batch)
time_spent = time.time() - begin
The time_spent value is very small. I think the calculation is carried out asynchronously, outside the Python code.
I also tried to access the gradients to force the calculation, in the following way, but I'm not sure it is correct. I also think there is a GPU memory leak, because I run out of GPU memory even with a small batch size.
begin = time.time()
self.forward_backward(data_batch)
# access the gradients so the pending computations are forced to execute
for index, grad_list in enumerate(self._exec_group.grad_arrays):
    if len(grad_list) > 0:
        grad_np = grad_list[0].asnumpy()
time_spent = time.time() - begin
You're correct that the execution is asynchronous. You can use loss.wait_to_read(), which has the same effect as .asnumpy() but without the additional device-to-host copy. It blocks until every computation that loss depends on is complete. That's an alternative to looping over the gradient arrays as you do above.
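To see why the naive timing under-reports, here is a minimal, framework-free sketch of the same pitfall. The fake_forward_backward function and the thread pool are hypothetical stand-ins for the engine's asynchronous execution; in real MXNet code the synchronization point would be loss.wait_to_read() (or mx.nd.waitall()) instead of future.result().

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an asynchronous engine call such as forward_backward:
# the work runs on another thread and the Python call returns immediately.
pool = ThreadPoolExecutor(max_workers=1)

def fake_forward_backward():
    time.sleep(0.2)  # pretend this is the gradient computation
    return "grads"

# Naive timing: measures only the time to *enqueue* the work.
begin = time.time()
future = pool.submit(fake_forward_backward)
naive = time.time() - begin

# Timing with a synchronization point (analogous to wait_to_read()):
# block until the computation has actually finished.
begin = time.time()
future.result()
synced = time.time() - begin

print(f"naive: {naive:.3f}s, synced: {synced:.3f}s")
pool.shutdown()
```

The naive measurement comes back in microseconds while the synchronized one reflects the real 0.2 s of work, which is exactly the discrepancy you are seeing in the training loop.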
Take a look at this similar Discuss question, which sounds like your memory leak: