Question about params[:] = param - lr*param.grad/batch_size

hfutxrg · May 16, 2019, 7:57am

I am not a Python expert, so I have a question about the code in the sgd() function:
for param in params:
    param[:] = param - lr*param.grad/batch_size

Why don’t we just write the code as follows?
for param in params:
    param = param - lr*param.grad/batch_size

The operation should be element-wise either way. What’s the difference between using param[:] and param? I tried changing the code to use param, but then the output is incorrect. Any help is appreciated!

Sergey · May 16, 2019, 7:08pm

Writing param[:] assigns a new value to the existing NDArray, rather than making the name param refer to another location in memory. Consider the following code:

import mxnet as mx

a = mx.random.uniform(shape=(10, 5))
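print(hex(id(a)))  # address of the object that 'a' refers to (in CPython)

a[:] = a - 1       # write the result into the existing NDArray
print(hex(id(a)))  # same object, so the same address

a = a - 1          # rebind the name 'a' to a newly created NDArray
print(hex(id(a)))  # different object, so a different address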

The output will be (actual addresses are going to be different in your run):

0x11421a3c8
0x11421a3c8
0x10a8af1d0

As you can see, the first two addresses are the same: assigning to a[:] just overwrote the data of a. The last address is different: a now refers to a new NDArray at a different memory location.
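The same thing explains why the sgd() loop breaks without [:]. Here is a minimal sketch of that loop, assuming plain Python lists as stand-ins for NDArrays so it runs without MXNet. The function names sgd_rebind and sgd_in_place, and the separate grads argument, are made up for illustration and are not part of the original code.

```python
def sgd_rebind(params, grads, lr):
    """Hypothetical variant: 'param = ...' only rebinds the loop variable."""
    for param, grad in zip(params, grads):
        # A new list is created and bound to the local name 'param';
        # the list stored inside 'params' is left untouched.
        param = [p - lr * g for p, g in zip(param, grad)]


def sgd_in_place(params, grads, lr):
    """Hypothetical variant: 'param[:] = ...' overwrites the existing object."""
    for param, grad in zip(params, grads):
        # Slice assignment writes the new values into the same list object,
        # so every other reference to it sees the update.
        param[:] = [p - lr * g for p, g in zip(param, grad)]


w = [1.0, 2.0]          # stand-in for a weight NDArray
params = [w]
grads = [[0.5, 0.5]]    # stand-in for param.grad

sgd_rebind(params, grads, lr=1.0)
after_rebind = list(w)      # unchanged: [1.0, 2.0]

sgd_in_place(params, grads, lr=1.0)
after_in_place = list(w)    # updated: [0.5, 1.5]

print(after_rebind, after_in_place)
```

With the rebinding version the weights never change, so training appears to do nothing, which matches the incorrect output described in the question.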

mouryarishik · May 17, 2019, 2:30pm

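As @Sergey has explained, using [:] writes the result into the existing NDArray instead of binding param to a new one. So with param[:] = param - lr*param.grad/batch_size you are replacing the internal value of param with “param - lr*param.grad/batch_size”, and the arrays held in params are actually updated. With plain param = ..., only the loop variable changes and the original parameters stay as they were.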

And welcome back again to the community.

hfutxrg · May 17, 2019, 3:50pm

Thank you @Sergey and @mouryarishik for the very detailed explanation with code examples. I completely understand it now.
