If no initialization method is specified, as in net.initialize(), MXNet uses its default random initialization: each element of a weight parameter is sampled from the uniform distribution U[−0.07, 0.07], and all bias parameters are set to 0. See http://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html#parameter-initialization for more information.
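As a quick illustration of what that default does, here is a sketch in plain NumPy (an assumption for illustration only, not MXNet's actual implementation) that fills a weight matrix from U[−0.07, 0.07] and zeros the bias:

```python
import numpy as np

rng = np.random.default_rng(0)

def default_init(weight_shape, bias_shape):
    # Mimic the default: weights ~ U[-0.07, 0.07], biases all zero.
    weight = rng.uniform(-0.07, 0.07, size=weight_shape)
    bias = np.zeros(bias_shape)
    return weight, bias

w, b = default_init((10, 320), (10,))
print(w.min() >= -0.07 and w.max() <= 0.07)  # True: all values inside the range
print(bool((b == 0).all()))                  # True: biases are zeroed
```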
For example, if you use net.initialize(mx.init.One()) together with weight_initializer=mx.init.One(), as in:
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

net = nn.Sequential()
# Conv2D(channels=16, kernel_size=3, strides=1, padding=1)
net.add(nn.Conv2D(16, 3, 1, 1, weight_initializer=mx.init.One(), activation='relu'))
net.add(nn.Dense(10))
net.initialize(mx.init.One())
x = nd.random.uniform(shape=(1, 1, 2, 10))
net(x)
print(net[1].weight.data())  # weights of the Dense layer
You can see that the weights are:
[[1. 1. 1. … 1. 1. 1.]
[1. 1. 1. … 1. 1. 1.]
[1. 1. 1. … 1. 1. 1.]
…
[1. 1. 1. … 1. 1. 1.]
[1. 1. 1. … 1. 1. 1.]
[1. 1. 1. … 1. 1. 1.]]
<NDArray 10x320 @cpu(0)>
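Note that the Conv2D layer would get all-ones weights either way here, because a per-parameter initializer (the weight_initializer argument) takes precedence over the global one passed to initialize(); only parameters without their own initializer, like the Dense layer's, follow the global choice. A toy model of that precedence rule (plain Python, not MXNet code; the names and placeholder values are made up for illustration):

```python
class Param:
    """Stand-in for a Gluon Parameter with an optional per-parameter init."""
    def __init__(self, init=None):
        self.init = init      # per-parameter initializer, may be None
        self.data = None

def initialize(params, global_init):
    # Per-parameter initializer wins; otherwise fall back to the global one.
    for p in params:
        chosen = p.init if p.init is not None else global_init
        p.data = chosen()

ones = lambda: 1.0            # stands in for mx.init.One()
uniform = lambda: 0.05        # stands in for a draw from the default Uniform

conv_w = Param(init=ones)     # like weight_initializer=mx.init.One()
dense_w = Param()             # no per-layer initializer
initialize([conv_w, dense_w], global_init=uniform)
print(conv_w.data, dense_w.data)  # 1.0 0.05
```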
But if you use net.initialize() together with weight_initializer=mx.init.One(), as in:
net = nn.Sequential()
net.add(nn.Conv2D(16, 3, 1, 1, weight_initializer=mx.init.One(), activation='relu'))
net.add(nn.Dense(10))
net.initialize()  # default: weights ~ U[-0.07, 0.07], biases zeroed
x = nd.random.uniform(shape=(1, 1, 2, 10))
net(x)
print(net[1].weight.data())  # weights of the Dense layer
The weights are:
[[-0.02335968 -0.01407015 -0.05864581 … 0.05376593 -0.03404731
-0.06720807]
[-0.05495147 0.06723212 0.0113286 … -0.00772312 0.01102494
0.05772652]
[ 0.00861803 -0.06468396 0.05282693 … 0.06916221 -0.01219748
-0.0272661 ]
…
[ 0.04516336 0.0003779 0.01198862 … -0.04161773 0.02530347
0.06112661]
[-0.00585618 -0.02169332 -0.01204515 … -0.05935499 -0.03586181
-0.06428157]
[-0.0113162 0.01251774 -0.05391466 … 0.00918427 0.03665156
0.02068089]]
<NDArray 10x320 @cpu(0)>
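The 10x320 shape of the Dense weight follows from standard convolution shape arithmetic: a 3x3 kernel with stride 1 and padding 1 preserves the spatial size, so the (1, 1, 2, 10) input becomes (1, 16, 2, 10) after the Conv2D layer, and Dense flattens everything but the batch axis. A quick check:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    # Standard output-size formula for a 1-D convolution axis.
    return (size + 2 * pad - kernel) // stride + 1

h, w = conv_out(2), conv_out(10)  # spatial size preserved: 2, 10
flat = 16 * h * w                 # 16 channels * 2 * 10 = 320 Dense inputs
print((10, flat))                 # (10, 320) -- matches <NDArray 10x320>
```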