Hey!
I'm using the MNIST CNN example from the tutorial, that is:
import mxnet as mx
from mxnet import gluon

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(20, kernel_size=(5, 5), activation='tanh'))  # 1x32x32 -> 20x28x28
    net.add(gluon.nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))        # -> 20x14x14
    net.add(gluon.nn.Conv2D(50, kernel_size=(5, 5), activation='tanh'))  # -> 50x10x10
    net.add(gluon.nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))        # -> 50x5x5
    net.add(gluon.nn.Dense(500, activation='tanh'))                      # flattens to 50*5*5 = 1250 inputs
    net.add(gluon.nn.Dense(10, activation='tanh'))
net.initialize(mx.init.Xavier(magnitude=2.3))
I don't understand why the number of parameters reported for a fully connected layer is not the same when calling summary and print_summary.
mx.viz.print_summary(net(mx.sym.var('data')), shape={'data': (1, 1, 32, 32)})
gives
Layer (type)                                     Output Shape         Param #     Previous Layer
========================================================================================================================
data(null)                                       1x32x32              0
________________________________________________________________________________________________________________________
hybridsequential0_conv0_fwd(Convolution)         20x28x28             520         data
________________________________________________________________________________________________________________________
hybridsequential0_conv0_tanh_fwd(Activation)     20x28x28             0           hybridsequential0_conv0_fwd
________________________________________________________________________________________________________________________
hybridsequential0_pool0_fwd(Pooling)             20x14x14             0           hybridsequential0_conv0_tanh_fwd
________________________________________________________________________________________________________________________
hybridsequential0_conv1_fwd(Convolution)         50x10x10             25050       hybridsequential0_pool0_fwd
________________________________________________________________________________________________________________________
hybridsequential0_conv1_tanh_fwd(Activation)     50x10x10             0           hybridsequential0_conv1_fwd
________________________________________________________________________________________________________________________
hybridsequential0_pool1_fwd(Pooling)             50x5x5               0           hybridsequential0_conv1_tanh_fwd
________________________________________________________________________________________________________________________
hybridsequential0_dense0_fwd(FullyConnected)     500                  25500       hybridsequential0_pool1_fwd
________________________________________________________________________________________________________________________
hybridsequential0_dense0_tanh_fwd(Activation)    500                  0           hybridsequential0_dense0_fwd
________________________________________________________________________________________________________________________
hybridsequential0_dense1_fwd(FullyConnected)     10                   5010        hybridsequential0_dense0_tanh_fwd
________________________________________________________________________________________________________________________
hybridsequential0_dense1_tanh_fwd(Activation)    10                   0           hybridsequential0_dense1_fwd
========================================================================================================================
Total params: 56080
and net.summary(mx.nd.zeros((1,1,32,32)))
gives
        Layer (type)                                Output Shape         Param #
================================================================================
               Input                              (1, 1, 32, 32)               0
        Activation-1  <Symbol hybridsequential0_conv0_tanh_fwd>               0
        Activation-2                             (1, 20, 28, 28)               0
            Conv2D-3                             (1, 20, 28, 28)             520
         MaxPool2D-4                             (1, 20, 14, 14)               0
        Activation-5  <Symbol hybridsequential0_conv1_tanh_fwd>               0
        Activation-6                             (1, 50, 10, 10)               0
            Conv2D-7                             (1, 50, 10, 10)           25050
         MaxPool2D-8                               (1, 50, 5, 5)               0
        Activation-9  <Symbol hybridsequential0_dense0_tanh_fwd>              0
       Activation-10                                    (1, 500)               0
            Dense-11                                    (1, 500)          625500
       Activation-12  <Symbol hybridsequential0_dense1_tanh_fwd>              0
       Activation-13                                     (1, 10)               0
            Dense-14                                     (1, 10)            5010
================================================================================
Parameters in forward computation graph, duplicate included
Total params: 656080
Trainable params: 656080
Non-trainable params: 0
Shared params in forward computation graph: 0
Unique parameters in model: 656080
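To double-check, I also printed the raw parameter shapes after a forward pass (this is just my own sanity check, not from the tutorial):

x = mx.nd.zeros((1, 1, 32, 32))
net(x)  # one forward pass so Gluon's deferred shape inference runs
for name, param in net.collect_params().items():
    print(name, param.shape)

For me this reports hybridsequential0_dense0_weight as (500, 1250), i.e. 500 units over the flattened 50*5*5 = 1250 inputs.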
The number of parameters differs for the first fully connected layer (the one with 500 units): 25500 in the first case vs. 625500 in the second. Could someone explain why these two functions don't give the same result, and tell me which one is right?
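For what it's worth, here is my back-of-the-envelope arithmetic (assuming the Dense layer flattens its 50x5x5 input, which Gluon does by default):

# Dense(500) on the flattened 50*5*5 = 1250 inputs, plus 500 biases:
print(50 * 5 * 5 * 500 + 500)  # 625500 -> matches net.summary
# Counting only the 50 channels and ignoring the 5x5 spatial dims:
print(50 * 500 + 500)          # 25500  -> matches print_summary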
Thanks!