Difference between net.summary and viz.print_summary

Hey!

I'm using the MNIST CNN example from the tutorial, which is:

import mxnet as mx
from mxnet import gluon

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(20, kernel_size=(5, 5), activation='tanh'))
    net.add(gluon.nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    net.add(gluon.nn.Conv2D(50, kernel_size=(5, 5), activation='tanh'))
    net.add(gluon.nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    net.add(gluon.nn.Dense(500, activation='tanh'))
    net.add(gluon.nn.Dense(10, activation='tanh'))
net.initialize(mx.init.Xavier(magnitude=2.3))

I don't understand why the number of parameters reported for the fully connected layers is not the same when calling net.summary and mx.viz.print_summary.

mx.viz.print_summary(net(mx.sym.var('data')), shape={'data': (1, 1, 32, 32)}) gives

Layer (type)                                        Output Shape            Param #     Previous Layer                  
========================================================================================================================
data(null)                                          1x32x32                 0                                           
________________________________________________________________________________________________________________________
hybridsequential0_conv0_fwd(Convolution)            20x28x28                520         data                            
________________________________________________________________________________________________________________________
hybridsequential0_conv0_tanh_fwd(Activation)        20x28x28                0           hybridsequential0_conv0_fwd     
________________________________________________________________________________________________________________________
hybridsequential0_pool0_fwd(Pooling)                20x14x14                0           hybridsequential0_conv0_tanh_fwd
________________________________________________________________________________________________________________________
hybridsequential0_conv1_fwd(Convolution)            50x10x10                25050       hybridsequential0_pool0_fwd     
________________________________________________________________________________________________________________________
hybridsequential0_conv1_tanh_fwd(Activation)        50x10x10                0           hybridsequential0_conv1_fwd     
________________________________________________________________________________________________________________________
hybridsequential0_pool1_fwd(Pooling)                50x5x5                  0           hybridsequential0_conv1_tanh_fwd
________________________________________________________________________________________________________________________
hybridsequential0_dense0_fwd(FullyConnected)        500                     25500       hybridsequential0_pool1_fwd     
________________________________________________________________________________________________________________________
hybridsequential0_dense0_tanh_fwd(Activation)       500                     0           hybridsequential0_dense0_fwd    
________________________________________________________________________________________________________________________
hybridsequential0_dense1_fwd(FullyConnected)        10                      5010        hybridsequential0_dense0_tanh_fw
________________________________________________________________________________________________________________________
hybridsequential0_dense1_tanh_fwd(Activation)       10                      0           hybridsequential0_dense1_fwd    
========================================================================================================================
Total params: 56080

and net.summary(mx.nd.zeros((1, 1, 32, 32))) gives

  Layer (type)                                Output Shape         Param #
================================================================================
           Input                              (1, 1, 32, 32)               0
    Activation-1   <Symbol hybridsequential0_conv0_tanh_fwd>               0
    Activation-2                             (1, 20, 28, 28)               0
        Conv2D-3                             (1, 20, 28, 28)             520
     MaxPool2D-4                             (1, 20, 14, 14)               0
    Activation-5   <Symbol hybridsequential0_conv1_tanh_fwd>               0
    Activation-6                             (1, 50, 10, 10)               0
        Conv2D-7                             (1, 50, 10, 10)           25050
     MaxPool2D-8                               (1, 50, 5, 5)               0
    Activation-9  <Symbol hybridsequential0_dense0_tanh_fwd>               0
   Activation-10                                    (1, 500)               0
        Dense-11                                    (1, 500)          625500
   Activation-12  <Symbol hybridsequential0_dense1_tanh_fwd>               0
   Activation-13                                     (1, 10)               0
        Dense-14                                     (1, 10)            5010
================================================================================
Parameters in forward computation graph, duplicate included
   Total params: 656080
   Trainable params: 656080
   Non-trainable params: 0
Shared params in forward computation graph: 0
Unique parameters in model: 656080 

The parameter counts differ for the first fully connected layer (the one with 500 units): 25500 in the first case versus 625500 in the second. Could someone explain why these two functions don't give the same result, and tell me which one is correct?
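Counting by hand, the input to that Dense layer is the flattened pooling output, 50 * 5 * 5 = 1250 values, so I would expect 1250 * 500 + 500 = 625500 parameters (which matches net.summary), whereas 25500 corresponds to 50 * 500 + 500, as if only the channel dimension of the previous layer were used. Here is a small sketch of how I tried to double-check the count directly from the parameters (it reuses the net defined above; the forward pass is there because the Dense shapes are only resolved on the first call):

# Sanity check on the parameter count, reusing `net` from above.
# Run one forward pass so the deferred shapes (e.g. the Dense in_units)
# are resolved and the weight arrays are actually allocated.
net(mx.nd.zeros((1, 1, 32, 32)))

# Sum the sizes of all parameter arrays in the block.
total = sum(p.data().size for p in net.collect_params().values())
print(total)
# Expected by hand: 520 + 25050 + 625500 + 5010 = 656080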

Thanks!