Understand number of parameters in mx.viz.print_summary(sym)

Hi, I’m creating a basic convnet and printing its summary with the script below:

import mxnet as mx
from mxnet import gluon

# define CNN
num_inputs = 784
num_outputs = 10
num_fc = 256

net = gluon.nn.HybridSequential()

with net.name_scope():
    
    net.add(gluon.nn.Conv2D(channels=16, kernel_size=3, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    net.add(gluon.nn.Conv2D(channels=16, kernel_size=3, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    
    # The Flatten layer collapses all axes except the first into a single axis.
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(num_fc, activation="relu"))
    net.add(gluon.nn.Dropout(.4))
    net.add(gluon.nn.Dense(num_outputs))
    
net.cast('float16')
sym = net(mx.sym.var('data'))  # symbolic graph of the network, with an input named 'data'
mx.viz.print_summary(sym)

This returns Param # = 16 for hybridsequential1_conv0_fwd(Convolution).

I expected 16 channels * (3*3) weights + 16 biases = 160 parameters for this layer. Why do I get 16 here? Is there a way to see the total number of parameters in Gluon?
Cheers
Olivier

The output is below:

________________________________________________________________________________________________________________________
Layer (type)                                        Output Shape            Param #     Previous Layer                  
========================================================================================================================
data(null)                                                                  0                                           
________________________________________________________________________________________________________________________
hybridsequential1_conv0_fwd(Convolution)                                    16          data                            
________________________________________________________________________________________________________________________
hybridsequential1_conv0_relu_fwd(Activation)                                0           hybridsequential1_conv0_fwd     
________________________________________________________________________________________________________________________
hybridsequential1_pool0_fwd(Pooling)                                        0           hybridsequential1_conv0_relu_fwd
________________________________________________________________________________________________________________________
hybridsequential1_conv1_fwd(Convolution)                                    16          hybridsequential1_pool0_fwd     
________________________________________________________________________________________________________________________
hybridsequential1_conv1_relu_fwd(Activation)                                0           hybridsequential1_conv1_fwd     
________________________________________________________________________________________________________________________
hybridsequential1_pool1_fwd(Pooling)                                        0           hybridsequential1_conv1_relu_fwd
________________________________________________________________________________________________________________________
hybridsequential1_flatten0_flatten0(Flatten)                                0           hybridsequential1_pool1_fwd     
________________________________________________________________________________________________________________________
hybridsequential1_dense0_fwd(FullyConnected)                                256         hybridsequential1_flatten0_flatt
________________________________________________________________________________________________________________________
hybridsequential1_dense0_relu_fwd(Activation)                               0           hybridsequential1_dense0_fwd    
________________________________________________________________________________________________________________________
hybridsequential1_dropout0_fwd(Dropout)                                     0           hybridsequential1_dense0_relu_fw
________________________________________________________________________________________________________________________
hybridsequential1_dense1_fwd(FullyConnected)                                10          hybridsequential1_dropout0_fwd  
========================================================================================================================
Total params: 298
________________________________________________________________________________________________________________________

Check this for an example of visualization, graph and summary: https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon/blob/master/PerformanceTest.ipynb

In short, you need to specify your data shape, because the number of parameters depends on how many input channels you have and on the size of your images. Without a shape, the counts in your output match just the bias terms (16 + 16 + 256 + 10 = 298).
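To see why the input shape matters for the first layer, here is a small sketch in plain Python (no MXNet needed). The helper `conv2d_param_count` is a hypothetical name used only for illustration, and it assumes a square kernel and no grouped convolution. Passing 0 input channels is just a way to mimic a missing shape; it happens to reproduce the 16 from your output.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, use_bias=True):
    """Weights plus optional biases of a 2D convolution.

    Assumes a square kernel and num_group == 1 (illustrative helper,
    not part of MXNet).
    """
    weights = in_channels * out_channels * kernel_size * kernel_size
    biases = out_channels if use_bias else 0
    return weights + biases


# First layer of the network: Conv2D(channels=16, kernel_size=3)
print(conv2d_param_count(0, 16, 3))  # input channels unknown -> 16 (biases only)
print(conv2d_param_count(1, 16, 3))  # greyscale input -> 160, the number you expected
print(conv2d_param_count(3, 16, 3))  # RGB input -> 448
```

So your 160 is right for a single-channel input; the summary simply cannot know that until it is given a shape.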

Try this:

import mxnet as mx
from mxnet import gluon

# define CNN
num_inputs = 784
num_outputs = 10
num_fc = 256

net = gluon.nn.HybridSequential()

with net.name_scope():
    
    net.add(gluon.nn.Conv2D(channels=16, kernel_size=3, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    net.add(gluon.nn.Conv2D(channels=16, kernel_size=3, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    
    # The Flatten layer collapses all axes except the first into a single axis.
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(num_fc, activation="relu"))
    net.add(gluon.nn.Dropout(.4))
    net.add(gluon.nn.Dense(num_outputs))
    
net.cast('float16')
mx.viz.print_summary(
    net(mx.sym.var('data')), 
    shape={'data':(1,3,224,224)},  # set your shape here: (batch, channels, height, width)
)
________________________________________________________________________________________________________________________
Layer (type)                                        Output Shape            Param #     Previous Layer                  
========================================================================================================================
data(null)                                          3x224x224               0                                           
________________________________________________________________________________________________________________________
hybridsequential0_conv0_fwd(Convolution)            16x222x222              448         data                            
________________________________________________________________________________________________________________________
hybridsequential0_conv0_relu_fwd(Activation)        16x222x222              0           hybridsequential0_conv0_fwd     
________________________________________________________________________________________________________________________
hybridsequential0_pool0_fwd(Pooling)                16x111x111              0           hybridsequential0_conv0_relu_fwd
________________________________________________________________________________________________________________________
hybridsequential0_conv1_fwd(Convolution)            16x109x109              2320        hybridsequential0_pool0_fwd     
________________________________________________________________________________________________________________________
hybridsequential0_conv1_relu_fwd(Activation)        16x109x109              0           hybridsequential0_conv1_fwd     
________________________________________________________________________________________________________________________
hybridsequential0_pool1_fwd(Pooling)                16x54x54                0           hybridsequential0_conv1_relu_fwd
________________________________________________________________________________________________________________________
hybridsequential0_flatten0_flatten0(Flatten)        46656                   0           hybridsequential0_pool1_fwd     
________________________________________________________________________________________________________________________
hybridsequential0_dense0_fwd(FullyConnected)        256                     11944192    hybridsequential0_flatten0_flatt
________________________________________________________________________________________________________________________
hybridsequential0_dense0_relu_fwd(Activation)       256                     0           hybridsequential0_dense0_fwd    
________________________________________________________________________________________________________________________
hybridsequential0_dropout0_fwd(Dropout)             256                     0           hybridsequential0_dense0_relu_fw
________________________________________________________________________________________________________________________
hybridsequential0_dense1_fwd(FullyConnected)        10                      2570        hybridsequential0_dropout0_fwd  
========================================================================================================================
Total params: 11949530
________________________________________________________________________________________________________________________
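If you want to double-check the table by hand, the sketch below recomputes every number in plain Python. The function names (`conv_out`, `pool_out`, `count_params`) are hypothetical helpers, and the layer settings are hard-coded to match the network above: two 3x3 convolutions with 16 channels, stride 1 and no padding, each followed by a 2x2 max pool with stride 2, with output sizes rounded down.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, pool, stride):
    """Output length of a pooling layer along one axis (rounded down)."""
    return (size - pool) // stride + 1


def count_params(in_channels, height, width, num_fc=256, num_outputs=10):
    """Per-layer and total parameter counts for the network in this thread."""
    counts = {}
    channels, h, w = in_channels, height, width
    for name in ("conv0", "conv1"):
        # 3x3 kernel, 16 filters, plus one bias per filter
        counts[name] = channels * 16 * 3 * 3 + 16
        channels = 16
        h, w = conv_out(h, 3), conv_out(w, 3)
        h, w = pool_out(h, 2, 2), pool_out(w, 2, 2)
    flat = channels * h * w  # size after Flatten
    counts["dense0"] = flat * num_fc + num_fc
    counts["dense1"] = num_fc * num_outputs + num_outputs
    counts["total"] = sum(counts.values())
    return counts


print(count_params(3, 224, 224))
# {'conv0': 448, 'conv1': 2320, 'dense0': 11944192, 'dense1': 2570, 'total': 11949530}
```

The spatial size goes 224 -> 222 -> 111 -> 109 -> 54, so Flatten yields 16 * 54 * 54 = 46656 values, which is why the first Dense layer dominates the total. If your data is 28x28 single-channel (an assumption based on num_inputs = 784), `count_params(1, 28, 28)` gives a total of 107706.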

Good point! Excellent, thanks.