Conversion from Gluon to Module and vice versa?

I would like to know how to convert between the two APIs, because it seems the quantization capabilities mainly target the (sym, arg_params, aux_params) tuple style of passing models around, which wraps nicely into a Module but not into a Gluon model (correct me if I'm wrong).
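(For context, the API I mean is mx.contrib.quantization.quantize_model, which, as far as I can tell, takes the model in exactly that checkpoint-triple form. A rough sketch from my reading of the example scripts, so the calibration arguments may vary by version:)

import mxnet as mx
from mxnet.contrib.quantization import quantize_model

# (sym, arg_params, aux_params) is the classic checkpoint triple
sym, arg_params, aux_params = mx.model.load_checkpoint('mxnet', 0)

# sketch only: calib_mode='none' skips calibration; the example scripts
# use 'entropy' with calibration data to pick better thresholds
qsym, qarg_params, aux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(), calib_mode='none')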

Here’s a small code snippet that trains a CNN model:

import mxnet as mx
from mxnet import autograd, gluon

ctx = mx.cpu()  # or mx.gpu()
batch_size = 64
num_inputs = 784
num_outputs = 10
# x, y hold the training images and labels
data_iter = mx.io.NDArrayIter(x, y, batch_size=batch_size)

num_fc = 512
net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(num_fc, activation='relu'))
    net.add(gluon.nn.Dense(num_outputs))

net.hybridize()
# Parameter initialization
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .1})
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
for i, batch in enumerate(data_iter):
    data = batch.data[0].as_in_context(ctx)
    label = batch.label[0].as_in_context(ctx)
    with autograd.record():
        output = net(data)
        loss = softmax_cross_entropy(output, label)
    loss.backward()
    trainer.step(data.shape[0])

If I want to quantize a Gluon model, my approach would be to serialize the Gluon model to disk and then load it back as a Module. This causes trouble:

net.export('mxnet')                      # serialize the hybridized network
mod = mx.module.Module.load('mxnet', 0)  # 0 = epoch of the checkpoint
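(For reference, export writes the graph and the weights as two separate files, which can also be loaded by hand; the 'arg:'/'aux:' key prefixes are how the checkpoint format tells the two parameter kinds apart:)

# net.export('mxnet') produces mxnet-symbol.json and mxnet-0000.params
sym = mx.sym.load('mxnet-symbol.json')
save_dict = mx.nd.load('mxnet-0000.params')
arg_params = {k[4:]: v for k, v in save_dict.items() if k.startswith('arg:')}
aux_params = {k[4:]: v for k, v in save_dict.items() if k.startswith('aux:')}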

FWIW, I got a warning during the loading step, but was not sure what this was about:

/Users/ray_zhang/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/module/base_module.py:54: UserWarning: You created Module with Module(..., label_names=['softmax_label']) but input with name 'softmax_label' is not found in symbol.list_arguments(). Did you mean one of:
	data
  warnings.warn(msg)

and as per the module API:

mod.bind( data_shapes = data_iter.provide_data, 
          label_shapes = data_iter.provide_label)
mod.predict(x)

but it fails on bind() with the following stack trace:

----------------------------------------------
KeyError     Traceback (most recent call last)
<ipython-input-10-f53137bb5e95> in <module>()
      1 mod.bind( data_shapes = data_iter.provide_data, 
----> 2           label_shapes = data_iter.provide_label)
      3 mod.predict(x)

~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/module/module.py in bind(self, data_shapes, label_shapes, for_training, inputs_need_grad, force_rebind, shared_module, grad_req)
    434                                                      fixed_param_names=self._fixed_param_names,
    435                                                      grad_req=grad_req, group2ctxs=self._group2ctxs,
--> 436                                                      state_names=self._state_names)
    437         self._total_exec_bytes = self._exec_group._total_exec_bytes
    438         if shared_module is not None:

~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/module/executor_group.py in __init__(self, symbol, contexts, workload, data_shapes, label_shapes, param_names, for_training, inputs_need_grad, shared_group, logger, fixed_param_names, grad_req, state_names, group2ctxs)
    281 
    282         eprint(sys._getframe().f_lineno, data_shapes, label_shapes)
--> 283         self.bind_exec(data_shapes, label_shapes, shared_group)
    284 
    285     def decide_slices(self, data_shapes):

~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/module/executor_group.py in bind_exec(self, data_shapes, label_shapes, shared_group, reshape)
    388         if label_shapes is not None:
    389             self.label_names = [i.name for i in self.label_shapes]
--> 390         self._collect_arrays()
    391 
    392     def reshape(self, data_shapes, label_shapes):

~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/module/executor_group.py in _collect_arrays(self)
    324             self.label_arrays = [[(self.slices[i], e.arg_dict[name])
    325                                   for i, e in enumerate(self.execs)]
--> 326                                  for name, _ in self.label_shapes]
    327         else:
    328             self.label_arrays = None

~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/module/executor_group.py in <listcomp>(.0)
    324             self.label_arrays = [[(self.slices[i], e.arg_dict[name])
    325                                   for i, e in enumerate(self.execs)]
--> 326                                  for name, _ in self.label_shapes]
    327         else:
    328             self.label_arrays = None

~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/module/executor_group.py in <listcomp>(.0)
    323                 eprint(323, e.arg_dict.keys())
    324             self.label_arrays = [[(self.slices[i], e.arg_dict[name])
--> 325                                   for i, e in enumerate(self.execs)]
    326                                  for name, _ in self.label_shapes]
    327         else:

KeyError: 'softmax_label'

The error complains that 'softmax_label' is missing from e.arg_dict.

I printed out e.arg_dict.keys():

dict_keys(['data', 'hybridsequential1_conv0_weight', 'hybridsequential1_conv0_bias', 'hybridsequential1_conv1_weight', 'hybridsequential1_conv1_bias', 'hybridsequential1_dense0_weight', 'hybridsequential1_dense0_bias', 'hybridsequential1_dense1_weight', 'hybridsequential1_dense1_bias'])

And indeed, softmax_label is not in there. Where is this label coming from, and how can I convert Gluon to Module correctly?
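(My current guess at a workaround, untested: since the exported symbol contains no loss node and therefore no label input, tell the Module not to expect one:)

# tell Module the symbol has no label input
mod = mx.module.Module.load('mxnet', 0, label_names=None)
mod.bind(data_shapes=data_iter.provide_data,
         label_shapes=None,   # nothing to bind for labels
         for_training=False)  # inference only
data_iter.reset()             # the training loop exhausted the iterator
mod.predict(data_iter)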

With regard to your initial point on quantization, this can be done in Gluon.

You need to cast your Block (i.e. network) and also cast your inputs to your network:

import numpy as np

net.cast(np.float16)  # Block.cast modifies the parameters in place and returns None
...
data = data.astype(np.float16)
...
net(data)
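To sanity-check the cast, you can confirm that the network's outputs come back in half precision (a quick check reusing the net and data from above):

out = net(data)
print(out.dtype)  # <class 'numpy.float16'> after the cast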

Check out the video here for more details.

Hi there,

Thanks for the answer. I was not aware of this, but I was also referring to another implementation, where the layers are artificially modified to add thresholding and are cast to int8:

See: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/quantization.py

and here for a tutorial: https://github.com/apache/incubator-mxnet/tree/master/example/quantization

This is what I am trying to achieve: even though fp16 is a good win, int8 would be a much bigger win. (The symmetric int8 quantization technique cannot be done with a simple cast; it has to compute a KL divergence to find the best 'cast', i.e. the optimal clipping threshold.)
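(To make the idea concrete, here is a toy, much-simplified version of that entropy calibration: pick the clipping threshold whose simulated-int8 histogram has the smallest KL divergence from the fp32 activation histogram. The real quantization.py does the bin bookkeeping more carefully:)

import numpy as np

def kl_calibrate(activations, num_bins=2040, num_quant_bins=255):
    # histogram of absolute activations collected from a calibration run
    hist, edges = np.histogram(np.abs(activations), bins=num_bins)
    best_kl, best_t = np.inf, edges[-1]
    # candidate thresholds at every multiple of num_quant_bins bins
    for i in range(num_quant_bins, num_bins + 1, num_quant_bins):
        p = hist[:i].astype(np.float64)
        p[-1] += hist[i:].sum()  # clip outliers into the last kept bin
        # simulate int8: merge i bins down to 255, spread back uniformly
        factor = i // num_quant_bins
        q = np.repeat(p.reshape(num_quant_bins, factor).sum(axis=1), factor) / factor
        p, q = p / p.sum(), q / q.sum()
        mask = p > 0  # KL(p || q) over the bins where p has mass
        kl = np.sum(p[mask] * np.log(p[mask] / q[mask]))
        if kl < best_kl:
            best_kl, best_t = kl, edges[i]
    return best_t  # the int8 scale is then 127.0 / best_t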

I am looking into the source code for the MXNet Module right now and do not understand what exactly label_arrays and label_names are (the documentation doesn't describe them); I think I could track down the source of my problem if that were explained.
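(From my own reading of module.py, hedged because the docs don't spell it out: label_names is the list of symbol inputs that the Module will treat as labels, and it defaults to ['softmax_label'], the label input a Symbol-API SoftmaxOutput layer would create. During bind(), the executor group allocates one NDArray per label name; those are the label_arrays that each batch's labels get copied into. A Gluon-exported symbol keeps the loss outside the graph, so it has no label argument at all, hence the KeyError. In constructor terms:)

# what Module.load calls under the hood, with the defaults spelled out
mod = mx.module.Module(symbol=sym,
                       data_names=['data'],
                       label_names=['softmax_label'])  # must appear in
                                                       # sym.list_arguments()

# for a Gluon-exported symbol, which has no label input
mod = mx.module.Module(symbol=sym, data_names=['data'], label_names=None)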