I am currently trying to transfer Gluon models into Module models.
For me to do this successfully on a softmax output neural net, I need to do the following:
def block2symbol(block):
    data = mx.sym.Variable('data')
    sym = block(data)
    params = block.collect_params()
    arg_params = {}
    aux_params = {}
    for k, v in params.items():
        if v._stype == 'default':
            param_data = v.data()
        else:
            raise NotImplementedError(
                "stype {} is not yet supported for parameters "
                "in block2symbol.".format(v._stype))
        arg_params[k] = param_data
        aux_params[k] = param_data
    return sym, arg_params, aux_params
# Converting gluon into module
mx_sym, args, auxs = block2symbol(net)  # net is some gluon block.
# Need name = softmax so that label_names can handle softmax_label
mx_sym = mx.sym.SoftmaxOutput(data=mx_sym, name='softmax')
model = mx.mod.Module(symbol=mx_sym, context=mx.cpu(),
                      label_names=['softmax_label'])
model.bind(for_training=False,
           data_shapes=data_iter.provide_data,
           label_shapes=data_iter.provide_label)
model.set_params(args, auxs)
I understand why we need everything above, except for this line:
mx_sym = mx.sym.SoftmaxOutput(data=mx_sym, name='softmax')
If I don't include this line and just use mx_sym as it is, then I get the error:
KeyError: 'softmax_label'
which is associated with the label_names argument in my Module.
I looked inside the internals, and this is the difference that adding a SoftmaxOutput makes:
before:
['data',
'hybridsequential0_conv0_weight',
'hybridsequential0_conv0_bias',
'hybridsequential0_conv0_fwd_output',
'hybridsequential0_conv0_relu_fwd_output',
'hybridsequential0_conv1_weight',
...
'hybridsequential0_conv3_fwd_output',
'hybridsequential0_conv3_relu_fwd_output',
'hybridsequential0_flatten0_reshape0_output',
'hybridsequential0_dense0_weight',
'hybridsequential0_dense0_bias',
'hybridsequential0_dense0_fwd_output',
'hybridsequential0_dense0_relu_fwd_output',
'hybridsequential0_dense1_weight',
'hybridsequential0_dense1_bias',
'hybridsequential0_dense1_fwd_output']
after:
['data',
'hybridsequential0_conv0_weight',
'hybridsequential0_conv0_bias',
'hybridsequential0_conv0_fwd_output',
'hybridsequential0_conv0_relu_fwd_output',
'hybridsequential0_conv1_weight',
...
'hybridsequential0_conv3_fwd_output',
'hybridsequential0_conv3_relu_fwd_output',
'hybridsequential0_flatten0_reshape0_output',
'hybridsequential0_dense0_weight',
'hybridsequential0_dense0_bias',
'hybridsequential0_dense0_fwd_output',
'hybridsequential0_dense0_relu_fwd_output',
'hybridsequential0_dense1_weight',
'hybridsequential0_dense1_bias',
'hybridsequential0_dense1_fwd_output',
'softmax_label',
'softmax_output']
So I guess I need the _label. However, what is this label and why do I need it? I can't see the source code for SoftmaxOutput since it lives in a generated Python file. More importantly, what if we want something that's not a softmax output? Gluon doesn't append a softmax symbol at the very end, so we could use the output of that neural network for anything, not necessarily passing it through softmax.