Serialization erases grad_req settings

If I want some parameters in a network to remain fixed during training, I can set grad_req='null'. But if I serialize and deserialize the model, I find that those parameters have been reset to grad_req='write'.

It is clearly possible to encode grad_req='null' in the serialized model somehow, since the batch norm running mean and variance parameters retain grad_req='null' after (de)serialization. How can I do this for other layers?

>>> import mxnet as mx
>>> mx.__version__
>>> data = mx.sym.var('data')
>>> b = mx.sym.var('b')
>>> y = data * b
>>> net = mx.gluon.nn.SymbolBlock(y, [data])
>>> net.collect_params()['b'].grad_req = 'null'
>>> net.hybridize()
>>> net.initialize()
>>> _ = net(mx.nd.array([1]))
>>> net.export('/tmp/grad_req')
>>> symbol_file = '/tmp/grad_req-symbol.json'
>>> params_file = '/tmp/grad_req-0000.params'
>>> deser_net = mx.gluon.nn.SymbolBlock.imports(symbol_file, ['data'], params_file)
>>> deser_net.collect_params()['b'].grad_req
'write'

Hi @philipgautier,

I don’t think grad_req is part of the symbol.json schema, so this isn’t possible unless you pickle the block, or recreate the block (from shared code) and just save and load the parameters. What’s your use case for loading the model? Will it be in Python?

As for BatchNorm, this is just defined in the implementation of the block itself rather than in the symbol.json file.

Thanks, that’s helpful. My use case is to take an arbitrary network and replace certain layers with other layers that have the same inputs and outputs. I currently do this by editing the json file that stores the symbolic network. I want to set grad_req='null' on some of the new parameters, but my output is a (new) json file. As things stand, I just have to remember to run some extra code setting grad_req='null' after deserializing. This is all in Python.
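For concreteness, the "extra code" I run after deserializing amounts to something like the sketch below: record the frozen parameter names in a small sidecar JSON file next to the model files, then re-apply grad_req='null' after loading. The `freeze_params` helper and the sidecar file name are my own inventions, not part of Gluon; the demo uses stand-in parameter objects so it runs without MXNet, but `params` would really be the dict from `net.collect_params()`.

```python
import json
import os
import tempfile


def save_frozen(names, path):
    """Persist the set of frozen parameter names alongside the model files."""
    with open(path, 'w') as f:
        json.dump(sorted(names), f)


def load_frozen(path):
    """Read back the set of frozen parameter names."""
    with open(path) as f:
        return set(json.load(f))


def freeze_params(params, frozen_names):
    """Set grad_req='null' on every parameter whose name is in frozen_names.

    `params` is any mapping from name to an object with a `grad_req`
    attribute, e.g. the dict returned by net.collect_params() in Gluon.
    """
    for name, p in params.items():
        if name in frozen_names:
            p.grad_req = 'null'


# Demo with stand-in parameter objects (no MXNet required):
class FakeParam:
    def __init__(self):
        self.grad_req = 'write'


params = {'data': FakeParam(), 'b': FakeParam()}
sidecar = os.path.join(tempfile.gettempdir(), 'grad_req-frozen.json')
save_frozen({'b'}, sidecar)
freeze_params(params, load_frozen(sidecar))
print(params['b'].grad_req)     # -> null
print(params['data'].grad_req)  # -> write
```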

Is there any documentation on defining a new op? Currently my new blocks are serialized by their component parts (broadcast multiply, concatenate, etc). But if I could define a new op, which would be stored as a single node in the json file, perhaps I could set the grad_req behavior of this block, as in batch norm?

An interesting idea. Check out this tutorial for an example of creating a custom operator. Wouldn’t it just be simpler to manipulate the layers of a model when it’s loaded though, rather than changing the symbol.json file?
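To illustrate, the kind of symbol.json surgery being discussed looks roughly like the sketch below. The tiny graph and the `elemwise_mul` → `broadcast_mul` swap are toy examples of mine; real symbol.json files also carry `arg_nodes`, `heads`, per-node `attrs`, and more, which is part of why manipulating the loaded model is often the simpler route.

```python
import json

# A toy stand-in for a symbol.json graph. Each node names an op and lists
# its inputs as [node_id, output_index, version] triples.
sym = {
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "null", "name": "b", "inputs": []},
        {"op": "elemwise_mul", "name": "y",
         "inputs": [[0, 0, 0], [1, 0, 0]]},
    ]
}


def replace_op(symbol_dict, node_name, new_op):
    """Swap the op of the named node in place, keeping its inputs intact."""
    for node in symbol_dict["nodes"]:
        if node["name"] == node_name:
            node["op"] = new_op
            return node
    raise KeyError(node_name)


replace_op(sym, "y", "broadcast_mul")
print(json.dumps(sym, indent=2))
```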