Save CNN model architecture and params

I have followed the CNN tutorial here: http://gluon.mxnet.io/chapter04_convolutional-neural-networks/cnn-gluon.html.

How do I save/load the CNN model's architecture as .json? I know how to save the model's params using http://gluon.mxnet.io/chapter03_deep-neural-networks/serialization.html#Saving-and-loading-the-parameters-of-gluon-models

Because typical Gluon code builds a define-by-run computational graph, there is no separate model definition to save! This may sound strange if you have experience with declarative frameworks like MXNet's module API, Caffe, or TensorFlow. However, MXNet does let you easily generate a symbol that represents your model's computational graph, as long as that graph is non-dynamic.

In order to construct a non-dynamic graph, all blocks used in the network must be HybridBlocks. You can read more about them in this Gluon tutorial.

The good news is that CNNs are almost always non-dynamic computational graphs that can be represented with HybridBlocks. In the tutorial you linked, all you need to do is change net from gluon.nn.Sequential() to gluon.nn.HybridSequential(). Then, instead of passing an NDArray to net, you simply pass a Symbol, and the returned result is a symbol that represents the computational graph of your network. This symbol can be converted to json and saved. Here is an example based on the tutorial you mentioned:

First create the initial network as a HybridBlock, create a symbol, and convert the symbol to json.

import mxnet as mx
from mxnet import gluon

batch_size = 64
num_inputs = 784
num_outputs = 10
num_fc = 512

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    # The Flatten layer collapses all axes, except the first one, into one axis.
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(num_fc, activation="relu"))
    net.add(gluon.nn.Dense(num_outputs))

sym_json = net(mx.sym.var('data')).tojson()
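
You can save this json string to a file and read it back later; a minimal sketch (the file name model.json is just an illustration):

with open('model.json', 'w') as f:
    f.write(sym_json)

# ... later, before rebuilding the network
with open('model.json', 'r') as f:
    sym_json = f.read()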

Now when you want to load the model, you can use gluon.nn.SymbolBlock to rebuild the network from the symbol:

net = gluon.nn.SymbolBlock(
    outputs=mx.sym.load_json(sym_json),
    inputs=mx.sym.var('data'))

Now you can use this net just like the original one. Specifically, you can call load_params() on it with the path to the file where the trained network's parameters were saved, and then pass an NDArray to it to make predictions:

net.load_params(params_filename)
x = mx.nd.random.uniform(shape=(16, 3, 224, 224))
predictions = net(x)
print(predictions.shape)

The above prints:

(16, 10)

which is the correct output shape for this CNN classification network.


Hi, I wonder: is there a way to store a dynamic network in a json file? During the network definition, the input data's shape is used to define the network. My goal is to end up with a network-definition file whose format is independent of Python, and then load the params file separately.

A dynamic network is one that changes its structure depending on the input data. The clearest example I've seen is TreeLSTM. Because of this property, dynamic networks cannot be stored in json, unless the json format could essentially capture complex control flow similar to Python's.
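
To make this concrete, here is an illustrative sketch (the block and its logic are made up for this example, not taken from your code) of a network whose structure depends on the values in the input, so there is no single static graph to serialize:

import mxnet as mx
from mxnet.gluon import Block, nn

class DataDependentNet(Block):
    def __init__(self, **kwargs):
        super(DataDependentNet, self).__init__(**kwargs)
        with self.name_scope():
            self.dense = nn.Dense(16)

    def forward(self, x):
        # The layer is applied once or twice depending on the data itself,
        # so the graph can change from batch to batch and cannot be
        # captured by a single static json symbol.
        n_repeats = 2 if x.sum().asscalar() > 0 else 1
        for _ in range(n_repeats):
            x = self.dense(x)
        return x

net = DataDependentNet()
net.initialize()
# Input dim matches the Dense units, so repeated application is shape-consistent.
out = net(mx.nd.random.uniform(low=-1, high=1, shape=(2, 16)))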

From what you explain, it sounds like your network structure is data dependent, but that the structure can be fixed once you look at the first data item, since all items result in the same network architecture. If so, there are multiple ways of achieving this. Let me know if that's what you're after and I can elaborate more.

I've tried this, but failed.
Most of the network is static; only the last layers are dynamic.
So I split it into 2 parts, and in the end I just want the static part to be saved.

But it fails with "hybrid first or no to_json method".
I wonder if this is the right way to implement the idea.
If it is, then what's wrong with this little piece of code?
What I want to do is save the feature-extraction net and its params to files (independent of Python) for this training repo: https://github.com/chaoyuaw/incubator-mxnet/tree/master/example/gluon/embedding_learning

class Feature(HybridBlock):
    # define static part

class Last(Block):
    # define dynamic part
    def __init__(self, feature_net, ...):
        self.feature = feature_net
        # ... dynamic part

    def forward(self, x):
        # ...

net_a = Feature(...)
net_b = Last(net_a,...)

#train net_b many times


sym_json = net_a.tojson()
sym_json.save('model.json')
net_a.save_params('test.params')
#or
#x = mx.symbol.var('data')
#net_a.hybridize()
#net_a(x)
#net_a.export('test')
#

net_c = gluon.nn.SymbolBlock(
    outputs=mx.sym.load_json(sym_json),
    inputs=mx.sym.var('data'))

net_c.load_params(params_filename)
x = mx.nd.random.uniform(shape=(16, 3, 224, 224))
predictions = net_c(x)
print(predictions.shape)

Not sure what you're doing wrong, but here is sample code that does the job:


from mxnet import nd
from mxnet.gluon import nn, Block, HybridBlock


class MyHybridBlock(HybridBlock):
    def __init__(self):
        super(MyHybridBlock, self).__init__()
        with self.name_scope():
            self.conv = nn.Conv1D(channels=256, kernel_size=2, layout='NCW', use_bias=False, activation='relu')

    def hybrid_forward(self, F, x):
        return self.conv(x)


class MyBlock(Block):
    def __init__(self, conv):
        super(MyBlock, self).__init__()
        self.conv = conv

    def forward(self, x):
        conv_out = self.conv(x)
        conv_out[:,0,:] *= 10  # Some random imperative op
        return conv_out


if __name__ == '__main__':
    net = MyBlock(MyHybridBlock())
    net.initialize()
    net.hybridize()

    # Train the network
    data = nd.random.uniform(shape=(16, 1024, 1000))  # NCW layout
    _ = net(data)

    net.conv.export("/home/ec2-user/MyHybridBlock")

1. I've tested your idea, but it fails during training after calling net.hybridize().
The training uses the shape of the input, so in my opinion the whole net can't be hybridized before training.

2. I've then tried the following, which also failed, with the prompt 'first hybridize and then forward…':

if __name__ == '__main__':
    net = MyBlock(MyHybridBlock())
    net.initialize()
    # Train the network...here

    net.conv.hybridize()
    data = nd.random.uniform(shape=(1, 3, 224, 224))  # NCHW layout
    _ = net.conv(data)

    net.conv.export("test")
3. Given 1 and 2, I wonder if there is a way to copy the first part of the whole network,
    i.e. to_be_saved = fake_deep_copy_model(net.conv),
    and then hybridize to_be_saved and forward it with some fake data.
    That way the training and the hybridize-and-save step can be separated.
    But I don't know the right way to deep-copy net.conv. :frowning:

If you need the shape of the data in your HybridBlock (and that shape really is needed and you are not making a mistake), then what you can do is override the forward() call of the HybridBlock, save the shape, and then call into the base class:

class MyHybridBlock(HybridBlock):
    def __init__(self):
        super(MyHybridBlock, self).__init__()
        with self.name_scope():
            pass  # define MyHybridBlock's child blocks here

    def forward(self, x):
        """ override HybridBlock.forward
        """
        # store the data shape
        self._shape = x.shape
        return super(MyHybridBlock, self).forward(x)

    def hybrid_forward(self, F, x):
        # Use self._shape instead of x.shape in this function
        pass
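
For instance, here is a sketch of how the recorded shape might then be used inside hybrid_forward (the block and the reshape are only an illustration; note that the shape recorded at trace time gets baked into the cached graph):

from mxnet.gluon import nn, HybridBlock

class ShapeAwareBlock(HybridBlock):
    def __init__(self, **kwargs):
        super(ShapeAwareBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.dense = nn.Dense(10)

    def forward(self, x):
        # Record the concrete input shape before dispatching to hybrid_forward.
        self._shape = x.shape
        return super(ShapeAwareBlock, self).forward(x)

    def hybrid_forward(self, F, x):
        # A Symbol has no concrete .shape, so use the shape recorded in forward().
        x = F.reshape(x, shape=(self._shape[0], -1))
        return self.dense(x)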


Thanks for your reply.
I've tested your idea and it works.
It turns out I had made a mistake: using forward in a HybridBlock instead of hybrid_forward.
Also, my whole network and training code were too messy (not a good habit).
Testing with simple examples is a good way to debug.
For others who may face the same situation, my simple test is:

import mxnet as mx
from mxnet.gluon import nn, Block, HybridBlock

class FeatureNet(HybridBlock):
    def __init__(self, **kwargs):
        super(FeatureNet, self).__init__(**kwargs)
        with self.name_scope():
            self.base_net = mx.gluon.model_zoo.vision.mobilenet1_0(pretrained=True)
            self.dense = nn.Dense(128)
    def hybrid_forward(self, F, x):
        z = self.base_net(x)
        z = self.dense(z)
        return z


class DynamicNet(Block):
    def __init__(self, feature_net, **kwargs):
        super(DynamicNet, self).__init__(**kwargs)
        with self.name_scope():
            self.feature_net = feature_net
            # Imagine below is dynamic part
            self.dense = nn.Dense(10)
            

    def forward(self, x):
        z = self.feature_net(x)
        # Imagine below is dynamic part
        z = self.dense(z)
        return z

featnet = FeatureNet()
featnet.dense.initialize(mx.init.Xavier(magnitude=2))

net = DynamicNet(featnet)
net.dense.initialize(mx.init.Xavier(magnitude=2))

net.feature_net.hybridize() 
# or net.hybridize() will work

net.feature_net(mx.nd.ones((1,3,224,224)))
#or net(mx.nd.ones((1,3,224,224))) will work
net.feature_net.export('test')
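
The export('test') call above writes test-symbol.json and test-0000.params, which can then be loaded back without the Python class definitions, using the same SymbolBlock pattern from earlier in the thread; a sketch (reusing the imports from the test above):

feat = nn.SymbolBlock(
    outputs=mx.sym.load('test-symbol.json'),
    inputs=mx.sym.var('data'))
feat.load_params('test-0000.params')
print(feat(mx.nd.ones((1, 3, 224, 224))).shape)  # expect (1, 128) from the Dense(128) head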