How to insert a Gluon block layer into an MXNet symbol model

I have written a custom layer in Gluon, but I don’t know how to call this layer inside an MXNet symbol model.

The gluon layer is as follows:

import mxnet as mx
from mxnet.gluon import nn


class GALayer(nn.Block):
    def __init__(self, output_units, in_units, config, **kwargs):
        super(GALayer, self).__init__(**kwargs)
        self._config = config
        self.weight = self.params.get('weight_gaussian', shape=(in_units, output_units))

    def forward(self, x):
        # x is input_data
        x = x.squeeze()
        w = mx.nd.transpose(self.weight.data(), axes=(1, 0))
        m, d1 = x.shape
        n, d2 = w.shape

        assert d1 == d2
        assert n == self._config.TRAIN.num_hidden

        x_tmp = mx.nd.power(x, 2).sum(-1, keepdims=True)
        w_tmp = mx.nd.power(w, 2).sum(-1, keepdims=True)
        xx = mx.nd.tile(x_tmp, reps=(n,))
        ww = mx.nd.transpose(mx.nd.tile(w_tmp, reps=(m,)), axes=(1, 0))
        # dist = ww + xx
        dist = mx.nd.add(mx.nd.dot(x, w.T), ww + xx)
        output_d = mx.nd.exp(-dist / self._config.TRAIN.affinity_delta)

        return output_d

And the original mxnet symbol model is:

last_feat = mx.sym.Dropout(data=last_feat, name='cnn_drop1')
gaussian_aff_layer = GALayer(output_units=num_classes, in_units=2048, config=config)
new_feat = gaussian_aff_layer(last_feat)

First of all, welcome to the community. :raising_hand_man:
Now to your solution:

1 - You have to use nn.HybridBlock (which can be hybridized) instead of nn.Block.
2 - So you have to write a "hybrid_forward", not a "forward".
3 - While using HybridBlock, remember to add "F" as an argument in your "hybrid_forward" definition.
4 - You have to hybridize the layer that you want to insert.

Hybridizing converts your code from a dynamic graph to a static graph, which can then be attached to another static graph if needed. In MXNet we usually (or I should say 99.9999% of the time) hybridize our models for better performance. For more about Gluon performance and about Gluon in general, check out the MXNet Gluon documentation and tutorials.
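Applied to your GALayer, the changes could look roughly like the sketch below. This is only a sketch that keeps the distance formula from your posted code: the shape assertions are dropped (symbols carry no concrete shapes), flatten() stands in for squeeze(), and broadcast_add replaces the tile() calls so that no runtime shapes are needed.

import mxnet as mx
from mxnet.gluon import nn


class GALayer(nn.HybridBlock):
    def __init__(self, output_units, in_units, config, **kwargs):
        super(GALayer, self).__init__(**kwargs)
        self._config = config
        with self.name_scope():
            self.weight = self.params.get('weight_gaussian',
                                          shape=(in_units, output_units))

    def hybrid_forward(self, F, x, weight):
        # 'weight' is passed in automatically because it was registered
        # through self.params.get() in __init__.
        x = F.flatten(x)                                         # (m, in_units)
        x_tmp = F.sum(F.square(x), axis=-1, keepdims=True)       # (m, 1)
        w_tmp = F.sum(F.square(weight), axis=0, keepdims=True)   # (1, output_units)
        # broadcasting replaces the tile() calls
        dist = F.broadcast_add(F.dot(x, weight),                 # same formula as in the question
                               F.broadcast_add(x_tmp, w_tmp))
        return F.exp(-dist / self._config.TRAIN.affinity_delta)

After hybridizing an instance of this layer you can call it on last_feat just as in your snippet and keep composing the result with mx.sym operators.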

And for your case, we can plug a hybridized Gluon layer into a symbol model.
Below is working code for a toy example:

import mxnet as mx
from mxnet.gluon import nn


class gluon_layer(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(gluon_layer, self).__init__(**kwargs)
        self.dense = nn.Dense(128, 'relu')  # 128-unit dense layer with ReLU activation

    def hybrid_forward(self, F, x):
        return self.dense(x)

GLUON_LAYER = gluon_layer()
GLUON_LAYER.hybridize()

data = mx.sym.var('data')
layer1 = GLUON_LAYER(data)
layer2 = mx.sym.FullyConnected(data = layer1, num_hidden = 10)
output = mx.sym.SoftmaxOutput(data = layer2, name = 'softmax')
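
If you want to check that the Gluon layer really became part of the symbol graph, you can list the arguments of the composed symbol; the Gluon layer's parameters show up alongside the plain symbol ones (the exact names depend on the block's auto-generated prefix):

# Hypothetical check: the Gluon layer's parameters appear as ordinary
# arguments of the composed symbol, with names derived from the block prefix,
# e.g. something like 'gluon_layer0_dense0_weight'.
print(output.list_arguments())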

Getting our toy MNIST data and training the model:

mnist = mx.test_utils.get_mnist()
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], 128, shuffle = True)
# create a module
module = mx.mod.Module(symbol = output,
                       context = mx.gpu(), # change to mx.cpu() if you don't have gpu
                       data_names = ['data'],
                       label_names = ['softmax_label'])
# fit the module
module.fit(train_iter,
           optimizer = 'sgd',
           optimizer_params = {'learning_rate':0.1},
           num_epoch = 5)

This will print:

INFO:root:Epoch[0] Train-accuracy=0.779351
INFO:root:Epoch[0] Time cost=0.674
INFO:root:Epoch[1] Train-accuracy=0.908615
INFO:root:Epoch[1] Time cost=0.636
INFO:root:Epoch[2] Train-accuracy=0.924257
INFO:root:Epoch[2] Time cost=0.789
INFO:root:Epoch[3] Train-accuracy=0.934768
INFO:root:Epoch[3] Time cost=0.792
INFO:root:Epoch[4] Train-accuracy=0.943230
INFO:root:Epoch[4] Time cost=0.800

Though I wouldn’t recommend that you use practices like the one I have shown above.

Mixing Gluon and Symbol is not a recommended way to build and train your model.
As far as I know, there is practically nothing that you can’t do with Gluon alone. As a personal suggestion, I’d recommend using Gluon: it’s a lot more flexible and easier to write and debug.

Hope this helps.

@mouryarishik’s answer is right, but you don’t need to call hybridize on the block (though it does need to be a HybridBlock).

A HybridBlock is designed to work with two kinds of inputs: symbols or ndarrays. Consequently, if you push a symbol through, you’ll get a symbol out. If you push an ndarray through, you’ll get an ndarray out.

Simple example:

layer = mx.gluon.nn.Dense(5)
layer.initialize()
x_sym = mx.sym.var('x')
x_nd = mx.nd.ones((1, 3))

print(layer(x_sym))  # get back a symbol
print(layer(x_nd))  # get back an ndarray

The hybridize method serves a different purpose.

First, let’s note that HybridBlock objects do some special magic behind the scenes. When you push ndarrays through your block for the first time, HybridBlock will actually make variable symbols for each input and then call itself on those symbols, which will yield the output symbol/graph (just like we saw in the above example). It will then store a cache of that graph internally. After storing this cached graph, it will then just call hybrid_forward on your actual data as you would expect.

Then, when you call hybridize, the block does something sneaky. It builds an internal executor for that cached graph it made earlier. After that, anytime you call that block, it won’t run your hybrid_forward method; it will instead run the more efficient executor for the cached graph on the input.

Here’s a test to let you peer inside and see that after you push data through the block, it will have stealthily made an internal graph representation of your layer.

layer2 = mx.gluon.nn.Dense(5)
layer2.initialize()

print(layer2._cached_graph)  # you'll get an empty tuple because no internal graph was made!

layer2(mx.nd.ones((1, 3)))
print(layer2._cached_graph)  # now you'll see it stored a graph!
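
And here’s a small (hypothetical) check of what hybridize itself changes: after hybridizing, the Python hybrid_forward runs only once, to build the cached graph, and later calls go through the cached-graph executor instead.

class NoisyDense(mx.gluon.nn.HybridBlock):
    def __init__(self, **kwargs):
        super(NoisyDense, self).__init__(**kwargs)
        self.dense = mx.gluon.nn.Dense(5)

    def hybrid_forward(self, F, x):
        print('hybrid_forward called')  # trace how often the Python code runs
        return self.dense(x)

layer3 = NoisyDense()
layer3.initialize()
layer3.hybridize()

layer3(mx.nd.ones((1, 3)))  # prints once, while the cached graph is built
layer3(mx.nd.ones((1, 3)))  # no print: the cached-graph executor runs instead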

But if you just want a symbol out, you don’t actually have to call hybridize; just push a symbol through it. Hybridize only affects how it processes actual ndarray data.

Got it. Thanks for another explanation.
