I created a dummy Block, which takes a 2D array, performs a 2D convolution, and feeds the convolved output to a fully connected layer:
class DummyBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(DummyBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = gluon.nn.Conv2D(channels=3, kernel_size=(1, 5), strides=(1, 1), activation='relu')
            self.fc = gluon.nn.Dense(5)

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)
        x = self.fc(x)
        return x
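For reference, the shape in the comment follows from the usual convolution arithmetic (no padding, stride (1, 1)); a quick sanity check in plain Python:

```python
# Expected output shape of Conv2D(channels=3, kernel_size=(1, 5), strides=(1, 1))
# for a batch of shape (N, C, H, W) = (2, 1, 4, 5), with no padding:
#   out_dim = (in_dim - kernel) // stride + 1
N, C, H, W = 2, 1, 4, 5
channels, (kh, kw), (sh, sw) = 3, (1, 5), (1, 1)
out_h = (H - kh) // sh + 1   # (4 - 1) // 1 + 1 = 4
out_w = (W - kw) // sw + 1   # (5 - 5) // 1 + 1 = 1
print((N, channels, out_h, out_w))  # (2, 3, 4, 1), matching the comment above
```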
I tested DummyBlock using the following code:
import numpy as np
import mxnet as mx
from mxnet import gluon, nd, autograd

X = nd.array([
    [[1,0,0,0,0],[2,0,0,0,0],[3,0,0,0,0],[4,0,0,0,0]],
    [[0,1,0,0,0],[0,2,0,0,0],[0,3,0,0,0],[0,4,0,0,0]],
    [[0,0,1,0,0],[0,0,2,0,0],[0,0,3,0,0],[0,0,4,0,0]],
    [[0,0,0,1,0],[0,0,0,2,0],[0,0,0,3,0],[0,0,0,4,0]],
    [[0,0,0,0,1],[0,0,0,0,2],[0,0,0,0,3],[0,0,0,0,4]]
])
Y = nd.array([0,1,2,3,4])

ctx = mx.cpu()
net = DummyBlock()
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

batch_size = 2
loss_func = gluon.loss.SoftmaxCrossEntropyLoss()
data_loader = gluon.data.DataLoader(gluon.data.ArrayDataset(X, Y), batch_size=batch_size)

for i, (data, label) in enumerate(data_loader):
    data = data.as_in_context(ctx)
    # add a channel dimension: (batch, 1, 4, 5); 0 keeps the batch dimension as-is
    data = data.reshape((0, 1, data.shape[1], data.shape[2]))
    label = label.as_in_context(ctx)
    with autograd.record():
        output = net(data)
        loss = loss_func(output, label)
    loss.backward()
    trainer.step(data.shape[0])
Besides the fact that it doesn’t do anything useful, this runs fine without any error. When I transpose x and feed it into the fully connected layer:
def forward(self, x):
    # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
    x = self.conv(x)
    # transpose: <NDArray 2x1x4x3 @cpu(0)>
    x = nd.array([nd.transpose(a).asnumpy() for a in x])
    x = self.fc(x)
    return x
it fails after the first batch and gives the following error message:
Traceback (most recent call last):
File "/Users/jdchoi/workspace/elit/elit/component/postag.py", line 519, in <module>
trainer.step(data.shape[0])
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/gluon/trainer.py", line 147, in step
%(param.name, str(data.context)))
UserWarning: Gradient of Parameter `dummyblock0_conv0_weight` on context cpu(0) has not been updated by backward since last `step`. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient
In fact, it gives the same error message if I make a copy of x and pass it to the fully connected layer:
def forward(self, x):
    # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
    x = self.conv(x)
    x = x.copy()
    x = self.fc(x)
    return x
When I reshape x and copy the transposed values back into it, it runs fine:
def forward(self, x):
    # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
    x = self.conv(x)
    # reshape and copy: <NDArray 2x1x4x3 @cpu(0)>
    y = [nd.transpose(a).asnumpy() for a in x]
    x = x.reshape((-1, 1, x.shape[2], x.shape[1]))
    for i in range(len(x)):
        x[i] = y[i]
    x = self.fc(x)
    return x
This is very hacky and inefficient. Could someone explain why the first two approaches fail? I often need to transpose the output of the convolution (or concatenate another vector with it) before feeding it into the next layer, so it would be great to know how to do this with Gluon. Thank you.
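For what it’s worth, the per-sample transpose I am doing is equivalent to a single axes permutation of the whole batch (with the batch axis kept fixed). Here is that permutation illustrated in NumPy; I would expect mxnet’s nd.transpose, which accepts the same axes argument, to produce the same shape, though whether it keeps the autograd graph intact is exactly what I am unsure about:

```python
import numpy as np

# Batch of conv outputs with shape (N, C, H, W) = (2, 3, 4, 1).
x = np.arange(2 * 3 * 4 * 1).reshape(2, 3, 4, 1)

# Per-sample transpose reverses each sample's axes: (3, 4, 1) -> (1, 4, 3).
per_sample = np.stack([a.T for a in x])

# The same result as one permutation of the whole batch, keeping axis 0 (the batch) fixed.
batched = np.transpose(x, (0, 3, 2, 1))

print(batched.shape)                        # (2, 1, 4, 3)
print(np.array_equal(per_sample, batched))  # True
```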