Gluoncv SSD returns mxnet.base.MXNetError: Shape inconsistent between 2 epochs

olivcruche · October 20, 2019, 5:05pm

Hi,
I’m training a gluoncv SSD. A very weird thing happens: the first epoch works fine, but the first batch of the second epoch raises:

Traceback (most recent call last):
  File "trash.py", line 308, in <module>
    sum_loss, cls_loss, box_loss = mbox_loss(cls_preds, box_preds, C, B)
  File "/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py", line 548, in __call__
    out = self.forward(*args)
  File "/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/loss.py", line 156, in forward
    cls_loss = -nd.pick(pred, ct, axis=-1, keepdims=False)
  File "<string>", line 89, in pick
  File "/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Shape inconsistent, Provided = [10,6132], inferred shape=[10,100]

Where does that inferred shape=[10,100] come from? Why would this happen between 2 epochs when the whole first epoch went fine?

olivcruche · October 20, 2019, 5:46pm

Wow, how weird: I moved the net.hybridize(static_alloc=True, static_shape=True) call inside the epoch loop (instead of doing it just once, before the epoch loop) and the error disappeared.

ThomasDelteil · October 23, 2019, 12:00am

I think this happened because you hybridized before you got the anchors. You need to hybridize the model AFTER you request the anchors. Calling the network to get the anchors under the training scope of autograd triggers a different branch of the SSD model, one that returns the anchors. The anchors are used to compute the targets on CPU ahead of time, since they are deterministic given the target and the anchor sizes.
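
To make the branch issue more concrete, here is a toy, pure-Python sketch. It is not MXNet or GluonCV code, and the class and method names are hypothetical. It assumes that a hybridized block freezes whichever Python branch it takes on the first call after hybridize(), and that calling hybridize() again clears that cached graph. The shapes 6132 and 100 are taken from the error message above; reading 100 as the number of detections the prediction branch keeps is an assumption.

```python
# Toy stand-in for a hybridized SSD block. NOT MXNet code; names are hypothetical.
class ToyHybridSSD:
    def __init__(self, num_anchors=6132, num_detections=100):
        self.num_anchors = num_anchors        # per-anchor outputs in the training branch
        self.num_detections = num_detections  # assumed size of the prediction branch output
        self._hybrid = False
        self._cached_branch = None            # branch frozen on the first hybrid call

    def hybridize(self):
        # Assumption: (re)hybridizing throws away any previously cached graph.
        self._hybrid = True
        self._cached_branch = None

    def __call__(self, batch_size, training):
        branch = "train" if training else "predict"
        if self._hybrid:
            if self._cached_branch is None:
                # The first call after hybridize() decides which branch gets cached.
                self._cached_branch = branch
            # Later calls replay the cached branch, whatever mode they are in.
            branch = self._cached_branch
        if branch == "train":
            return (batch_size, self.num_anchors)
        return (batch_size, self.num_detections)


net = ToyHybridSSD()
net.hybridize()
first = net(10, training=False)  # first call happens outside training mode
stuck = net(10, training=True)   # training call still replays the predict branch
net.hybridize()                  # re-hybridizing clears the cache
fixed = net(10, training=True)   # training branch is cached this time
print(first, stuck, fixed)       # (10, 100) (10, 100) (10, 6132)
```

If the real model behaves like this toy, it would explain both observations in this thread: a loss that expects [10,6132] can receive [10,100] once the wrong branch has been cached, and calling net.hybridize(...) again at the start of each epoch resets the cache so the training branch is traced afresh.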
