Also referring to the Multiple losses topic:
It turned out that, given the following loss function (I still don't get why in MXNet the losses returned from hybrid_forward have shape (batch_size,) instead of being a scalar loss):
```python
import mxnet as mx

class SomeLoss(mx.gluon.loss.Loss):
    def __init__(self, weight=1., batch_axis=0, **kwargs):
        super(SomeLoss, self).__init__(weight=weight, batch_axis=batch_axis, **kwargs)

    def hybrid_forward(self, F, x, sample_weight=None):
        y = F.sign(data=x)          # hard sign of the input
        b_n = 0.5 * (y + 1)         # map {-1, +1} to {0, 1}
        mu_m = F.mean(b_n, axis=0)  # mean activation over the batch
        loss = F.square(mu_m - 0.5)
        return loss
```
the gradients do not backpropagate correctly, which is why the weights of the network are not updated!
I further found out that this is related to the fact that x only enters the computation through F.sign(…); this function is non-differentiable at x=0 and its gradient is zero everywhere else, so no gradient can flow back to x.
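This is easy to check with autograd (a minimal sketch; the input shape is just made up for illustration):

```python
import mxnet as mx
from mxnet import autograd

x = mx.nd.random.normal(shape=(4, 3))
x.attach_grad()

with autograd.record():
    y = mx.nd.sign(x)
    b_n = 0.5 * (y + 1)
    mu_m = mx.nd.mean(b_n, axis=0)
    loss = mx.nd.square(mu_m - 0.5)
loss.backward()

print(x.grad)  # all zeros: sign() contributes no gradient
```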
As a solution we could approximate the sign with F.sigmoid/F.tanh (a sketch of that workaround is at the end of this post), but I still wonder why the backend cannot handle this, since for the following loss:
```python
class OtherLoss(mx.gluon.loss.Loss):
    def __init__(self, weight=1., batch_axis=0, **kwargs):
        super(OtherLoss, self).__init__(weight=weight, batch_axis=batch_axis, **kwargs)

    def hybrid_forward(self, F, x, sample_weight=None):
        y = F.sign(data=x)
        b_n = 0.5 * (y + 1)
        loss = F.square(b_n - x)  # x also enters outside of F.sign, so it receives a gradient
        loss = F.mean(loss, axis=0, exclude=True)  # keep the batch axis, average over the rest
        return loss
```
the gradients are calculated and the weights are updated.
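For completeness, here is a rough sketch of the sigmoid workaround mentioned above; the steepness factor k is an arbitrary choice of mine, not anything prescribed by MXNet:

```python
class SoftSomeLoss(mx.gluon.loss.Loss):
    def __init__(self, k=10.0, weight=1., batch_axis=0, **kwargs):
        super(SoftSomeLoss, self).__init__(weight=weight, batch_axis=batch_axis, **kwargs)
        self._k = k  # steepness of the sigmoid; larger values get closer to sign()

    def hybrid_forward(self, F, x, sample_weight=None):
        # smooth approximation of 0.5 * (sign(x) + 1)
        b_n = F.sigmoid(self._k * x)
        mu_m = F.mean(b_n, axis=0)
        loss = F.square(mu_m - 0.5)
        return loss
```

With this version the x.grad check from above gives non-zero gradients, so the weights do get updated.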