Gluon: access layer weights

This also refers to the "Multiple losses" thread.
It turned out that, given the following loss function (I still don't understand why in MXNet the losses returned from hybrid_forward have shape (batch_size,) instead of being a scalar; more on that below the first snippet),

import mxnet as mx

class SomeLoss(mx.gluon.loss.Loss):
    def __init__(self, weight=1., batch_axis=0, **kwargs):
        super(SomeLoss, self).__init__(weight=weight, batch_axis=batch_axis, **kwargs)

    def hybrid_forward(self, F, x, sample_weight=None):
        y = F.sign(data=x)           # non-differentiable; gradient is zero almost everywhere
        b_n = 0.5 * (y + 1)          # map {-1, 1} to {0, 1}
        mu_m = F.mean(b_n, axis=0)   # mean activation over the batch
        loss = F.square(mu_m - 0.5)  # push the batch mean towards 0.5
        return loss

the gradients do not backpropagate correctly, which is why the weights of the network are not updated!
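(Regarding the parenthetical above: as far as I understand it, and this is my own reading rather than something confirmed by the documentation, the built-in Gluon losses return one value per sample so that sample_weight can be applied per sample, and calling backward() on a non-scalar NDArray behaves like backpropagating its sum, e.g.:

import mxnet as mx
from mxnet import autograd, nd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()
with autograd.record():
    per_sample = x * x       # shape (3,), one value per "sample"
per_sample.backward()        # same result as per_sample.sum().backward()
print(x.grad)                # [2. 4. 6.]

so returning a per-sample vector does not by itself break backpropagation.)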
I further found out that this is related to the fact that x only enters the computation through F.sign(…), and this function is non-differentiable at x=0 and has zero gradient everywhere else.
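A quick way to see this is to run the same computation through autograd on plain NDArrays; the gradient that comes back is all zeros, so no signal reaches the weights (a minimal check using the same mapping as SomeLoss):

from mxnet import autograd, nd

x = nd.array([-1.5, 0.2, 3.0])
x.attach_grad()
with autograd.record():
    b_n = 0.5 * (nd.sign(x) + 1)                  # same mapping as in SomeLoss
    loss = nd.square(nd.mean(b_n, axis=0) - 0.5)
loss.backward()
print(x.grad)                                     # [0. 0. 0.]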
As a workaround we could approximate the sign with F.sigmoid/F.tanh (see the sketch at the end of this post), but I still wonder why the backend cannot handle this, since for the following loss:

class OtherLoss(mx.gluon.loss.Loss):
    def __init__(self, weight=1., batch_axis=0, **kwargs):
        super(OtherLoss, self).__init__(weight=weight, batch_axis=batch_axis, **kwargs)

    def hybrid_forward(self, F, x, sample_weight=None):
        y = F.sign(data=x)                         # still non-differentiable
        b_n = 0.5 * (y + 1)                        # map {-1, 1} to {0, 1}
        loss = F.square(b_n - x)                   # x also appears outside F.sign, so the gradient w.r.t. x is non-zero
        loss = F.mean(loss, axis=0, exclude=True)  # reduce over all non-batch axes -> shape (batch_size,)
        return loss

the gradients are calculated and the weights updated.
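For completeness, this is roughly what I mean by the F.tanh workaround mentioned above. SoftLoss and the steepness parameter are just names I made up for this sketch; they are not part of the Gluon API:

class SoftLoss(mx.gluon.loss.Loss):
    """Hypothetical variant of SomeLoss with F.tanh(k * x) as a smooth stand-in for F.sign."""
    def __init__(self, steepness=10., weight=1., batch_axis=0, **kwargs):
        super(SoftLoss, self).__init__(weight=weight, batch_axis=batch_axis, **kwargs)
        self._steepness = steepness           # larger value -> closer to a hard sign

    def hybrid_forward(self, F, x, sample_weight=None):
        y = F.tanh(self._steepness * x)       # differentiable approximation of sign(x)
        b_n = 0.5 * (y + 1)
        mu_m = F.mean(b_n, axis=0)
        return F.square(mu_m - 0.5)

With this change the gradient with respect to x is non-zero everywhere, so the weights should receive updates again.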