Keep predicted values constant in some terms of the loss function


I have a Gluon network that predicts a matrix from my input. I want to define a loss function like this:

  • for every even column, use the L2 loss of the absolute (raw) values: l2_loss(pred[0::2,:], true[0::2,:])
  • for every odd column, I want to use the L2 loss of the difference between it and the even column next to it: l2_loss(pred[1::2,:] - pred[0::2,:], true[1::2,:] - true[0::2,:])

If I put this together, I get something like this: lambda l, p: l2_loss(pred[:,0::2,:], true[:,0::2,:]) + l2_loss(pred[:,1::2,:]-pred[:,0::2,:],true[:,1::2,:]-true[:,0::2,:])

However, what I want to achieve is that the -pred[:,0::2,:] part in the loss for the odd columns is treated as a constant; that is, I do not want to back-propagate the error from the second term into the even columns. I hope this explanation makes sense.

Any idea how I can achieve that?

Edit: I am using a copy() on the -pred[:,0::2,:] term, which trains. But I don’t know if that will do what I expect? lambda l, p: l2_loss(pred[:,0::2,:], true[:,0::2,:]) + l2_loss(pred[:,1::2,:]-pred.copy()[:,0::2,:],true[:,1::2,:]-true[:,0::2,:])

Edit 2: Just to be on the super safe side I am now also doing a detach(): lambda l, p: l2_loss(pred[:,0::2,:], true[:,0::2,:]) + l2_loss(pred[:,1::2,:]-pred[:,0::2,:].copy().detach(),true[:,1::2,:]-true[:,0::2,:]) but I would still like to understand what’s really needed to make this work, and whether the copy().detach() is doing what I hope or whether it’s overkill or wrong…

Hi @cangerer,

Yes, detach should be sufficient here. Just tested with this example:

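import mxnet as mx

x = mx.nd.random.uniform(shape=(5,3))
w = mx.nd.random.uniform(shape=(3,4))
z_t = mx.nd.random.uniform(shape=(5,2,2))
w.attach_grad()  # allocate a gradient buffer so that w.grad gets filled in
with mx.autograd.record():
    y = mx.nd.dot(x, w)  # (5,3) x (3,4) -> (5,4)
    z = y.reshape(-1,2,2)
    loss_even = ((z[:,:,0::2] - z_t[:,:,0::2])**2).sum()
    # z.detach() is treated as a constant, so no gradient flows into the even columns here
    loss_odd = (((z[:,:,1::2]-z.detach()[:,:,0::2]) - (z_t[:,:,1::2]-z_t[:,:,0::2]))**2).sum()
    loss = loss_even + loss_odd
    #loss = loss_even
    #loss = loss_odd
loss.backward()
print(w.grad)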

Uncomment the alternative loss assignments to see the difference. For example, with loss = loss_odd, you get nonzero gradients only in the odd columns of w (columns 1 and 3):

[[ 0.        -8.344377   0.        -0.5854652]
 [ 0.        -7.372945   0.        -0.7097787]
 [ 0.        -5.1953363  0.        -0.5799739]]
<NDArray 3x4 @cpu(0)>
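
To see why the even entries end up with zero gradient, here is a rough NumPy sketch (not MXNet code) that mimics detach() by freezing a snapshot of the even columns and estimating gradients with finite differences. The names loss_odd, numeric_grad and frozen_even are made up for illustration, and the sketch assumes only that a detached array behaves like a constant during differentiation.

```python
import numpy as np

def loss_odd(z, z_t, even_const=None):
    # even_const plays the role of z.detach()[:, :, 0::2]: a fixed snapshot
    # of the even columns that does not move when z is perturbed.
    even = z[:, :, 0::2] if even_const is None else even_const
    diff = (z[:, :, 1::2] - even) - (z_t[:, :, 1::2] - z_t[:, :, 0::2])
    return (diff ** 2).sum()

def numeric_grad(f, z, eps=1e-6):
    # Central finite differences of f with respect to every entry of z.
    grad = np.zeros_like(z)
    for idx in np.ndindex(*z.shape):
        step = np.zeros_like(z)
        step[idx] = eps
        grad[idx] = (f(z + step) - f(z - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
z = rng.uniform(size=(5, 2, 2))
z_t = rng.uniform(size=(5, 2, 2))
frozen_even = z[:, :, 0::2].copy()  # snapshot, analogous to detach()

# Without freezing, the odd-column loss also pulls on the even columns.
grad_plain = numeric_grad(lambda a: loss_odd(a, z_t), z)
# With the even columns frozen, only the odd columns receive a gradient.
grad_frozen = numeric_grad(lambda a: loss_odd(a, z_t, frozen_even), z)

print(np.abs(grad_plain[:, :, 0::2]).max())   # nonzero
print(np.abs(grad_frozen[:, :, 0::2]).max())  # 0.0
```

In this sketch the gradient on the odd columns is the same in both cases; freezing only removes the contribution that would otherwise flow back into the even columns.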

Also, double-check your indexing if you want columns rather than rows: compare [:,:,0::2] with [:,0::2,:].
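
A quick way to check which axis a slice steps along is to look at the resulting shapes. The sketch below uses NumPy, whose basic slicing behaves the same way for this purpose, and assumes a hypothetical prediction laid out as (batch, rows, cols); the shape (2, 4, 6) is made up for illustration.

```python
import numpy as np

# Hypothetical prediction with layout (batch, rows, cols) = (2, 4, 6).
pred = np.arange(2 * 4 * 6).reshape(2, 4, 6)

even_rows = pred[:, 0::2, :]  # steps along axis 1: keeps rows 0 and 2
even_cols = pred[:, :, 0::2]  # steps along axis 2: keeps columns 0, 2 and 4

print(even_rows.shape)  # (2, 2, 6)
print(even_cols.shape)  # (2, 4, 3)
```

If the halved dimension is not the one you expected, the slice is on the wrong axis.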

Great, thanks for checking @thomelane !