Keep predicted values constant in some terms of the loss function

Hi,

I have a Gluon network that predicts a matrix from my input. I want to define a loss function like this:

  • for every even column, use the L2 loss on the values themselves: l2_loss(pred[0::2,:], true[0::2,:])
  • for every odd column, use the L2 loss on the difference to the neighbouring even column: l2_loss(pred[1::2,:] - pred[0::2,:], true[1::2,:] - true[0::2,:])

If I put this together, I get something like this: lambda pred, true: l2_loss(pred[:,0::2,:], true[:,0::2,:]) + l2_loss(pred[:,1::2,:]-pred[:,0::2,:], true[:,1::2,:]-true[:,0::2,:])
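
Written out as a proper function, that would be roughly the following (just a sketch; I'm assuming pred and true are (batch, rows, cols) NDArrays and l2_loss is gluon.loss.L2Loss, and combined_loss is just a name I made up):

from mxnet import gluon

l2_loss = gluon.loss.L2Loss()

def combined_loss(pred, true):
    # first term: even columns compared directly
    even = l2_loss(pred[:, 0::2, :], true[:, 0::2, :])
    # second term: odd columns compared via their difference to the neighbouring even columns
    odd = l2_loss(pred[:, 1::2, :] - pred[:, 0::2, :],
                  true[:, 1::2, :] - true[:, 0::2, :])
    return even + odd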

However, what I want to achieve is that the -pred[:,0::2,:] part in the loss for the odd columns is treated as a constant, i.e. I do not want to back-propagate the error from the second term into the even columns. I hope this explanation makes sense.

Any idea how I can achieve that?

Edit: I am now using a copy() on the pred[:,0::2,:] term, and training works, but I don't know whether it actually does what I expect: lambda pred, true: l2_loss(pred[:,0::2,:], true[:,0::2,:]) + l2_loss(pred[:,1::2,:]-pred.copy()[:,0::2,:], true[:,1::2,:]-true[:,0::2,:])

Edit 2: Just to be on the super safe side I am now also doing a detach(): lambda pred, true: l2_loss(pred[:,0::2,:], true[:,0::2,:]) + l2_loss(pred[:,1::2,:]-pred[:,0::2,:].copy().detach(), true[:,1::2,:]-true[:,0::2,:]) but I would still like to understand what's really needed to make this work, and whether copy().detach() does what I hope or is overkill or wrong…

Hi @cangerer,

Yes, detach() should be sufficient here. I just tested it with this example:

import mxnet as mx

x = mx.nd.random.uniform(shape=(5,3))
w = mx.nd.random.uniform(shape=(3,4))
z_t = mx.nd.random.uniform(shape=(5,2,2))
w.attach_grad()
with mx.autograd.record():
    y = mx.nd.dot(x, w)
    z = y.reshape(-1,2,2)
    loss_even = ((z[:,:,0::2] - z_t[:,:,0::2])**2).sum()
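    # z.detach() gives the same values but is cut out of the autograd graph,
    # so this term sends no gradient back into the even positions of z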
    loss_odd = (((z[:,:,1::2]-z.detach()[:,:,0::2]) - (z_t[:,:,1::2]-z_t[:,:,0::2]))**2).sum()
    loss = loss_even + loss_odd
    #loss = loss_even
    #loss = loss_odd
loss.backward()
print(w.grad)

Uncomment the different loss variables to see the difference. For example, with loss = loss_odd you only get gradients on the odd columns of w:

[[ 0.        -8.344377   0.        -0.5854652]
 [ 0.        -7.372945   0.        -0.7097787]
 [ 0.        -5.1953363  0.        -0.5799739]]
<NDArray 3x4 @cpu(0)>
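
Applied to the loss from your first post, the odd term would then become something like this (just a sketch, keeping your l2_loss and slicing as-is):

odd_term = l2_loss(pred[:,1::2,:] - pred[:,0::2,:].detach(), true[:,1::2,:] - true[:,0::2,:])

No copy() is needed; detach() alone cuts the even-column values out of the graph for that term.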

Also, double check your indexing if you want columns rather than rows: compare [:,:,0::2] vs [:,0::2,:].
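
For example (hypothetical array, just to illustrate which axis each slice hits):

import mxnet as mx

a = mx.nd.arange(24).reshape(2, 3, 4)
print(a[:, 0::2, :].shape)  # (2, 2, 4): every other index along axis 1 (rows)
print(a[:, :, 0::2].shape)  # (2, 3, 2): every other index along axis 2 (columns)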

Great, thanks for checking @thomelane !