Hello,
We are implementing a custom loss layer using mx.symbol.Custom. This works fine. We want to add L2 regularization to this loss, i.e.:
loss = mx.symbol.Custom(.....)
l2_loss = mx.sym.sum(mx.sym.square(var))
reg_loss = mx.sym.MakeLoss(l2_loss * (reg_weight))
Now, ideally I want to add both losses.
Would the following code add both losses during training, or do I need to add them explicitly?
final_loss = mx.sym.Group([loss, reg_loss])
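(For intuition about what adding the two losses should do: gradients are linear, so optimizing the sum is the same as accumulating each loss's gradient. Here is a minimal NumPy sketch with hypothetical values, where `custom_grad` stands in for the gradient of the custom loss with respect to a weight vector `w`:)

```python
import numpy as np

# hypothetical weight vector and gradient of the custom loss w.r.t. it
w = np.array([0.5, -1.0, 2.0])
custom_grad = np.array([0.1, -0.2, 0.3])  # stand-in value, not from a real model
reg_weight = 0.01

# gradient of the L2 term reg_weight * sum(w**2) w.r.t. w
l2_grad = reg_weight * 2 * w

# optimizing custom_loss + reg_loss means the gradients simply add up;
# this is the behavior we want the grouped symbol to reproduce
total_grad = custom_grad + l2_grad
print(total_grad)
```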
Thanks!
What you have done here should work, from what I can see. Have you tried it?
Yes, I have tried it. There are no errors as such, and it trains fine.
But I am not sure whether it is adding up the losses before taking the gradients,
because I am comparing the results between MXNet and TensorFlow and they differ.
They get added. Here is a toy example where I have grouped two symbols, add and mult. I pass data through them and get the gradients back:
import mxnet as mx

# input
a = mx.sym.Variable('a')
b = mx.sym.Variable('b')

# ops
add = a + b
mult = a * b

# output
out = mx.sym.Group([add, mult])

# bind shapes, get executor
executor = out.simple_bind(mx.cpu(), a=(1,3), b=(1,3))

# data
a_data = mx.nd.array([[1,2,3]])
b_data = mx.nd.array([[3,4,5]])

# Forward pass
output = executor.forward(a=a_data, b=b_data, is_train=True)

# Backward pass (one head gradient per grouped output)
head_grad = mx.nd.ones((1,3))
executor.backward([head_grad, head_grad])

print(executor.arg_dict)
print(executor.grad_arrays)
{'a':
[[1. 2. 3.]]
<NDArray 1x3 @cpu(0)>, 'b':
[[3. 4. 5.]]
<NDArray 1x3 @cpu(0)>}
[
[[4. 5. 6.]]
<NDArray 1x3 @cpu(0)>,
[[2. 3. 4.]]
<NDArray 1x3 @cpu(0)>]
We have:
d(add)/da = 1
d(mult)/da = b
d(add)/db = 1
d(mult)/db = a
We see that effectively our grad array for a is:
[1+3, 1+4, 1+5] = [4, 5, 6]
and for b is:
[1+1, 1+2, 1+3] = [2, 3, 4]
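The accumulation above can be checked by hand without MXNet; a NumPy sketch of the same backward pass, summing each output's contribution as Group does:

```python
import numpy as np

a = np.array([[1., 2., 3.]])
b = np.array([[3., 4., 5.]])
head = np.ones((1, 3))  # head gradient for each grouped output

# d(add)/da = 1, d(mult)/da = b; backward through the group sums both heads
grad_a = head * 1 + head * b
# d(add)/db = 1, d(mult)/db = a
grad_b = head * 1 + head * a

print(grad_a)  # [[4. 5. 6.]]
print(grad_b)  # [[2. 3. 4.]]
```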