Custom loss function causes gradient updates to fail


I wrote my custom loss as follows, and everything works fine:

def loss(output: Symbol, label: Symbol): Symbol = {
  // element-wise squared error between label and prediction
  Symbol.square()()(Map("data" -> (label - output)))
}

But I want to add a mask to the loss function, so that only positions where the label is greater than zero contribute:

def loss(output: Symbol, label: Symbol): Symbol = {
  // (batch_size, 1) zeros, broadcast against label
  val zeros = Symbol.zeros(Shape(batch_size, 1))
  // 1 where label > 0, else 0
  val mask = Symbol.broadcast_greater()()(Map("lhs" -> label, "rhs" -> zeros))
  val mse = Symbol.square()()(Map("data" -> (label - output)))
  val mul = Symbol.broadcast_mul()()(Map("lhs" -> mask, "rhs" -> mse))
  // number of masked entries per row, shape (batch_size, 1)
  val sum = Symbol.sum_axis()()(Map("data" -> mask, "axis" -> 1, "keepdims" -> 1))
  // normalize each row by its mask count
  val mmse = Symbol.broadcast_div()()(Map("lhs" -> mul, "rhs" -> sum))
  mmse
}
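To make the shape reasoning concrete, here is a plain-Scala sketch (no MXNet involved; the object `MaskedLossShapes` and method `maskedMse` are hypothetical names) that performs the same steps as the masked loss above on nested arrays and notes the shape at each step. It assumes, as stated in the question, that `label` and `output` both have shape (batch_size, M).

```scala
// Plain-Scala sketch of what the masked loss graph computes, row by row.
// Assumption: label and output are both (batchSize, M); the mask keeps
// positions where label > 0, mirroring broadcast_greater against zeros.
object MaskedLossShapes {
  type Matrix = Array[Array[Double]]

  def maskedMse(output: Matrix, label: Matrix): Matrix = {
    require(output.length == label.length, "batch sizes differ")
    label.zip(output).map { case (lRow, oRow) =>
      require(lRow.length == oRow.length, "row widths differ")
      // mask: M values per row, 1.0 where label > 0
      val mask = lRow.map(l => if (l > 0.0) 1.0 else 0.0)
      // squared error: M values per row
      val se = lRow.zip(oRow).map { case (l, o) => (l - o) * (l - o) }
      // element-wise product of mask and squared error, still M per row
      val mul = mask.zip(se).map { case (m, e) => m * e }
      // sum of mask over axis 1 with keepdims: one value per row, i.e. (B, 1)
      val count = mask.sum
      // dividing a (B, M) row by its (B, 1) count gives back (B, M)
      mul.map(_ / count)
    }
  }
}
```

For a 2 x 3 batch it returns a 2 x 3 matrix, i.e. (batch_size, M). The sketch also makes one edge case visible: a row with no positive labels has a mask count of 0, so in this sketch the division produces NaN for that row.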

With this loss function, the forward and backward passes work, but the optimizer fails to update the weights. The error I get is:

Incompatible attr in node at 1-th input: expected [M,M], got [M]

As I understand it, both functions should return a symbol of shape (batch_size, M). What am I doing wrong here? Any clues?

Thanks in advance!