Backward pass of an MXNet network with BatchNorm has no gradient in the input layer, but has one without BatchNorm

When we use MXNet's BatchNorm layer, the input layer doesn't get a gradient in the backward pass. However, it does get a gradient when the network is created without a BatchNorm layer.
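A minimal sketch of the kind of check I mean (the batch size, parameter initialization, and the helper name check_data_grad are assumptions, not part of the networks below): bind the symbol, run forward/backward, and see whether anything was written into the gradient of the 'data' input.

import mxnet as mx
import numpy as np

def check_data_grad(sym, is_train):
    # Bind with an assumed batch of 4 and input width 256; 'data' is the
    # input Variable created in the examples below.
    exe = sym.simple_bind(ctx=mx.cpu(), data=(4, 256), grad_req='write')
    # simple_bind allocates but does not initialize parameters, so fill them in.
    for name, arr in exe.arg_dict.items():
        if name != 'data':
            arr[:] = mx.nd.random.uniform(-0.1, 0.1, shape=arr.shape)
    for name, arr in exe.aux_dict.items():  # BatchNorm moving mean / moving var
        arr[:] = 1.0 if name.endswith('moving_var') else 0.0
    exe.grad_dict['data'][:] = 0  # start from a known value
    exe.forward(is_train=is_train, data=mx.nd.random.uniform(shape=(4, 256)))
    exe.backward(out_grads=mx.nd.ones(exe.outputs[0].shape))
    return np.abs(exe.grad_dict['data'].asnumpy()).sum() > 0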

These networks, built without a BatchNorm layer, do have a gradient in the input layer:

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx0')

Forward parameter: is_train=False
The input layer has a gradient.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx0')

Forward parameter: is_train=True
The input layer has a gradient.
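With the sketch helper above, the two cases without BatchNorm can be checked like this; as described, both report a gradient for 'data':

data = mx.symbol.Variable('data')
x = mx.symbol.FullyConnected(data=data, num_hidden=256, name='gx')
y = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx0')
print(check_data_grad(y, is_train=False))  # True: 'data' gets a gradient
print(check_data_grad(y, is_train=True))   # True as well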

With a BatchNorm layer, the input layer doesn't get a gradient in any of the following configurations:

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=False, eps=2e-5, momentum=0.9, name='gx0')

Forward parameter: is_train=False
The input layer doesn't have a gradient.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=True, eps=2e-5, momentum=0.9, name='gx0')

Forward parameter: is_train=False
The input layer doesn't have a gradient.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=True, eps=2e-5, momentum=0.9, name='gx0')

Forward parameter: is_train=True
The input layer doesn't have a gradient.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=False, eps=2e-5, momentum=0.9, name='gx0')

Forward parameter: is_train=True
The input layer doesn't have a gradient.
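The four BatchNorm configurations can be run through the same sketch helper in one loop; per the observations above, every combination is expected to print False:

data = mx.symbol.Variable('data')
x = mx.symbol.FullyConnected(data=data, num_hidden=256, name='gx')
for fix_gamma in (False, True):
    for is_train in (False, True):
        y = mx.sym.BatchNorm(data=x, fix_gamma=fix_gamma, eps=2e-5,
                             momentum=0.9, name='gx0')
        print(fix_gamma, is_train, check_data_grad(y, is_train))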

This looks more like an MXNet question. Could you tag it with MXNet?