When we use MXNet's BatchNorm layer, the input layer doesn't receive a gradient in the backward pass. However, the input layer does receive a gradient when the same network is built without BatchNorm. The cases below illustrate this.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx0')
Forward parameter: is_train=False
The input layer has a gradient.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx0')
Forward parameter: is_train=True
The input layer has a gradient.

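For reference, here is how the input gradient can be checked. This is my own minimal sketch, assuming the MXNet 1.x symbol/executor API; the input variable name 'data', the batch size 4, and the input width 256 are placeholder assumptions rather than values from the original code.

import mxnet as mx

# Build the two-FullyConnected network from the cases above.
data = mx.symbol.Variable('data')
x = mx.symbol.FullyConnected(data=data, num_hidden=256, name='gx')
y = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx0')

# Bind with gradient buffers for every argument, including the input.
exe = y.simple_bind(mx.cpu(), data=(4, 256), grad_req='write')
exe.forward(is_train=True, data=mx.nd.ones((4, 256)))
exe.backward(mx.nd.ones(exe.outputs[0].shape))

# Non-zero here: the input layer receives a gradient.
print(exe.grad_dict['data'].asnumpy().sum())

As reported above, the result is the same with is_train=False in the forward call.
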
x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=False, eps=2e-5, momentum=0.9, name='gx0')
Forward parameter: is_train=False
The input layer doesn't have a gradient.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=True, eps=2e-5, momentum=0.9, name='gx0')
Forward parameter: is_train=False
The input layer doesn't have a gradient.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=True, eps=2e-5, momentum=0.9, name='gx0')
Forward parameter: is_train=True
The input layer doesn't have a gradient.

x = mx.symbol.FullyConnected(data=x, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=False, eps=2e-5, momentum=0.9, name='gx0')
Forward parameter: is_train=True
The input layer doesn't have a gradient.

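Swapping the second FullyConnected layer for BatchNorm in the same sketch reproduces the problem; again, the variable names and shapes are my own placeholder assumptions.

import mxnet as mx

# Same check, but with BatchNorm as the second layer.
data = mx.symbol.Variable('data')
x = mx.symbol.FullyConnected(data=data, num_hidden=256, name='gx')
y = mx.sym.BatchNorm(data=x, fix_gamma=False, eps=2e-5, momentum=0.9, name='gx0')

exe = y.simple_bind(mx.cpu(), data=(4, 256), grad_req='write')
exe.forward(is_train=True, data=mx.nd.ones((4, 256)))
exe.backward(mx.nd.ones(exe.outputs[0].shape))

# Per the cases above, this stays 0.0: the input gradient is not written,
# regardless of fix_gamma or the is_train flag.
print(exe.grad_dict['data'].asnumpy().sum())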