This is probably a silly question, but I’m having a difficult time learning the autograd API and the Symbol API. I can’t figure out how to compute the gradient of a function when it is built from symbols rather than NDArrays. For example:
import mxnet as mx
from mxnet import autograd

x_in = mx.nd.array([2])
x_in.attach_grad()
X = mx.sym.Variable("X")
with autograd.record():
    F = X * X
    execute = F.bind(ctx=mx.cpu(0), args={'X': x_in})
    out = execute.forward()
grad = autograd.grad(out[0], [x_in])
This code gives an error: “Cannot differentiate node because it is not in a computational graph.”
I feel like I’m missing some fundamental information about how the autograd API and Symbol API work together, but I can’t find examples of gradients being computed with symbols.
You don’t need to explicitly use autograd when using the Symbol API. Set the is_train argument to True on your forward pass, and the necessary information will be kept so that you can run a backward pass and retrieve the computed gradients.
You need to allocate the memory for your gradients through the args_grad argument.
If you want to keep using symbols, the Module API is good at hiding these low-level details from you. Otherwise I would suggest using Gluon. (Sketches of both alternatives follow the code below.)
import mxnet as mx

x_in = mx.nd.array([2])
X = mx.sym.Variable("X")
F = X * X
# args_grad allocates the memory that will receive dF/dX
executor = F.bind(ctx=mx.cpu(0), args={"X": x_in}, args_grad={"X": mx.nd.zeros((1,))})
out = executor.forward(is_train=True)
# backward takes the head gradient; ones gives plain dF/dX
executor.backward(mx.nd.ones_like(out[0]))
print(executor.grad_arrays)  # [[4.]] since d(x*x)/dx = 2x = 4 at x = 2
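For comparison, here is a minimal imperative (Gluon-style) sketch of the same gradient using autograd directly on NDArrays, which is what the original snippet was reaching for:

import mxnet as mx
from mxnet import autograd

x = mx.nd.array([2])
x.attach_grad()              # allocate storage for dF/dx
with autograd.record():      # record the imperative computation
    y = x * x
y.backward()                 # fills x.grad
print(x.grad)                # expect [4.]

And a rough, untested sketch of the same computation through the Module API; the (1,) data shape and the out_grads of ones are assumptions for this toy example:

import mxnet as mx

X = mx.sym.Variable("X")
F = X * X
mod = mx.mod.Module(symbol=F, data_names=["X"], label_names=None)
mod.bind(data_shapes=[("X", (1,))], inputs_need_grad=True)
mod.init_params()
mod.forward(mx.io.DataBatch(data=[mx.nd.array([2])]), is_train=True)
mod.backward(out_grads=[mx.nd.ones((1,))])
print(mod.get_input_grads()[0])  # expect [4.]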
Is it possible to compute the Hessian or other higher-order gradients with this method? For example, is there a symbolic gradient operator whose gradient you could then compute?
I’ve seen this issue but can’t figure out what I need to do. It confuses me further because GradBlock is supposed to block gradient computation for that layer, so how am I supposed to get intermediate gradients from it? That really doesn’t make any sense to me.