How to get gradients using symbol API

This is probably a silly question but I’m having a difficult time learning the autograd API and the Symbol API. I can’t seem to figure out how to compute the gradient of a function when the function is using symbols and not NDArrays. For example:

import mxnet as mx
from mxnet import autograd

x_in = mx.nd.array([2])
X = mx.sym.Variable("X")

with autograd.record():
    F = X * X
    execute = F.bind(ctx=mx.cpu(0), args={'X': x_in})
    out = execute.forward()
    grad = autograd.grad(out[0], [x_in])

This code gives an error: “Cannot differentiate node because it is not in a computational graph.”

I feel like I’m missing some fundamental information about how the autograd API and Symbol API work together, but I can’t seem to find examples of gradients being calculated with symbols.

Thanks for any help!

You don’t need to explicitly use autograd when using symbols. Set the is_train argument to True on your forward pass, and the information needed for the backward pass will be kept so you can retrieve the computed gradients.
You also need to allocate the memory for your gradients through the args_grad argument of bind.

If you want to use symbols, the Module API is good at hiding these low-level details from you.

Otherwise I would suggest to use Gluon :smile:

import mxnet as mx

x_in = mx.nd.array([2])
X = mx.sym.Variable("X")
F = X * X
executor = F.bind(ctx=mx.cpu(0), args={"X": x_in},
                  args_grad={"X": mx.nd.zeros((1,))})

out = executor.forward(is_train=True)[0].copy()
executor.backward(out)            # head gradient = out, so grad = out * dF/dX
print(executor.grad_arrays)

[[16.]<NDArray 1 @cpu(0)>]

Is it possible to compute the hessian or other higher order gradients with this method? For example, is there a symbolic gradient operator that you could compute the gradient of?

@bschrift please see this thread for second order derivatives: Obtaining second order derivatives for a function wrt arbitrary parameters in the computation graph

How can we calculate gradients for the intermediate outputs using symbol?
for example:

a = mx.sym.var(name='a', shape=(1, 1), dtype='float64')
b = mx.sym.var(name='b', shape=(1, 1), dtype='float64')

c = mx.sym.broadcast_add(a, b, name='c')
d = mx.sym.make_loss(mx.sym.broadcast_add(c, b, name='d'))

bind = d.simple_bind(ctx=mx.cpu(1),
                     grad_req={'a': 'write', 'b': 'write', 'c': 'write'})
bind.forward(is_train=True, a=mx.nd.ones((1, 1)), b=mx.nd.ones((1, 1)))
bind.backward()
# print(bind.grad_dict)
# {'a':
# [[1.]]
# <NDArray 1x1 @cpu(1)>,
# 'b':
# [[2.]]
# <NDArray 1x1 @cpu(1)>}

So bind.grad_dict doesn’t provide gradients for c even though I’ve written 'c': 'write' in grad_req.

So how can I get the gradients of “d with respect to c” as well?

I think you might be able to use mx.sym.BlockGrad to get what you want.

See this github issue for more details

I’ve seen that issue but can’t figure out what I need to do. It confuses me further because BlockGrad is supposed to block any gradient computation through that layer, so how am I supposed to get intermediate gradients from it? That really doesn’t make sense to me.