We have an implementation of a recurrent network in MXNet and are trying to obtain the second-order derivatives of a loss function with respect to arbitrary (all) parameters in the computational graph.
It’s unclear to me how many operators support higher-order gradients at this time, so it may not work on your network, but there is an interface that should let you do this, provided all of the operators in your graph support it.
See:
mxnet.autograd.grad
You can find documentation for it on this page, but you have to scroll down because for some reason, there isn’t an anchor link for it at the top.
The gist should be something like this (I haven’t tested it):
with mx.autograd.record():
    output = net(x)
    loss = loss_func(output)
    # first-order gradients w.r.t. z, recorded as part of the graph so they can be differentiated again
    dz = mx.autograd.grad(loss, [z], create_graph=True)  # where [z] is the parameter(s) you want
dz[0].backward()  # now the actual parameters should have second-order gradients in their .grad buffers
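If it helps, here is a self-contained sketch of the same pattern that I would expect to run as-is. It swaps your network and loss for a toy elementwise function (sin), whose higher-order gradient is implemented, and uses a placeholder variable z standing in for a parameter, purely to show the mechanics:

import mxnet as mx

# toy stand-in for a parameter; it needs a gradient buffer attached before recording
z = mx.nd.array([1.0, 2.0, 3.0])
z.attach_grad()

with mx.autograd.record():
    # pretend "loss", built only from ops with higher-order gradient support
    loss = mx.nd.sin(z)
    # first-order gradient w.r.t. z, kept differentiable via create_graph
    dz = mx.autograd.grad(loss, [z], create_graph=True, retain_graph=True)

dz[0].backward()  # differentiate the first-order gradient
print(z.grad)     # -sin(z), i.e. the second derivative of sin

For your real network you would replace the sin expression with the forward pass and loss, and pass the parameter arrays as the variables list (Gluon parameters already have gradients attached after initialization); whether that succeeds still depends on each operator’s backward being differentiable itself.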