Dear all,

I’m working on the implementation of WGAN with gradient penalty, but I get an error when doing a backward step:

```
MXNetError: Operator _backward_Convolution is non-differentiable because it didn't register FGradient attribute.
```

I guess the problem comes from the $\left(\| \nabla_{x_m} net_c(x_m) \|_2 - 1\right)^2$ term in the loss function: if I remove this term, the training loop works again. Did I do something wrong in my Gluon implementation? What is the proper way to compute a loss function that involves second-order derivatives?
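
To be explicit about what I am trying to compute, this is the loss used in the example further down, written out as I understand it. The symbols $\theta$ (the network parameters), $\epsilon$ (the interpolation weight), $\lambda$ (`clambda` in the code), and $\mathbb{E}$ (the mean over the batch) are my own notation and are only meant as a sketch:

$$
x_m = \epsilon\, x_r + (1 - \epsilon)\, x_f
$$

$$
L(\theta) = \mathbb{E}\left[ net_c(x_m) \right] - \mathbb{E}\left[ net_c(x_r) \right] + \lambda\, \mathbb{E}\left[ \left( \| \nabla_{x_m} net_c(x_m) \|_2 - 1 \right)^2 \right]
$$

As far as I can tell, the last term is what requires second-order derivatives: $\nabla_{x_m} net_c(x_m)$ is itself the output of a backward pass, so computing $\partial L / \partial \theta$ means differentiating through that backward pass (a mixed derivative with respect to $x_m$ and $\theta$). That would explain why the error mentions `_backward_Convolution`.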

I made a minimal example to reproduce the error:

```
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn

# Define and init dummy network.
net = nn.HybridSequential()
net.add(
    nn.Conv2D(in_channels=1, channels=64, kernel_size=4, strides=2, activation="relu"),
    nn.Conv2D(in_channels=64, channels=128, kernel_size=4, strides=2, activation="relu"),
    nn.Conv2D(in_channels=128, channels=1, kernel_size=4, strides=2)
)
net.initialize()
trainer = gluon.Trainer(net.collect_params(), "adam", {"learning_rate": 0.00002})
batch_size = 8
clambda = 10  # weight of the gradient penalty term

# Do one training step.
with autograd.record():
    xr = nd.random.randn(batch_size, 1, 28, 28)
    xf = nd.random.randn(batch_size, 1, 28, 28)
    epsilon = nd.ones(shape=(batch_size, 1, 1, 1)) * 0.5
    # Interpolate between the two batches.
    xm = epsilon * xr + (1 - epsilon) * xf
    xm.attach_grad()
    yr = net(xr)
    ym = net(xm)
    # Gradient of the network output w.r.t. the interpolated input;
    # create_graph=True so that the loss can be differentiated through it.
    grad_ym = mx.autograd.grad(heads=ym, variables=[xm], retain_graph=True, create_graph=True)[0]
    grad_ym = grad_ym.reshape(batch_size, -1)
    loss = nd.mean(ym) - nd.mean(yr) + clambda * nd.mean((nd.norm(grad_ym, axis=1) - 1) ** 2)
print("loss: ", loss)
loss.backward()  # this is where the MXNetError above is raised
trainer.step(batch_size)
```