Maximizing likelihood with a non-standard probability model

I have a probability model where P(y = 1) = sigmoid(-||xY||^2 + c). I figured I could fairly easily use MXNet to run the maximum likelihood optimization, since it's essentially logistic regression, except that instead of a dot product you have something like a Mahalanobis distance (the goal is to optimize the parameters Y and c given data x, y).
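
Written out, this is just the usual logistic-regression negative log-likelihood, with z_i = -||x_i Y||^2 + c playing the role of the logit:

NLL(Y, c) = -sum_i [ y_i log(sigmoid(z_i)) + (1 - y_i) log(1 - sigmoid(z_i)) ]

I tried something like: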

import mxnet as mx

data = mx.sym.Variable("data")
target = mx.sym.Variable("target")
Y = mx.sym.Variable("weight")
# xY: project the input through Y (output shape (batch, 10))
fc = mx.sym.FullyConnected(data=data, weight=Y, no_bias=True, num_hidden=10)
bias = mx.sym.Variable("bias")
# intended logit: -||xY||^2 + c
norm = -mx.sym.sum(mx.sym.square(fc)) + bias
out = mx.sym.LogisticRegressionOutput(data=norm, label=target)
model = mx.mod.Module(symbol=out, data_names=['data'], label_names=['target'])

but there are two problems: (1) I cannot use a batch size greater than 1, even if I force bias to have shape (1,), and (2) after training for any number of epochs, the resulting weights are all NaN. Am I making a simple mistake here? Is it possible to use MXNet to optimize a function like this?
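
For reference, here is a plain NumPy sketch of the quantity I am trying to maximize (the shapes and the helper name are just illustrative):

import numpy as np

def neg_log_likelihood(Y, c, x, y):
    """NLL for P(y = 1 | x) = sigmoid(-||xY||^2 + c).

    x: (n, d) inputs, y: (n,) labels in {0, 1},
    Y: (d, k) parameter matrix, c: scalar offset.
    """
    z = -np.sum(np.square(x @ Y), axis=1) + c  # per-sample logit, shape (n,)
    p = 1.0 / (1.0 + np.exp(-z))               # P(y = 1 | x)
    eps = 1e-12                                # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))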

Can you post the error message for (1)? Did you get an infer_shape error?
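
If it is a shape problem, calling infer_shape on the composed symbol with your intended batch shapes should pinpoint the operator where inference fails; for example (the shapes here are made up):

# Example shapes only: 32 samples with 5 features each.
# infer_shape returns (arg_shapes, out_shapes, aux_shapes), or raises an
# error naming the operator whose input shapes are inconsistent.
arg_shapes, out_shapes, aux_shapes = out.infer_shape(data=(32, 5), target=(32,))
print(dict(zip(out.list_arguments(), arg_shapes)))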