Since I have 7 classes, I want to apply the softmax activation in the output layer, but I get the following error:

RuntimeError: simple_bind error. Arguments: /input_11: (8, 1, 64, 64, 64) /softmax_1_target1: (8, 7, 64, 64, 64) /softmax_1_sample_weights1: (8,) Error in operator broadcast_mul14: [16:48:16] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\tensor\./elemwise_binary_broadcast_op.h:68: Check failed: l == 1 || r == 1: operands could not be broadcast together with shapes [8,7,64,64] [8]
My input_shape is (1,64,64,64) and n_labels is 7. I have tried several options, all with the same result:

activation_block = Activation(activation_name)(output_layer)
activation_block = Conv3D(n_labels, kernel_size=(1,1,1), activation="softmax")(output_layer)
activation_block = Softmax(axis=1)(output_layer)
where the shape of the output_layer is (None, 7, 64, 64, 64).
Setting activation_name to "sigmoid" does work, but that seems less logical to me since I have a multiclass problem.
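For reference, the relevant part of my network looks roughly like this (the convolutional body is a placeholder; only the input shape, n_labels, and the output shape match my actual model):

```python
from keras.layers import Input, Conv3D, Softmax
from keras.models import Model

input_shape = (1, 64, 64, 64)  # channels_first: one channel, 64^3 volume
n_labels = 7

inputs = Input(input_shape)
# placeholder for the actual convolutional body of the network
x = Conv3D(16, (3, 3, 3), padding="same", data_format="channels_first")(inputs)
output_layer = Conv3D(n_labels, (1, 1, 1), data_format="channels_first")(x)
# output_layer has shape (None, 7, 64, 64, 64); the activation_block
# variants listed above are attached here, e.g.:
activation_block = Softmax(axis=1)(output_layer)
model = Model(inputs=inputs, outputs=activation_block)
```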
You should either add a dense layer first or specify the axis along which the softmax is applied.
The softmax assumed that your label's shape is 8*7*64*64 and that the labels are in range(64), which is not what you really want.
Using a reshape, specifying the softmax axis, or manually adding a dense layer may help.
(I only use Keras with TensorFlow/CNTK as a backend; for MXNet, I just use Gluon models.)
(So I am not sure the strategy I described really works.)
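In Keras terms, "specifying the axis" would look something like this (a rough sketch, untested; axis=1 assumes the classes sit on the channel axis under channels_first):

```python
from keras.layers import Softmax

# apply the softmax over the class/channel axis (axis=1)
# instead of the default last axis
activation_block = Softmax(axis=1)(output_layer)
```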
Thank you for the reply! This is a fully convolutional network which should output volumetric predictions of shape (7,64,64,64), so I don't really want to use a dense layer (which would flatten my output). The error stays the same when I specify the axis to apply the softmax:

activation_block = Softmax(axis=1)(output_layer)
I can compile the network, but it gives the error message mentioned above when fitting.
I have just discovered that activation_block = Softmax(axis=1)(output_layer) does work in combination with my custom Dice loss function, but that the error occurs when adding K.categorical_crossentropy(y_true, y_pred) to my Dice loss.
In the definition of my loss function, I can either use:

return 1 - dice

or:

return K.categorical_crossentropy(K.reshape(y_true, (y_true.shape[0], y_true.shape[1], 64*64*64)), K.reshape(y_pred, (y_pred.shape[0], y_pred.shape[1], 64*64*64)))
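For completeness, the dice term comes from a soft-Dice computation along these lines (a generic sketch; the smoothing constant is illustrative, not necessarily my exact code):

```python
from keras import backend as K

def dice_coefficient(y_true, y_pred, smooth=1.0):
    # soft Dice over all voxels and classes; smooth=1.0 is an
    # illustrative choice to avoid division by zero
    intersection = K.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (K.sum(y_true) + K.sum(y_pred) + smooth)
```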
What about a reshape before the Softmax?
AFAIK categorical_crossentropy works for y_pred=(batch_size, label_size) and y_true=(batch_size, 1),
so a reshape that transforms y_pred=(1,64,64,64,7) to y_pred=(1*64*64*64, 7) may help.
Maybe you could try activation_block = Softmax()(output_layer.reshape((-1,7))) with label = label.reshape((-1,)).
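Put together, that idea might look like this in Keras (a sketch I have not tested; Permute and Reshape are the standard Keras layers, and the shapes assume the classes sit on axis 1):

```python
from keras.layers import Permute, Reshape, Softmax

# output_layer: (None, 7, 64, 64, 64) -> move the class axis to the end
x = Permute((2, 3, 4, 1))(output_layer)      # (None, 64, 64, 64, 7)
x = Reshape((64 * 64 * 64, 7))(x)            # (None, 262144, 7): one row per voxel
x = Softmax(axis=-1)(x)                      # softmax over the 7 classes per voxel
# reshape back to volumetric form if the rest of the pipeline needs it
x = Reshape((64, 64, 64, 7))(x)
activation_block = Permute((4, 1, 2, 3))(x)  # (None, 7, 64, 64, 64)
```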
First of all, I don’t really want to reshape my output before the activation function, since I need volumetric data. I guess it’s also not really necessary, since I can use activation_block = Softmax(axis=1)(output_layer) in combination with the categorical_crossentropy loss.
However, I have noticed that there is a huge difference in loss value when I add K.categorical_crossentropy(y_true, y_pred) (initial values around 30).
I always presumed that categorical_crossentropy returns a scalar value, not a tensor, such that it could be added to 1 - dice. Apparently, this is not the case. Is there a way to solve this?
This did indeed fix my problem, thank you very much! So this is the loss function that I'm using now:

1 - dice + K.mean(K.categorical_crossentropy(y_true, y_pred, axis=1))
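Written out as a complete loss function (the function name is mine, and dice_coefficient refers to the soft-Dice sketch above):

```python
from keras import backend as K

def dice_plus_ce_loss(y_true, y_pred):
    # per-voxel cross-entropy over the class axis (axis=1), reduced to a
    # scalar with K.mean so it can be added to the scalar 1 - dice term
    ce = K.mean(K.categorical_crossentropy(y_true, y_pred, axis=1))
    return 1 - dice_coefficient(y_true, y_pred) + ce
```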
Just out of curiosity: how does Keras usually handle categorical_crossentropy if it returns a tensor? Does it internally calculate the mean as the loss value?
I think MXNet tries to figure out in_grad and out_grad respectively.
So if you return a tensor of length n, MXNet may give you the gradients for all n results.
MXNet will handle the rest, so you don't need to worry about what happens when you return a tensor.
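To make that concrete, a small sketch of two equivalent ways to write the cross-entropy term (my understanding is that Keras reduces a tensor-valued loss to its mean internally, so both should yield the same training loss; I have not verified this on the MXNet backend):

```python
from keras import backend as K

def ce_tensor(y_true, y_pred):
    # tensor-valued loss: one cross-entropy value per voxel
    return K.categorical_crossentropy(y_true, y_pred, axis=1)

def ce_scalar(y_true, y_pred):
    # the same loss with the mean reduction made explicit
    return K.mean(ce_tensor(y_true, y_pred))
```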