I am a bit confused about which labels MXNet is expecting in a binary classification context.
In my problem, I have a dependent variable which is an array of 1s and 0s, i.e. [1,0,0,0,1,1,…,0,0,0,1].
In numpy terms, its shape is (n_data_points,).
Given that, my last 2 layers in the model are defined as follows:
fc2 = mx.symbol.FullyConnected(data=fc1bn, name='fc2', num_hidden=1)
mlp = mx.symbol.LogisticRegressionOutput(data=fc2, name='softmax')
This works perfectly.
The thing is, this works as well:
fc2 = mx.symbol.FullyConnected(data=fc1bn, name='fc2', num_hidden=2)
mlp = mx.symbol.SoftmaxOutput(data=fc2, name='softmax')
I would have expected the above to work only if the dependent variable were one-hot encoded, i.e. [[1,0],[0,1],[0,1],[0,1],[1,0],…,[0,1],[1,0]], or, in numpy terms, shaped as (n_data_points, 2).
Apparently SoftmaxOutput is smart enough to spit out a probability and return argmax at the same time.
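My best guess at why integer labels of shape (n_data_points,) are enough for a softmax output: the cross-entropy against a one-hot row only ever picks out the log-probability of the true class, so a class index carries the same information. Here is a minimal sketch in plain Python, not MXNet code. The helper names (softmax, ce_from_index, ce_from_one_hot) are made up for illustration, and I am assuming the usual convention that column k holds the score for class k. It is not meant to describe how SoftmaxOutput is actually implemented.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ce_from_index(probs, label):
    """Cross-entropy when the label is a class index (0 or 1)."""
    return -math.log(probs[label])

def ce_from_one_hot(probs, one_hot):
    """Cross-entropy when the label is a one-hot row such as [0, 1]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

# Two raw scores per data point, as produced by a layer with num_hidden=2
# (the values here are arbitrary illustration values).
probs = softmax([0.3, 1.2])

loss_index = ce_from_index(probs, 1)            # label given as an index
loss_one_hot = ce_from_one_hot(probs, [0, 1])   # same label, one-hot encoded
predicted_class = probs.index(max(probs))       # argmax over the two classes
```

Both losses come out identical, which would explain why the index form is accepted without any one-hot encoding.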
Now, the question is, is there a recommended way of structuring a binary classification problem?
Should one use a one-hot-encoded variable or not?
Knowing that LogisticRegressionOutput and SoftmaxOutput do exactly the same thing in a binary context, which one is recommended?
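On the "same thing" point, the predicted probabilities at least coincide mathematically: a two-class softmax over scores (z0, z1) gives class 1 the probability sigmoid(z1 - z0). Below is a small plain-Python check of that identity. The function names are illustrative and not part of the MXNet API, and the check says nothing about how the two MXNet output layers differ in label format or gradient scaling.

```python
import math

def sigmoid(z):
    """Logistic function, as applied to a single output unit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z0, z1):
    """Softmax over exactly two scores; returns (p_class0, p_class1)."""
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

# Arbitrary score pairs used only to exercise the identity.
score_pairs = [(0.0, 0.0), (-1.5, 2.0), (3.0, 0.5), (0.2, -4.0)]

# Largest difference between softmax's class-1 probability and
# the sigmoid of the score difference, across all pairs.
max_gap = max(
    abs(softmax2(z0, z1)[1] - sigmoid(z1 - z0)) for z0, z1 in score_pairs
)
```

So a single sigmoid unit can represent the same probabilities as two softmax units; the two-unit version just carries one redundant degree of freedom, since only the difference between the scores matters.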