Why is from_logits False by default in Gluon SoftmaxCrossEntropy?

I know that SoftmaxCrossEntropyLoss(from_logits=False) applies softmax to the output of our linear layer and then computes the cross-entropy loss, which makes perfect sense. And if we want to pass outputs that have already been normalized (e.g. via log_softmax), we can set from_logits=True and everything works fine.
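For example (a minimal sketch with made-up numbers, assuming MXNet's NDArray API), both usages should give the same loss value:

```python
from mxnet import nd
from mxnet.gluon.loss import SoftmaxCrossEntropyLoss

logits = nd.array([[2.0, 0.5, -1.0]])   # raw linear outputs
label = nd.array([0])

# Default: the loss normalizes the raw outputs internally
loss_default = SoftmaxCrossEntropyLoss()            # from_logits=False
print(loss_default(logits, label))

# from_logits=True: we normalize ourselves (log_softmax) before the loss
log_probs = nd.log_softmax(logits)
loss_from_logits = SoftmaxCrossEntropyLoss(from_logits=True)
print(loss_from_logits(log_probs, label))           # same value
```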

My question is: why is from_logits False by default? Wouldn't it be better if it were True by default, so that every time I make a prediction (using a model with a softmax layer as the output layer), it gives me probabilities instead of raw linear outputs?

I know I can do this by just setting from_logits=True in the loss and then training, but is there an important reason behind the default?

BTW, PyTorch does the same thing as MXNet, so there's got to be a legit and intuitive reason.

Typically, if you're making predictions with a model whose output layer would be a softmax, computing the actual softmax probabilities at inference time doesn't matter if you're going to take the argmax afterwards, so it's just wasted computation. And since you don't use the loss at inference time but you do need the softmax for the loss, it makes sense to have the softmax computation happen inside the loss during training.
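For instance (hypothetical logits), since softmax is an order-preserving transformation of each row, it never changes which class argmax picks:

```python
from mxnet import nd

logits = nd.array([[2.0, 0.5, -1.0],
                   [0.1, 3.2, 0.3]])

# softmax is monotonic per row, so the winning class is unchanged
probs = nd.softmax(logits)
print(nd.argmax(logits, axis=1))  # [0. 1.]
print(nd.argmax(probs, axis=1))   # [0. 1.]
```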

However, if you do need to explicitly make softmax part of your model for inference, i.e. you won't just take the argmax of the output and you need your outputs to be strictly between 0 and 1 and normalized, then you can use the from_logits=True argument and that should work.
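If you go that route, a rough sketch might look like the following (the ProbNet block, shapes, and values here are just illustrative). Note that from_logits=True expects log-probabilities, so the network ends with log_softmax and you exponentiate when you want actual probabilities:

```python
from mxnet import nd
from mxnet.gluon import nn
from mxnet.gluon.loss import SoftmaxCrossEntropyLoss

# A network whose outputs are already log-probabilities
class ProbNet(nn.HybridBlock):
    def __init__(self, num_classes, **kwargs):
        super().__init__(**kwargs)
        self.dense = nn.Dense(num_classes)

    def hybrid_forward(self, F, x):
        return F.log_softmax(self.dense(x))

net = ProbNet(num_classes=3)
net.initialize()
loss = SoftmaxCrossEntropyLoss(from_logits=True)   # don't normalize again inside the loss

x = nd.random.uniform(shape=(4, 8))
label = nd.array([0, 1, 2, 1])
out = net(x)
print(loss(out, label))
print(nd.exp(out))   # actual probabilities, each row sums to 1
```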

This is exactly what I was thinking of, thanks.
There is another reason I can think of:
Suppose you want to load a pre-trained model and add more layers to it for transfer learning. If the loaded model has a softmax as its output layer, it can be a bit awkward to add further layers on top, whereas if the output layer is a plain linear layer, we can easily build more layers on top of it, as in the sketch below.
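Something like this (a rough sketch using the Gluon model zoo's resnet18_v1 as a stand-in backbone; the 10-class Dense head is hypothetical):

```python
from mxnet import gluon
from mxnet.gluon import nn

# Load a pre-trained backbone (downloads weights on first use)
backbone = gluon.model_zoo.vision.resnet18_v1(pretrained=True)

# Because the backbone ends in a linear (Dense) layer rather than a softmax,
# we can simply stack a new output layer on top of its feature extractor
net = nn.HybridSequential()
net.add(backbone.features)   # convolutional feature extractor
net.add(nn.Dense(10))        # new linear output for a 10-class task
net[1].initialize()          # only the new head needs initializing
```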

Thanks for your time.