Multi-label classification using MXNet

Dear experts,
I am trying to train a multi-label image classifier using the MXNet Python interface. I am using the MXNet Module API, not Gluon.
I have 20 classes, and each of these classes has 10 sub-classes. So I can frame the problem either as 200 binary labels (each 0 or 1) or as 20 multi-class softmax outputs (one 10-way softmax per class).
Now my question is: which route is better? And is there an example or tutorial on this?
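A small NumPy sketch of how the two framings relate at the label level, assuming (hypothetically) that every image has exactly one active sub-class per class. The helper `to_multi_hot` is illustrative and not part of MXNet: it turns an `(N, 20)` array of sub-class indices (the targets for 20 softmaxes) into `(N, 200)` multi-hot targets for 200 binary outputs, laid out class-major.

```python
import numpy as np

NUM_CLASSES = 20      # top-level classes (from the question)
NUM_SUBCLASSES = 10   # sub-classes per class (from the question)

def to_multi_hot(sub_labels):
    """Convert per-class sub-class indices of shape (N, 20) into
    200-dim multi-hot targets of shape (N, 200).

    Hypothetical helper: assumes sub_labels[n, c] is in range(10),
    i.e. exactly one active sub-class per class for every image.
    """
    sub_labels = np.asarray(sub_labels)
    n = sub_labels.shape[0]
    targets = np.zeros((n, NUM_CLASSES, NUM_SUBCLASSES), dtype=np.float32)
    # one-hot along the sub-class axis, independently for each class
    rows = np.arange(n)[:, None]            # (N, 1)
    cols = np.arange(NUM_CLASSES)[None, :]  # (1, 20)
    targets[rows, cols, sub_labels] = 1.0
    # flatten class-major: flat index = class * 10 + subclass
    return targets.reshape(n, NUM_CLASSES * NUM_SUBCLASSES)
```

The conversion only goes cleanly in this direction: the 200-binary framing can also express zero or several active sub-classes for a class, which 20 softmaxes cannot.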

I have found this, which supports multi-label inputs:

However, I still don’t know which loss function to use. Should I combine multiple cross-entropy losses?
Is there an example or tutorial on this?
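Summing (or averaging) cross-entropy terms is a common way to handle either framing. Below is a minimal, framework-agnostic NumPy sketch so it runs without MXNet: `sigmoid_bce` averages binary cross-entropy over 200 independent sigmoid outputs, and `grouped_softmax_ce` reshapes 200 logits to `(N, 20, 10)` and sums one softmax cross-entropy per class. The function names and the class-major logit layout are assumptions for illustration.

```python
import numpy as np

def sigmoid_bce(logits, targets):
    """Mean binary cross-entropy over independent sigmoid outputs.
    logits, targets: shape (N, 200); targets are 0/1."""
    # numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean()

def grouped_softmax_ce(logits, sub_labels, num_classes=20, num_subclasses=10):
    """Sum of 20 softmax cross-entropies, averaged over the batch.
    logits: (N, 200), viewed as (N, 20, 10); sub_labels: (N, 20) ints."""
    n = logits.shape[0]
    z = logits.reshape(n, num_classes, num_subclasses)
    # log-softmax along the sub-class axis: one softmax per class
    z = z - z.max(axis=2, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=2, keepdims=True))
    # log-probability of the true sub-class for every (image, class) pair
    picked = np.take_along_axis(log_probs, sub_labels[:, :, None], axis=2)[:, :, 0]
    return -picked.sum(axis=1).mean()
```

In the Module/symbol API the nearest built-ins are, as far as I know, `mx.sym.LogisticRegressionOutput` for the sigmoid case and `mx.sym.SoftmaxOutput` with `multi_output=True` for the grouped case. The latter applies softmax along axis 1, so class-major logits would need to be reshaped and transposed to `(N, 10, 20)` with labels of shape `(N, 20)`. Treat this mapping as an assumption to check against the MXNet symbol documentation.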

I have also found this:

which is a multi-label softmax loss layer.

I would appreciate any input, and sorry for the duplicate posts.

A common assumption when using softmax classification is that the classes are mutually exclusive: the outputs form a single probability distribution, so exactly one class is expected to be correct per sample. Framing this as a single 200-class softmax therefore wouldn’t be suitable, since an image can carry several labels at once. I’d suggest trying 20 multi-class classifications instead, which ends up, for example, as a 20x10 matrix of labels/probabilities. Alternatively, you could learn classes and sub-classes together with a response vector of length 30, where the first 20 entries do the class classification and the last 10 do the sub-class classification (more like classification and localization in object recognition, if you know what I mean).
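The 30-output variant could be sketched in NumPy as follows, assuming each image carries one class index (0-19) and one sub-class index (0-9) whose meaning depends on the class. `class_subclass_loss` and `predict` are hypothetical helpers: the loss is a 20-way softmax cross-entropy on the first 20 logits plus a 10-way softmax cross-entropy on the last 10, weighted equally.

```python
import numpy as np

def log_softmax(z):
    """Row-wise log-softmax, numerically stabilised."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def class_subclass_loss(logits, class_labels, subclass_labels, num_classes=20):
    """Joint loss for a 30-dim output vector.
    logits: (N, 30); class_labels, subclass_labels: (N,) ints."""
    n = logits.shape[0]
    class_lp = log_softmax(logits[:, :num_classes])  # (N, 20) class head
    sub_lp = log_softmax(logits[:, num_classes:])    # (N, 10) sub-class head
    rows = np.arange(n)
    ce_class = -class_lp[rows, class_labels]
    ce_sub = -sub_lp[rows, subclass_labels]
    # equal weighting of the two heads is an arbitrary choice here
    return (ce_class + ce_sub).mean()

def predict(logits, num_classes=20):
    """Return (class index, sub-class index) arrays, one pair per image."""
    return logits[:, :num_classes].argmax(axis=1), logits[:, num_classes:].argmax(axis=1)
```

Unlike the 20x10 matrix, this head predicts a single (class, sub-class) pair per image, so it fits best when the labels are hierarchical rather than truly multi-label.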