Train a multi-output classifier with GluonCV

I would like to train a multi-output classifier with GluonCV. Let me use an example to explain the result I want to achieve:

input    output_0: color    output_1: type
img_0    red                jeans
img_1    blue               shirt

The model should not need to see every combination of the categories I would like to predict.
Example: train with blue jeans and red shirts, but the model should still be able to predict a blue shirt.

The solution I found is to combine multiple loss criteria:

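from mxnet import autograd
from mxnet.gluon import loss as gloss

# One softmax cross-entropy criterion per output head.
criterion1 = gloss.SoftmaxCrossEntropyLoss()
criterion2 = gloss.SoftmaxCrossEntropyLoss()
with autograd.record():
    outputs = model(input)  # the model returns one output per head
    loss1 = criterion1(outputs[0], targets[0])  # color
    loss2 = criterion2(outputs[1], targets[1])  # type

# Backpropagate both losses in a single pass.
autograd.backward([loss1, loss2])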
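To make the combined objective concrete, here is a minimal, framework-free sketch of what the two criteria compute for one image. The logits and targets are made-up values, and `softmax_cross_entropy` is a hypothetical helper written for illustration, not the Gluon implementation. As far as I understand, backpropagating a list of losses accumulates the gradients of each one, which matches minimizing their sum.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against an integer class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Hypothetical raw scores from the two heads for a single image.
color_logits = [2.0, 0.5]  # classes: red, blue
type_logits = [0.1, 1.2]   # classes: jeans, shirt
targets = (1, 1)           # ground truth: blue, shirt

loss_color = softmax_cross_entropy(color_logits, targets[0])
loss_type = softmax_cross_entropy(type_logits, targets[1])

# The joint objective is simply the sum of the per-head losses.
total_loss = loss_color + loss_type
```

In this made-up case the color head is wrong (it favors red) while the type head is right, so the color loss dominates the total.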

My questions are:

  1. Could I achieve the result with this solution?
  2. If so, what caveats should I pay attention to?
  3. What are the advantages/disadvantages compared with other solutions, such as:

a. Multi-label solution: train the model with every combination of categories I want to predict (e.g. red jeans, blue jeans, red shirt, blue shirt).
b. Train a separate classifier for each output (in this case, color and type). I guess this solution is the easiest to train but the slowest.
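The difference between option (a) and separate outputs can be sketched in plain Python. This is only an illustration: it uses the categories from my example, and `output_sizes` is a hypothetical helper name.

```python
from itertools import product

colors = ["red", "blue"]
types = ["jeans", "shirt"]

# Combinations present in the training set from the example above.
seen = {("blue", "jeans"), ("red", "shirt")}

# Option (a) needs one class per combination, so these classes
# would have no training examples at all.
unseen = sorted(set(product(colors, types)) - seen)

# With separate outputs, every individual category is still covered.
colors_covered = {c for c, _ in seen} == set(colors)
types_covered = {t for _, t in seen} == set(types)

def output_sizes(n_colors, n_types):
    """Return (units for one joint output, units for two separate outputs)."""
    return n_colors * n_types, n_colors + n_types
```

For example, `output_sizes(10, 20)` gives 200 joint classes against 30 units split over two outputs, so the joint formulation grows much faster as categories are added.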


Edit: I confirm I could achieve the result with this solution, as long as I split the network into two branches.

Yes, that is an approach that would work. You can create a block with two output layers that predict the clothing item and its color:

from mxnet import gluon

class MultiClass(gluon.Block):
    def __init__(self, **kwargs):
        super(MultiClass, self).__init__(**kwargs)
        with self.name_scope():
            # One dense output layer per label; 5 is the number of classes.
            self.clothes = gluon.nn.Dense(5)
            self.color   = gluon.nn.Dense(5)

    def forward(self, x):
        # Both heads read the same input features.
        out1 = self.clothes(x)
        out2 = self.color(x)
        return (out1, out2)
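At prediction time each output can be decoded independently, which is why a combination never seen during training can still be predicted. A minimal sketch, assuming the scores have been converted to plain Python lists; the class names and the `decode` helper are hypothetical, chosen only to match the five units per head above:

```python
# Hypothetical class names for the two heads (5 classes each).
CLOTHES_NAMES = ["jeans", "shirt", "dress", "coat", "skirt"]
COLOR_NAMES = ["red", "blue", "green", "black", "white"]

def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=scores.__getitem__)

def decode(outputs):
    """Map (clothes_scores, color_scores) to a (clothes, color) pair."""
    clothes_scores, color_scores = outputs
    return CLOTHES_NAMES[argmax(clothes_scores)], COLOR_NAMES[argmax(color_scores)]

# Made-up scores for one image: shirt and blue score highest.
prediction = decode(([0.1, 2.3, 0.0, -1.0, 0.5], [0.2, 3.1, -0.4, 0.0, 1.0]))
```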

Depending on the number of labels to predict, the model may become harder to train. Since you only predict two labels, this will probably not be an issue. Alternatively, you could train separate models, each predicting a single label (assuming the labels are mutually exclusive). If the labels are correlated, you could investigate models such as classifier chains.
