How to train a model with a huge number of classes

For example, I am training a face recognition model with millions of IDs. Besides triplet loss, I would like to use softmax-based losses such as arc loss, AM-Softmax, and so on. However, with so many classes, GPU memory is insufficient. Is there a way to train a model like this? Maybe splitting the softmax layer across multiple GPUs would work; I wonder whether MXNet supports this.

Computing the softmax over millions of classes is very expensive. You could use a sampled softmax loss instead, which only takes a subset of the classes into account in each update. Here is a nice article about ways to approximate and optimize the softmax: http://ruder.io/word-embeddings-softmax/
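To give a rough idea of what that looks like in practice, here is a minimal hand-rolled sketch in MXNet Gluon. The class name `SampledSoftmaxLoss`, the uniform negative sampler, and the omission of the log-probability correction and accidental-hit removal are simplifications of mine for illustration, not an official API:

```python
import mxnet as mx
from mxnet import nd, gluon

class SampledSoftmaxLoss(gluon.Block):
    """Softmax cross-entropy over the true classes plus a random sample
    of negative classes, instead of all `num_classes` outputs."""
    def __init__(self, num_classes, embed_dim, num_sampled, **kwargs):
        super(SampledSoftmaxLoss, self).__init__(**kwargs)
        self.num_classes = num_classes
        self.num_sampled = num_sampled
        # The full classification weights are stored once; only the rows
        # for the sampled candidate classes enter each forward pass.
        self.weight = self.params.get('weight', shape=(num_classes, embed_dim))
        self.bias = self.params.get('bias', shape=(num_classes,))
        self.loss = gluon.loss.SoftmaxCrossEntropyLoss()

    def forward(self, embeddings, labels):
        ctx = embeddings.context
        # Uniform negative sampling for brevity; a log-uniform sampler with
        # a logit correction (and accidental-hit removal) works better.
        negatives = nd.random.randint(0, self.num_classes,
                                      shape=(self.num_sampled,), ctx=ctx)
        candidates = nd.concat(labels.astype('int32'),
                               negatives.astype('int32'), dim=0)
        w = self.weight.data(ctx).take(candidates)   # (batch + sampled, dim)
        b = self.bias.data(ctx).take(candidates)     # (batch + sampled,)
        logits = nd.dot(embeddings, w.T) + b         # (batch, batch + sampled)
        # After the gather, the true class of sample i sits in column i.
        new_labels = nd.arange(labels.shape[0], ctx=ctx)
        return self.loss(logits, new_labels)
```

You still initialize and train this like any other block; with millions of classes you would also want a row-sparse gradient on the weight parameter so only the touched rows are updated each step.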


You can have a look at the sampled blocks in the gluon-nlp package; see the note below.
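If I remember correctly, the relevant pieces live under `gluonnlp.model.train`, e.g. `ISDense` (importance-sampled softmax) and `NCEDense` (noise-contrastive estimation), which gluon-nlp's large-vocabulary language model example uses as drop-in output layers. Check the gluon-nlp documentation for the exact signatures and for a candidate sampler to pair them with, since the details may have changed.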
