How to train a model with a huge number of classes

For example, I am training a face recognition model with millions of IDs. Besides triplet loss, I would like to use softmax-based losses such as arc loss, AM-Softmax, and so on. However, with so many classes, GPU memory is insufficient. Is there a way to train a model like this? Maybe splitting the softmax layer across multiple GPUs would work; I wonder whether MXNet supports this.

Computing the softmax over millions of classes is very expensive. You could use a sampled softmax loss instead, which only takes a subset of the classes into account in each update. Here is a nice article about ways to approximate and optimize the softmax: http://ruder.io/word-embeddings-softmax/
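To give a rough idea of what that looks like in practice, here is a minimal hand-rolled sketch in MXNet Gluon. The class name `SampledSoftmaxLoss`, the uniform negative sampler, and the omission of the log-probability correction and accidental-hit removal are simplifications of mine for illustration, not an official API:

```python
import mxnet as mx
from mxnet import nd, gluon

class SampledSoftmaxLoss(gluon.Block):
    """Softmax cross-entropy over the true classes plus a random sample
    of negative classes, instead of all `num_classes` outputs."""
    def __init__(self, num_classes, embed_dim, num_sampled, **kwargs):
        super(SampledSoftmaxLoss, self).__init__(**kwargs)
        self.num_classes = num_classes
        self.num_sampled = num_sampled
        # The full classification weights are stored once; only the rows
        # for the sampled candidate classes enter each forward pass.
        self.weight = self.params.get('weight', shape=(num_classes, embed_dim))
        self.bias = self.params.get('bias', shape=(num_classes,))
        self.loss = gluon.loss.SoftmaxCrossEntropyLoss()

    def forward(self, embeddings, labels):
        ctx = embeddings.context
        # Uniform negative sampling for brevity; a log-uniform sampler with
        # a logit correction (and accidental-hit removal) works better.
        negatives = nd.random.randint(0, self.num_classes,
                                      shape=(self.num_sampled,), ctx=ctx)
        candidates = nd.concat(labels.astype('int32'),
                               negatives.astype('int32'), dim=0)
        w = self.weight.data(ctx).take(candidates)   # (batch + sampled, dim)
        b = self.bias.data(ctx).take(candidates)     # (batch + sampled,)
        logits = nd.dot(embeddings, w.T) + b         # (batch, batch + sampled)
        # After the gather, the true class of sample i sits in column i.
        new_labels = nd.arange(labels.shape[0], ctx=ctx)
        return self.loss(logits, new_labels)
```

You still initialize and train this like any other block; with millions of classes you would also want a row-sparse gradient on the weight parameter so only the touched rows are updated each step.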


You can have a look at the sampled blocks in the gluon-nlp package; see the note below.
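If I remember correctly, the relevant pieces live under `gluonnlp.model.train`, e.g. `ISDense` (importance-sampled softmax) and `NCEDense` (noise-contrastive estimation), which gluon-nlp's large-vocabulary language model example uses as drop-in output layers. Check the gluon-nlp documentation for the exact signatures and for a candidate sampler to pair them with, since the details may have changed.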
