Questions regarding AMP


I am trying to understand automatic mixed precision (AMP) and went through this link. I have a few doubts, and it would be great if someone could help clear them up:

  • It seems that during training the model is initialized in fp32 and all inputs/outputs are fp32 as well; AMP performs the fp16 conversions internally. Is it possible to specify fp16 inputs directly?

  • My goal is to move everything except the last layer to fp16 so that I can roughly double the batch size. I checked the GitHub code, and the init method has options for specifying which operators to keep in fp32 and which to cast to fp16; that should work for me. However, how do I specify the sigmoid activation there? Based on the tutorial, it takes generic operator names like Convolution and SoftmaxOutput rather than layer names like conv2d, and I am using nn.Activation('sigmoid') from Gluon.

  • The inference section of the tutorial mentions that the model should be converted for inference. Does that mean I need to convert it even when running validation after every epoch? Also, can the model be converted before training (to reduce model size) so that training happens on the reduced model from the start?
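On the first question, a minimal NumPy sketch (illustrative only, not the MXNet AMP API; the batch shape is hypothetical) of what an explicit fp16 input cast buys: each element drops from 4 bytes to 2, which is where the roughly doubled batch size comes from.

```python
import numpy as np

# Hypothetical image batch; the shape is illustrative only.
batch_fp32 = np.ones((32, 3, 224, 224), dtype=np.float32)
batch_fp16 = batch_fp32.astype(np.float16)   # explicit fp16 cast of the input

print(batch_fp32.nbytes)   # 19267584 bytes (~18.4 MiB)
print(batch_fp16.nbytes)   # 9633792 bytes, exactly half
```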

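As background on the second question, a framework-agnostic sketch (plain NumPy, not the MXNet API) of why AMP keeps numerically sensitive operators such as softmax outputs in fp32 in the first place: values that fp32 represents fine can underflow to zero in fp16, which is also why AMP applies loss scaling during training.

```python
import numpy as np

# A small value that is fine in fp32 underflows to zero in fp16
# (fp16's smallest subnormal is roughly 6e-8).
small_grad = np.float32(1e-8)
assert np.float16(small_grad) == 0.0

# Loss scaling multiplies values before the backward pass so gradients
# land back inside fp16's representable range.
scale = 1024.0
scaled = np.float16(small_grad * scale)   # ~1.02e-5, representable in fp16
assert scaled > 0.0
```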

Hi @abhay, I am not an expert on this, but from personal experience on points 2 and 3:

  • I recently tried to use AMP (with Horovod) on P100 GPUs to reduce the memory footprint. I did not manage to increase my batch size, although training became a bit faster (before AMP: batch size of 2; with AMP: the same, though this is a very large model). This behaviour is perhaps linked to the specific hardware, and V100s may give better memory savings.
  • The inference section, I think, refers to a model that was trained in float32 on which you want to run float16 inference. If the model is already trained in float16, I don't think you need this conversion. This is just from reading the page you linked, though - no hands-on experience here.

Hope this helps.

Hi @feevos, thanks for the reply. I was able to get confirmation from the MXNet devs that conversion is only needed if the model was trained in fp32 without AMP. I also figured out why my memory requirements had increased after enabling AMP on V100 GPUs: after initializing the model, I was doing a dummy forward pass on an nd.ones() array, which I believe created all of the model's activations in fp32. Once I removed it, AMP reduced the memory footprint and I was able to double the batch size.
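For anyone hitting the same issue, a small NumPy sketch of the dtype pitfall described above (NumPy stands in for mxnet.nd here; nd.ones() defaults to float32 while np.ones() defaults to float64, so the dtype is spelled out explicitly):

```python
import numpy as np

# mxnet.nd.ones() defaults to float32, so a warm-up forward pass on it can
# materialize fp32 activations even with AMP enabled. An explicit fp16
# dummy input (or skipping the warm-up pass) avoids that.
dummy_fp32 = np.ones((1, 3, 224, 224), dtype=np.float32)  # what nd.ones() gives
dummy_fp16 = np.ones((1, 3, 224, 224), dtype=np.float16)  # explicit fp16 dummy

assert dummy_fp16.nbytes * 2 == dummy_fp32.nbytes
```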
