Mixed Precision for Object Detection

Hi, I was looking at the training scripts for GluonCV’s object classification task and found that most of the training uses fp16. However, it seems mixed precision is not used when training the object detection models. Why isn’t mixed precision used in this case? Will the performance (accuracy) degrade with mixed precision? Also, does multi-GPU training generally give better accuracy for object detection? (I am thinking the BN layers may benefit if we have multiple GPUs.)

Could you point me to the object classification and object detection training scripts you’re looking at?

In general, for layers that involve large reductions it’s better to perform those in fp32, because accuracy can degrade, as you point out, due to fp16’s limited range for representing large values. There’s a good discussion of which use cases are affected by mixed precision in MXNet here: https://mxnet.incubator.apache.org/faq/float16.html
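For reference, here is a rough sketch of what fp16 training looks like with the Gluon API (this is not the GluonCV training script, just a minimal illustration following the pattern from the FAQ above): the network is cast to float16, while the optimizer keeps an fp32 master copy of the weights via `multi_precision`.

```python
# Minimal fp16 sketch with Gluon (illustrative only, not the GluonCV script).
import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.gluon.model_zoo import vision

ctx = mx.gpu(0)
net = vision.resnet50_v1(pretrained=False)
net.initialize(mx.init.Xavier(), ctx=ctx)
net.cast('float16')  # weights and activations run in fp16

# multi_precision=True keeps an fp32 copy of the weights for the update,
# so small gradients are not lost when accumulated into fp16 weights.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'momentum': 0.9,
                         'multi_precision': True})

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

# Dummy batch just to show the dtype handling.
data = nd.random.uniform(shape=(8, 3, 224, 224), ctx=ctx).astype('float16')
label = nd.random.randint(0, 1000, shape=(8,), ctx=ctx)

with autograd.record():
    out = net(data)
    # Compute the loss (a large reduction) in fp32 to avoid overflow.
    loss = loss_fn(out.astype('float32'), label)
loss.backward()
trainer.step(batch_size=8)
```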

Multi-GPU training provides faster convergence because you can process more samples per step, so you can reach a better accuracy in the same amount of time compared to a single GPU. I’m not sure of any effect on the BatchNorm layers from multi-GPU training, though.
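For completeness, here is a minimal data-parallel sketch with Gluon (assuming two GPUs are available; again, this is an illustration rather than the GluonCV script). Note that with plain data parallelism each GPU’s BatchNorm layer normalizes only its own slice of the batch.

```python
# Minimal multi-GPU data-parallel sketch with Gluon (assumes 2 GPUs).
import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.gluon.model_zoo import vision

ctxs = [mx.gpu(0), mx.gpu(1)]
net = vision.resnet50_v1(classes=1000)
net.initialize(mx.init.Xavier(), ctx=ctxs)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

# Dummy batch; in practice this comes from the DataLoader.
data = nd.random.uniform(shape=(16, 3, 224, 224))
label = nd.random.randint(0, 1000, shape=(16,))

# Split the batch across GPUs; the Trainer aggregates gradients across devices.
data_parts = gluon.utils.split_and_load(data, ctxs)
label_parts = gluon.utils.split_and_load(label, ctxs)

with autograd.record():
    losses = [loss_fn(net(x), y) for x, y in zip(data_parts, label_parts)]
for l in losses:
    l.backward()
trainer.step(batch_size=16)
```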

I was basically looking at the classification scripts. The ResNet-101D and ResNet-152D models are all trained using FP16 and they get pretty good results. But when I look at the detection scripts, I think all of them use FP32 for training. So I wonder whether FP16 degrades the performance for object detection.

Thanks for the reference to the article; I have read it before. Also, according to the paper Mixed Precision Training, FP16 training works fine for object detection.