Training YoloV3 with input dimensions 608x608 returns NaN loss

Greetings, everyone.

I am trying to train a YoloV3 on a custom dataset by referencing [1] and modifying [2]. In the model zoo, I saw that YoloV3 comes in 3 input dimensions: 320, 416 and 608.

I tried training with input dimension 608 first, but after several epochs all losses started reporting NaN. I then switched to 416, and the losses no longer reported NaN. Note that I am using SGD with multi-precision: true.

I would still like to use 608 as my input size and was wondering if anyone could offer guidance on the cause of the NaN issue.
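For anyone trying to reproduce this, a small guard like the one below can flag the first batch where a loss turns NaN, so training can stop there instead of running on for epochs. It is only a sketch: the loss names and values are hypothetical, and it assumes each loss has already been converted to a plain Python float.

```python
import math

def non_finite_losses(losses):
    """Return the names of loss components whose value is NaN or infinite."""
    return [name for name, value in losses.items() if not math.isfinite(value)]

# Hypothetical per-batch loss values, already converted to Python floats.
step_losses = {"obj": 12.3, "center": float("nan"), "scale": 0.8, "cls": float("inf")}

bad = non_finite_losses(step_losses)
if bad:
    print("non-finite losses at this step:", bad)
```

Calling this once per batch and breaking out of the loop on the first hit makes it easier to see which batch and which loss term went bad first.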

[1] https://gluon-cv.mxnet.io/build/examples_detection/train_yolo_v3.html
[2] https://gluon-cv.mxnet.io/build/examples_detection/finetune_detection.html#sphx-glr-build-examples-detection-finetune-detection-py

I seem to have found the root cause of my issue: there was a bug in MXNet version 1.5 that can only be resolved by installing the master/nightly package of MXNet.

For the convenience of anyone who might run into this issue, refer to:
[1] https://github.com/dmlc/gluon-cv/issues/278
[2] https://github.com/apache/incubator-mxnet/pull/14209

Thanks for sharing your solution, @Lee.

As a reminder for others, to install the nightly version of MXNet, run for example:

pip install mxnet-cu92mkl --pre
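
After installing, it may be worth confirming which MXNet build pip actually picked up, since several variants can be installed side by side. Here is a minimal sketch using only the standard library (Python 3.8+). The list of package names is an assumption, so adjust it to the variant you installed.

```python
from importlib import metadata

# Assumed candidate distribution names; adjust to the variant you installed.
CANDIDATES = ("mxnet", "mxnet-mkl", "mxnet-cu92", "mxnet-cu92mkl")

def installed_mxnet_versions(names=CANDIDATES):
    """Map each installed MXNet distribution name to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This variant is not installed; skip it.
            continue
    return found

print(installed_mxnet_versions() or "no MXNet distribution found")
```

If more than one entry shows up, uninstalling the extra variants first avoids loading the old build by accident.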