Data loader with rectangular images for object detection

Hi guys,
I read through the object-detection tutorial on fine-tuning an existing network for object detection:

https://gluon-cv.mxnet.io/build/examples_detection/finetune_detection.html

and I also checked this script

As I understand it, for SSD the images are resized so that the shorter side is 512 or 300, depending on the network.
In the Pikachu dataset example the images are square (datashape = 512), so the data loader is configured with square dimensions:
width, height = datashape, datashape
and
SSDDefaultTrainTransform(width, height, anchors)

If my images are rectangular rather than square, should I modify those lines accordingly?
For example, if I know that my images will be resized so that the shorter side is 512, I could apply the same scale factor to compute the expected length of the longer side and use that for width and height above.
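The arithmetic described above can be sketched like this. It is a minimal sketch assuming every image in the dataset shares the same aspect ratio; the helper name `fixed_shape_from_short_side` is hypothetical and not part of GluonCV.

```python
def fixed_shape_from_short_side(orig_width, orig_height, short_side=512):
    """Return (width, height) after scaling the shorter side to `short_side`.

    Hypothetical helper for illustration: keeps the aspect ratio and rounds
    each side to the nearest whole pixel.
    """
    scale = short_side / min(orig_width, orig_height)
    return int(round(orig_width * scale)), int(round(orig_height * scale))


# Example: a 1024x768 (4:3) landscape image.
width, height = fixed_shape_from_short_side(1024, 768)
print(width, height)
```

Because SSD ideally wants every image in a batch at the same resolution, this only gives one usable (width, height) pair if the whole dataset shares a single aspect ratio; with mixed ratios it would produce a different shape per image.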

Dear @zhreshold @hetong007,
I noticed you were the last ones to contribute to train_ssd.py. Would you be able to shed some light here?
Thanks a lot !

SSD ideally takes batches in which all inputs have the same resolution, so if you decide not to use square images, you can set a fixed shape, e.g., (384, 512), in the transform functions, and that's all you need to change.
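Whatever fixed shape is chosen, each image is resized to it, and its bounding boxes have to be scaled by the same per-axis factors. A minimal pure-Python sketch of that box rescaling, assuming boxes are (xmin, ymin, xmax, ymax) in pixels and sizes are (width, height) tuples; the function name `rescale_boxes` and the target shape (width 512, height 384) are illustrative assumptions, not GluonCV's own code.

```python
def rescale_boxes(boxes, src_size, dst_size):
    """Scale (xmin, ymin, xmax, ymax) boxes from src_size to dst_size.

    Hypothetical helper: sizes are (width, height). x coordinates scale with
    the width ratio and y coordinates with the height ratio, so a
    non-uniform resize stretches boxes the same way it stretches the image.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [[xmin * sx, ymin * sy, xmax * sx, ymax * sy]
            for xmin, ymin, xmax, ymax in boxes]


# Illustrative fixed target shape: width 512, height 384.
print(rescale_boxes([[100, 50, 300, 250]], (1024, 768), (512, 384)))
```

Note that in the tutorial's data loader the anchors passed to `SSDDefaultTrainTransform(width, height, anchors)` come from a forward pass on a dummy input of shape (1, 3, height, width), so that dummy input should use the same fixed width and height as the transform.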
