Image resizing and scale invariance for object detection

I followed the tutorial to finetune an object-detection network.

My images are initially 2048x2048 and are rescaled to 512x512 during training (function get_dataloader with data_shape=512), and also for detection (using gcv.data.transforms.presets.ssd.load_test with short=512 and max_size=1024).
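For reference, here is a minimal sketch of the two preprocessing paths involved, assuming get_dataloader wraps SSDDefaultTrainTransform as in the GluonCV fine-tuning tutorial (the image path is a placeholder):

```python
import gluoncv as gcv

# Training path: SSDDefaultTrainTransform warps every image to a
# fixed 512x512, regardless of its aspect ratio.
train_transform = gcv.data.transforms.presets.ssd.SSDDefaultTrainTransform(512, 512)

# Detection path: load_test preserves the aspect ratio, resizing the
# short side to 512 while capping the long side at 1024.
x, img = gcv.data.transforms.presets.ssd.load_test('my_image.jpg', short=512, max_size=1024)
```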

I can use my fine-tuned network to detect my objects in new 2048x2048 squared images.

I then tried to run the detection on cropped images (1466x442), but it completely fails!
With those rectangular crops, load_test returns an image of dimensions 1024x309: scaling the short side to 512 would push the long side to about 1698, beyond max_size=1024, so the long side is capped at 1024 instead.
I thought the data augmentation used during training would make the trained network scale-invariant to some extent, or at least that it would still perform well on cropped images.

Not all features are scale-invariant, so if the objects you want the model to detect appear, say, 4 times larger in the test data than in the training data, the model will likely not be able to detect them. It is best to use the same preprocessing for training and test data: if you rescale your training images, you should rescale the test images the same way.
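One way to reproduce the training-time scale on the rectangular crops would be to warp them to the same fixed 512x512 before running the network. A minimal sketch, where 'crop_1466x442.jpg' and `net` stand in for your own image and fine-tuned model:

```python
import mxnet as mx
import gluoncv as gcv

# Warp the crop to the same fixed 512x512 used at training time
# (ignoring the aspect ratio, just as SSDDefaultTrainTransform does).
img = mx.image.imread('crop_1466x442.jpg')   # HWC uint8 NDArray
img = mx.image.imresize(img, 512, 512)

# Normalize and batch; short=512 is a no-op here because the image
# is already 512x512, so no further rescaling happens.
x, orig_img = gcv.data.transforms.presets.ssd.transform_test(img, short=512)

class_ids, scores, bboxes = net(x)
```

This way the objects appear at roughly the same scale the network saw during training, instead of the ~2.8x larger scale produced by the aspect-preserving load_test resize on a 1466x442 crop.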
Here is a nice article that explains the problem: https://miguel-data-sc.github.io/2017-11-23-second/
