Trying to modify the SSD lesson to work with the Pascal VOC dataset: low training loss but terrible prediction results

Complete notebook hosted on Kaggle:

I am trying to modify the SSD notebook to work with the Pascal VOC dataset.
Basically, I kept all the notebook code. The only modification I made is a custom data loader that creates the train_iterator.

I am certain that my custom data loader works, because the bounding boxes and labels are all correct, as the screenshot below shows.

I then trained the model. The training loss curve looks good, and the loss after 20 epochs is not much worse than on the Pikachu dataset.

However, when I run prediction on a random test image, the result is terrible. The class prediction is entirely random, and the bounding boxes are not nearly as tight as in the Pikachu result.


  1. Why is the loss so low (close to the Pikachu loss) while the prediction results are so bad (class predictions seem random)? Is the toy model just underfitting?

  2. If there is a bug in my modification, I can’t see where I made it. After all, I copied the entire SSD notebook without any change except replacing the Pikachu dataset with the Pascal dataset.

All input is appreciated. Thanks!

My training loss:

Prediction result:

Prediction result 2:


Shouldn’t you normalize (mean-center and standardize) the image before you provide it to the network? Or is that preprocessing done in the first layer of your network? If you export the network with gluon-cv, you have the option to add the normalization layers, but I don’t know whether that’s done by default.

What can also account for a 10% accuracy drop is reading the image with OpenCV in BGR format when your network expects RGB.
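If an image does come from OpenCV, the fix is to reverse the channel axis before inference. A minimal NumPy sketch, assuming an HWC uint8 array as cv2.imread returns it; the function name bgr_to_rgb is hypothetical:

```python
import numpy as np

def bgr_to_rgb(img_hwc: np.ndarray) -> np.ndarray:
    """Reverse the last (channel) axis of an HWC image: BGR -> RGB."""
    if img_hwc.ndim != 3 or img_hwc.shape[-1] != 3:
        raise ValueError("expected an HWC image with 3 channels")
    # Slicing gives a view; copy so later in-place ops don't modify the input.
    return img_hwc[..., ::-1].copy()

# One pure-blue pixel as OpenCV stores it (B=255, G=0, R=0).
bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0, 0] = 255
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0].tolist())  # [0, 0, 255]
```

With OpenCV already imported, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does the same conversion.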


Hello, my data loader does not use OpenCV. It generates a .lst file and uses mxnet.image.ImageDetIter to read that file.
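Since the loader writes its own .lst file, the label layout in each line is worth double-checking. The sketch below follows the detection .lst layout used in the GluonCV custom-dataset tutorial: index, header width A=4, label width B=5, image width and height, then [class, xmin, ymin, xmax, ymax] per object with coordinates normalized to 0-1, then the image path. That layout is an assumption about what the loader should produce, and make_det_lst_line is a hypothetical helper, not the poster’s code:

```python
import numpy as np

def make_det_lst_line(idx, img_path, img_w, img_h, class_ids, boxes_px):
    """Build one tab-separated detection .lst line from pixel-space boxes.

    Assumed layout: idx, A=4 (header width), B=5 (label width), width, height,
    then [class, xmin, ymin, xmax, ymax] per object normalized to [0, 1], then path.
    """
    boxes = np.asarray(boxes_px, dtype=np.float64).reshape(-1, 4)
    ids = np.asarray(class_ids, dtype=np.float64).reshape(-1, 1)
    labels = np.hstack([ids, boxes])
    # Normalize x coordinates by image width and y coordinates by image height.
    labels[:, [1, 3]] /= float(img_w)
    labels[:, [2, 4]] /= float(img_h)
    if labels[:, 1:].min() < 0 or labels[:, 1:].max() > 1:
        raise ValueError("box coordinates fall outside the image")
    fields = [str(idx)] + [str(v) for v in (4, 5, img_w, img_h)]
    fields += [f"{v:.4f}" for v in labels.flatten()]
    fields.append(img_path)
    return "\t".join(fields)

# One object of class 11 covering the left half of a 500x375 image.
line = make_det_lst_line(0, "JPEGImages/000001.jpg", 500, 375, [11], [[0, 0, 250, 375]])
print(line.split("\t"))
```

If the boxes in the generated file are still in pixels rather than in the 0-1 range, the plotted boxes can look right while the values fed to training differ from what the Pikachu pipeline uses.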

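As for normalization, I added

    def forward(self, X):
        X = X / 255.0  # quick-and-dirty scaling from 0-255 to 0.0-1.0
        ...  # rest of the notebook's forward method unchanged

as a quick-and-dirty scaling.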

However, this change did not fix the symptom (still low loss and terrible predictions; the loss after 20 epochs did not change much either).

The second modification was to change base_net from the simple toy network in the SSD notebook

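from mxnet.gluon import nn

def base_net():
    blk = nn.Sequential()
    for num_filters in [16, 32, 64]:
        # down_sample_blk is defined earlier in the SSD notebook
        blk.add(down_sample_blk(num_filters))
    return blk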

to ResNet-18. As a result, the loss actually got worse, and the predictions are just as bad.

All the described changes are committed to the Kaggle notebook:

I don’t see the root cause of your issue straight away, but here are some comments that might help you debug it.

During training, a network apparently gets built that successfully predicts bounding boxes and class probabilities. Assuming that you are actually using the trained network for prediction, I’d guess that images are provided to the network differently during training and during prediction.

Can you check that? Ideally use an image that was part of the training set, so you can be sure the network should predict it with high accuracy. Look at and compare the actual shape of the images (size, channels), the pixel values (are they float32 values more or less in the range -2.5 to 2.5?), and the color space.
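One way to make that comparison concrete is to print the same summary for a batch taken from the training iterator and for the tensor passed to the network at prediction time. A minimal NumPy sketch; describe_input is a hypothetical helper, and with MXNet you would first convert each NDArray with .asnumpy():

```python
import numpy as np

def describe_input(name, x):
    """Summarize an NCHW batch: shape, dtype, value range, per-channel mean."""
    x = np.asarray(x)
    if x.ndim != 4:
        raise ValueError(f"{name}: expected NCHW (4-D), got shape {x.shape}")
    stats = {
        "shape": x.shape,
        "dtype": str(x.dtype),
        "min": float(x.min()),
        "max": float(x.max()),
        # Mean over batch, height and width -> one value per channel.
        "channel_mean": [round(float(m), 3) for m in x.mean(axis=(0, 2, 3))],
    }
    print(f"{name}: {stats}")
    return stats

# Hypothetical mismatch: training batch in 0-255, prediction input scaled to 0-1.
rng = np.random.default_rng(0)
train_batch = rng.uniform(0, 255, size=(2, 3, 256, 256)).astype(np.float32)
pred_input = train_batch[:1] / 255.0
a = describe_input("train", train_batch)
b = describe_input("predict", pred_input)
if abs(a["max"] - b["max"]) > 1.0:
    print("Value ranges differ: the two code paths preprocess images differently.")
```

Shape, dtype, range and channel means should all agree between the two lines; any difference points at the step where the paths diverge.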

Three notes about the normalization that I suggested in my previous reply:

  1. You implement a division by 255, but that’s not the normalization I mean. That division changes the pixel values from 0-255 to 0.0-1.0, which is probably already done by the imread function.

     What I meant is mean-centering and standardization:
     Mean-centering = subtracting the mean of all pixel values, either per image or, as is typically done for SSD, using per-channel values based on the ImageNet dataset (*).
     Standardization = dividing by the standard deviation.

  2. You made the change in the forward function, which means it affects training and inference in exactly the same way.
     As I said before, I think you need to look at what’s different between these two code paths. The training seems to be alright, so I’d not modify it.

  3. The normalization is normally done by the loader, at least for training. For prediction, where you don’t use a loader, you need to include this step yourself.

(*) the normalization code I’m using:

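import mxnet as mx

# load_image is my own helper (not shown); it returns a float32 RGB image
# of size expected_size in HWC layout on context ctx.
img_arr = load_image(image_path, expected_size, ctx)
img_arr /= 255.  # 0-255 -> 0.0-1.0

# Mean-centering with the ImageNet per-channel means (RGB order)
mean = mx.nd.array([0.485, 0.456, 0.406], ctx=ctx)
img_arr -= mean

# Standardization with the ImageNet per-channel standard deviations
std = mx.nd.array([0.229, 0.224, 0.225], ctx=ctx)
img_arr /= std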
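The same steps written as one NumPy function, which could be shared by the training and prediction paths so both see identical inputs. A sketch assuming an RGB uint8 image in HWC layout (as mx.image.imread returns by default) and a network that expects a float32 NCHW batch; preprocess is a hypothetical name:

```python
import numpy as np

# ImageNet per-channel statistics (RGB order), as in the snippet above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_hwc_uint8):
    """uint8 RGB HWC image -> float32 NCHW batch of one, ImageNet-normalized."""
    x = img_hwc_uint8.astype(np.float32) / 255.0  # 0-255 -> 0.0-1.0
    x = (x - MEAN) / STD                           # broadcasts over the channel (last) axis
    x = np.transpose(x, (2, 0, 1))                 # HWC -> CHW
    return x[np.newaxis, ...]                      # add batch axis -> (1, 3, H, W)

img = np.full((4, 4, 3), 255, dtype=np.uint8)      # an all-white 4x4 image
batch = preprocess(img)
print(batch.shape)  # (1, 3, 4, 4)
```

A white pixel maps to roughly 2.25, 2.43 and 2.64 per channel and a black one to roughly -2.12, -2.04 and -1.80, consistent with the rough -2.5 to 2.5 range mentioned earlier. Whichever preprocessing is chosen, training and prediction must apply the same one.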


Some great suggestions from @lgo here. You might also find it useful to plot a few predictions on your training images and check that a ‘low loss’ matches your idea of a good prediction. If it doesn’t, a good debugging check is to confirm that the model can overfit a small subset of the data.
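To check numerically whether a ‘low loss’ really means good detections on training images, one option is to match each ground-truth box to its best-overlapping prediction and report the IoU and whether the class agrees. A minimal NumPy sketch; boxes are assumed to be [xmin, ymin, xmax, ymax] in the same coordinate system, and iou_matrix and match_report are hypothetical helpers. You would feed it the labels of a training image and the detections that survive the score threshold:

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), format [xmin, ymin, xmax, ymax]."""
    a = np.asarray(a, dtype=np.float64).reshape(-1, 4)
    b = np.asarray(b, dtype=np.float64).reshape(-1, 4)
    ix1 = np.maximum(a[:, None, 0], b[None, :, 0])
    iy1 = np.maximum(a[:, None, 1], b[None, :, 1])
    ix2 = np.minimum(a[:, None, 2], b[None, :, 2])
    iy2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)

def match_report(gt_classes, gt_boxes, pred_classes, pred_boxes):
    """Per ground-truth box: (class, best IoU over predictions, predicted class matches?)."""
    if len(pred_boxes) == 0:
        return [(int(c), 0.0, False) for c in gt_classes]
    ious = iou_matrix(gt_boxes, pred_boxes)
    best = ious.argmax(axis=1)
    return [
        (int(c), float(ious[i, j]), int(pred_classes[j]) == int(c))
        for i, (c, j) in enumerate(zip(gt_classes, best))
    ]

# Hypothetical training image: a perfectly placed box with the wrong class.
gt_cls, gt_box = [3], [[0.1, 0.1, 0.5, 0.5]]
pred_cls, pred_box = [7, 3], [[0.1, 0.1, 0.5, 0.5], [0.6, 0.6, 0.9, 0.9]]
for cls, iou, ok in match_report(gt_cls, gt_box, pred_cls, pred_box):
    print(f"class {cls}: best IoU {iou:.2f}, class correct: {ok}")
# class 3: best IoU 1.00, class correct: False
```

High IoUs with wrong classes on training images would point at the class labels or class count rather than at the box regression.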