When are the images converted to range(0,1) in pre-trained resnet50_v2 on imagenet?

From the Gluon Model Zoo:

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size, and H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]

However, when I look at the train scripts for resnet50_v2 (i.e. https://github.com/apache/incubator-mxnet/blob/master/example/gluon/image_classification.py and https://github.com/apache/incubator-mxnet/blob/master/example/gluon/data.py)
There is no code line showing that the image is converted to range (0,1) (i.e. image = image/255) in the get_imagenet_iterator. Please correct me if I'm wrong.

Hey @jonbakerfish
This line https://github.com/apache/incubator-mxnet/blob/master/example/gluon/data.py#L63:

        image = mx.nd.image.normalize(image, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

This normalizes the image using the given mean and std, which brings the images into the [0, 1] range.


Hi @ThomasDelteil,
If you run this code:

import mxnet as mx

# Random (C, H, W) image with values in [0, 255)
im = mx.nd.random_uniform(shape=(3, 10, 10)) * 255
im_ = mx.nd.image.normalize(im, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
print(im_.min(), im_.max())

you will get something like:

<NDArray 1 @cpu(0)> 
[ 1131.16967773]
<NDArray 1 @cpu(0)>

I think the mean and std are for images which are in range (0,1), not for the images in range (0,255). So before doing image.normalize, im=im/255 should be done somewhere else.
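
The arithmetic behind this can be checked without MXNet. Below is a minimal sketch in plain Python, assuming normalize computes (x - mean) / std per channel; the helper name normalized_bounds is made up for illustration.

```python
# Per-channel ImageNet statistics quoted in the Model Zoo text.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalized_bounds(lo, hi):
    """Return the smallest and largest value of (x - mean) / std
    over all three channels for inputs x in [lo, hi]."""
    mins = [(lo - m) / s for m, s in zip(MEAN, STD)]
    maxs = [(hi - m) / s for m, s in zip(MEAN, STD)]
    return min(mins), max(maxs)


print(normalized_bounds(0.0, 1.0))    # input already scaled to [0, 1]
print(normalized_bounds(0.0, 255.0))  # raw pixel values in [0, 255]
```

Under that assumption, inputs in [0, 1] map to roughly [-2.12, 2.64], while raw [0, 255] inputs can reach about 1136, which is consistent with the maximum of about 1131 printed above. Note that the normalized values are not in [0, 1] in either case.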

You are right, my bad. Indeed it seems that the only explanation is that the images are stored already in the [0, 1] range.

Hi @ThomasDelteil,
Shall I open an issue on GitHub to double-check the range of the images?

I would first try running the script and outputting the image values that the iterator returns.

Problem solved in this issue. F.image.to_tensor converts an image of shape (H x W x C) in the range [0, 255] to a float32 tensor of shape (C x H x W) in the range [0, 1).
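
To make the two steps concrete, here is a minimal NumPy sketch of the preprocessing order described in this thread. It imitates the described behaviour rather than calling MXNet: the function names to_tensor_sketch and normalize_sketch are hypothetical, and the scaling is assumed to be a plain division by 255.

```python
import numpy as np

# Per-channel statistics, shaped to broadcast over a (C, H, W) tensor.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)


def to_tensor_sketch(image_hwc):
    """(H, W, C) values in [0, 255] -> float32 (C, H, W), scaled by 1/255 (assumed)."""
    chw = np.transpose(image_hwc, (2, 0, 1))
    return chw.astype(np.float32) / 255.0


def normalize_sketch(tensor_chw):
    """Per-channel (x - mean) / std on a (C, H, W) tensor."""
    return (tensor_chw - MEAN) / STD


# Stand-in for a decoded uint8 RGB image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

scaled = to_tensor_sketch(image)       # step 1: layout change and scaling
normalized = normalize_sketch(scaled)  # step 2: mean/std normalization

print(scaled.shape, scaled.dtype, scaled.min(), scaled.max())
print(normalized.min(), normalized.max())
```

In the MXNet scripts the corresponding calls would be the to_tensor function mentioned above followed by mx.nd.image.normalize; the point of the sketch is only that the scaling has to happen before the mean and std are applied.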