Which conv layer should be selected as the last conv layer in gradcam example when using resnet50v2?

I followed the “Visualizing Decisions of Convolutional Neural Networks” tutorial, and it gives the correct output images with the pretrained VGG16 model. Then I changed the network to ResNet50v2 with its pretrained model, but the output images look abnormal. Some code snippets are as follows.

    # ResNetV2 using gradcam's Conv2D and Activation
    net = ResNetV2(BottleneckV2, layers, channels, **kwargs)
    net.initialize(ctx=ctx)

    resnet50v2 = mx.gluon.model_zoo.vision.resnet50_v2()
    # load pretrained parameters
    resnet50v2.load_parameters('D:/Model/mxnet/models/resnet50_v2-ecdde353.params', ctx=ctx)
    params = resnet50v2.collect_params()
    for key in params:
        param = params[key]
        net.collect_params()[net.prefix + key.replace(resnet50v2.prefix, '')].set_data(param.data())

    # ...
    # last conv layer of the final bottleneck in stage 4
    last_conv_layer_name = net.features[8][2].conv3.name
    show_images(*visualize(net, "hummingbird.jpg", last_conv_layer_name))

[Figure_1: the upper row uses the ResNet pretrained model, the lower row uses the VGG16 model]

Update on 23rd Jan
I tried outputting the imggrad without recording the gradient of the conv layer, since the detail is recovered using the image gradient alone; in this setup only the Activation (ReLU) layer is rewritten. The result is correct with VGG16, but with ResNet the imggrad is still unclear.

Hi @7oud,

A ResNet is a collection of residual blocks, and a residual block computes only the ‘change’ (i.e. the residual) to the overall feature map. So I think you’re just showing the pixels that contribute to the largest change in that specific residual block, which isn’t really meaningful or useful. You need to be working with the overall feature map.
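To make the point above concrete, here is a toy numpy sketch (the scalar weight `w` is a hypothetical stand-in for the conv branch, not anything from this thread): a residual block computes `y = x + F(x)`, so the identity path carries a gradient of 1, and the gradient seen inside the branch alone is only a small fraction of the total.

```python
import numpy as np

# Toy residual block: y = x + F(x), with F(x) = w * x standing in for the conv branch.
x = np.array([1.0, 2.0, 3.0])
w = 0.1
y = x + w * x            # residual block output

# Gradient of y w.r.t. x through the FULL block: 1 + w (identity path + residual path)
full_grad = np.full_like(x, 1.0 + w)

# Gradient seen if you only tap the residual branch F(x): just w
branch_grad = np.full_like(x, w)

# The identity path dominates, so gradients taken inside one residual
# branch (e.g. at conv3 of a bottleneck) miss most of the signal.
print(full_grad, branch_grad)
```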

You could try applying GradCam just before the GlobalAvgPool2D.

Also, for networks where the output of the GlobalAvgPool2D corresponds to class logits (i.e. before softmax), you can just visualise the feature maps that are the input to GlobalAvgPool2D directly, which avoids the need for GradCam. Just select the channel of the feature map that corresponds to the class you’re interested in. In this resnet50_v2 network, though, it looks like there’s a dense layer at the end, which breaks the correspondence between channel and class.
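A minimal numpy sketch of that shortcut, assuming the GAP output really is the logits (no dense layer after it); the shapes and values here are made up for illustration:

```python
import numpy as np

# If GlobalAvgPool2D output == logits, the heatmap for class c is
# simply channel c of the pre-GAP feature map.
C, H, W = 4, 7, 7
fmap = np.random.rand(C, H, W)      # feature map entering GlobalAvgPool2D
logits = fmap.mean(axis=(1, 2))     # what GlobalAvgPool2D would output
c = int(np.argmax(logits))          # predicted class
heatmap = fmap[c]                   # spatial evidence for that class
print(heatmap.shape)  # (7, 7)
```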

@thomelane Thanks for the explanation!
There is a problem with using GradCam before the GlobalAvgPool: gradcam.visualize() cannot take the name of a non-conv layer, because only Conv2D is rewritten. Can I use the resnet model without modifying gradcam.py?
As you said, the feature map before GAP can be used to visualize the heatmap, but if I want to visualize the saliency map (the 4th picture), GradCam is still needed.

I think you’ll have to modify the gradcam.py script and overwrite the GlobalAvgPool2D operator, but it shouldn’t be too tricky if you copy the form of the Conv2D. @indu, can you confirm this?

@thomelane The output of GlobalAvgPool2D has 1x1 shape; its gradient and output are too small to recover the image activation. Maybe the last Add layer should be used instead, am I right? But how do I rewrite the add op?
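For intuition on why the gradient through GlobalAvgPool2D carries so little spatial information: backprop through an average spreads the incoming gradient uniformly as 1/(H*W) over every position, so on its own it cannot localise anything. A small numpy check (the 7x7 shape is illustrative):

```python
import numpy as np

H, W = 7, 7
x = np.random.rand(H, W)
y = x.mean()                       # GlobalAvgPool2D on one channel

# d y / d x[i, j] = 1 / (H * W) at every position: a flat gradient map.
grad = np.full((H, W), 1.0 / (H * W))

# Numerical check at one position via finite differences
eps = 1e-6
x2 = x.copy()
x2[0, 0] += eps
num = (x2.mean() - y) / eps
print(grad[0, 0], num)  # both ~ 1/49
```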

Hi @7oud,

Were you able to fix this problem?