GluonCV, Faster RCNN, normalization layers

ifeherva · July 3, 2018, 8:45pm

The gluon-cv faster r-cnn model uses a special resnet50 model that “denormalizes” the input image if I understand it correctly. I assume this is added so the pretrained weights from the mxnet model zoo could be reused.

github.com

dmlc/gluon-cv/blob/master/gluoncv/model_zoo/faster_rcnn/resnet50_v2a.py#L78




class ResNet50V2(HybridBlock):
    """Resnet v2(a) for Faster-RCNN.

    Please ignore this if you are looking for model for other tasks.
    """
    def __init__(self, **kwargs):
        super(ResNet50V2, self).__init__(**kwargs)
        with self.name_scope():
            self.rescale = nn.HybridSequential(prefix='')
            self.rescale.add(Rescale(prefix=''))
            self.layer0 = nn.HybridSequential(prefix='')
            self.layer0.add(nn.BatchNorm(scale=False, epsilon=2e-5, use_global_stats=True))
            self.layer0.add(nn.Conv2D(64, 7, 2, 3, use_bias=False))
            self.layer0.add(nn.BatchNorm(epsilon=2e-5, use_global_stats=True))
            self.layer0.add(nn.Activation('relu'))
            self.layer0.add(nn.MaxPool2D(3, 2, 1))

            self.layer1 = self._make_layer(stage_index=1, layers=3, in_channels=64,
                                           channels=256, stride=1)
            self.layer2 = self._make_layer(stage_index=2, layers=4, in_channels=256,

However, I was wondering whether this step could be left out entirely by simply not normalizing the image in the dataloader.

github.com

dmlc/gluon-cv/blob/07f09649c8a772f58a7d2be7da52878288f461e4/gluoncv/data/transforms/presets/rcnn.py#L139


img = timage.resize_short_within(src, self._short, self._max_size)
bbox = tbbox.resize(label, (w, h), (img.shape[1], img.shape[0]))


# random horizontal flip
h, w, _ = img.shape
img, flips = timage.random_flip(img, px=0.5)
bbox = tbbox.flip(bbox, (w, h), flip_x=flips[0])


# to tensor
img = mx.nd.image.to_tensor(img)
img = mx.nd.image.normalize(img, mean=self._mean, std=self._std)


if self._anchors is None:
    return img, bbox.astype(img.dtype)


# generate RPN target so cpu workers can help reduce the workload
# feat_h, feat_w = (img.shape[1] // self._stride, img.shape[2] // self._stride)
oshape = self._feat_sym.infer_shape(data=(1, 3, img.shape[1], img.shape[2]))[1][0]
anchor = self._anchors[:, :, :oshape[2], :oshape[3], :].reshape((-1, 4))
gt_bboxes = mx.nd.array(bbox[np.newaxis, :, :4])
cls_target, box_target, box_mask = self._target_generator(

ThomasDelteil · July 3, 2018, 9:21pm

@ifeherva It seems you are correct that it is doing the inverse transformation, with a *255 factor missing.
I am not entirely sure what the reasoning is behind proceeding that way rather than just multiplying the initial image by 255.
@Hang_Zhang @zhreshold could you advise on the reason behind this hard-coded rescale layer?
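
To make the round trip under discussion concrete, here is a minimal pure-Python sketch for a single RGB pixel. It is not the GluonCV `Rescale` implementation. The mean and std values are the commonly used ImageNet statistics and are an assumption, since the thread does not show what `self._mean` and `self._std` hold; the helper names are hypothetical.

```python
# Assumed ImageNet statistics; the thread does not state self._mean / self._std.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def to_unit_range(pixel):
    """Scale 0..255 channel values to 0..1, as to_tensor does for uint8 images."""
    return tuple(c / 255.0 for c in pixel)


def normalize(pixel, mean=MEAN, std=STD):
    """Per-channel (x - mean) / std, the dataloader step."""
    return tuple((c - m) / s for c, m, s in zip(pixel, mean, std))


def denormalize(pixel, mean=MEAN, std=STD):
    """Per-channel x * std + mean, the inverse of normalize."""
    return tuple(c * s + m for c, m, s in zip(pixel, mean, std))


raw = (124, 116, 104)                          # one RGB pixel in 0..255
normalized = normalize(to_unit_range(raw))     # what the dataloader emits
restored_unit = denormalize(normalized)        # back in the 0..1 range
restored_255 = tuple(c * 255.0 for c in restored_unit)  # back in 0..255
```

Undoing the normalization alone returns values in the 0..1 range; the extra factor of 255 is what brings them back to the original pixel scale.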

ifeherva · July 3, 2018, 9:33pm

I tried plugging in the resnet50v2 model from the mxnet model zoo which worked but the performance was much worse.

zhreshold · July 3, 2018, 9:59pm

The reason is fairly simple: in our experiments, the input scale affected performance quite a lot. Whether that is due to the initialization scale or to the pre-trained model is still unknown.

Until we figure out a generic solution, we use these hard-coded scaling layers for consistency throughout the gluon-cv package.

ifeherva · July 3, 2018, 10:04pm

Thanks for the reply. If I use the resnet50v2 model without the rescaling I get considerably worse recall on my validation set.
See code below:

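import mxnet as mx
from gluoncv import model_zoo

# The wrapper name and signature are illustrative; the original post showed only the body.
def get_custom_faster_rcnn(base_net, pretrained_base, dataset):
    # Backbone from the MXNet model zoo, which has no Rescale layer
    base_network = mx.gluon.model_zoo.vision.get_model(base_net, pretrained=pretrained_base)
    features = base_network.features[:8]        # shared feature extractor
    top_features = base_network.features[8:11]  # head applied after RoI pooling
    # Only parameters matching these patterns are trained
    train_patterns = '|'.join(['.*dense', '.*rpn', '.*stage(2|3|4)_conv'])
    return model_zoo.get_faster_rcnn(base_net, features, top_features, scales=(2, 4, 8, 16, 32),
                                     ratios=(0.5, 1, 2), classes=['BG', 'CLASS1'], dataset=dataset,
                                     roi_mode='align', roi_size=(14, 14), stride=16,
                                     rpn_channel=1024, train_patterns=train_patterns,
                                     pretrained=False)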

On the other hand, it is at least 3 times faster at inference time. Where does this speedup come from?

zhreshold · July 3, 2018, 10:25pm

Fewer classes, and therefore less time spent in non-maximum suppression.

We use per-class NMS for best recall, so the cost of NMS is O(N), where N is the number of foreground classes.
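
The linear dependence on the number of classes can be illustrated with a small pure-Python sketch. This is a plain greedy NMS written for illustration, not the GluonCV operator; the function names, the `(score, box)` layout, and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(dets, thresh=0.5):
    """Greedy NMS over a list of (score, box); keeps the highest-scoring boxes."""
    keep = []
    for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= thresh for _, kept_box in keep):
            keep.append((score, box))
    return keep


def per_class_nms(dets_by_class, thresh=0.5):
    """Run one independent NMS pass per class and count the passes."""
    kept, passes = {}, 0
    for cls, dets in dets_by_class.items():
        kept[cls] = nms(dets, thresh)
        passes += 1
    return kept, passes


boxes = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (50, 50, 60, 60))]
few_kept, few_passes = per_class_nms({"CLASS1": boxes})
many_kept, many_passes = per_class_nms({f"class{i}": boxes for i in range(20)})
# few_passes == 1, many_passes == 20
```

With one foreground class there is a single NMS pass; with 20 classes there are 20, which matches the difference between a custom single-class model and a model trained on a 20-class dataset.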

ifeherva · July 4, 2018, 5:01pm

I turned off normalization in the dataloader and used the mxnet resnet50v2 model. I got better recall (still not as good as with the gluoncv resnet50). Still, my model is 2-3 times faster at inference time on GPU. The only difference I see is

self.layer0.add(nn.BatchNorm(scale=False, epsilon=2e-5, use_global_stats=True))

vs

self.features.add(nn.BatchNorm(scale=False, center=False))

Could use_global_stats be responsible for the speed difference?

zhreshold · July 4, 2018, 6:59pm

Yes, we are investigating the poor performance of BN without cuDNN.
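
For readers unfamiliar with the flag, here is a minimal pure-Python sketch of what `use_global_stats` changes arithmetically for one channel. It is not the MXNet kernel and says nothing about why one code path is slower on GPU. The function name and the flat-list input are assumptions; `gamma=1.0` corresponds to `scale=False` and `beta=0.0` to `center=False`.

```python
import math


def batch_norm(values, running_mean, running_var, gamma=1.0, beta=0.0,
               eps=2e-5, use_global_stats=False):
    """Normalize one channel's activations, given as a flat list of floats."""
    if use_global_stats:
        # Use the stored moving statistics; the layer acts as a fixed scale and shift.
        mean, var = running_mean, running_var
    else:
        # Use statistics computed from the current batch.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (v - mean) * inv_std + beta for v in values]


values = [1.0, 2.0, 3.0, 4.0]
frozen = batch_norm(values, running_mean=0.0, running_var=1.0, use_global_stats=True)
batch = batch_norm(values, running_mean=0.0, running_var=1.0)
```

With the flag set, the output depends only on the stored statistics, so `frozen` is just `values` scaled by a constant; without it, `batch` is centered on the mean of the current batch.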
