I want to put a custom layer right after a residual block but before its relu, so I made a small tweak to the resnet18_v1 in gluon.model_zoo that removes the relu inside the residual block (BasicBlockV1 here). The modified resnet18_v1 is a HybridSequential that looks like this:
HybridSequential(
(0): Conv2D(3 -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
(2): Activation(relu)
(3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(1, 1), ceil_mode=False)
(4): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
(2): Activation(relu)
(3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
)
)
(5): Activation(relu)
(6): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
(2): Activation(relu)
(3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
)
)
(7): Activation(relu)
(8): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
(2): Activation(relu)
(3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
(downsample): HybridSequential(
(0): Conv2D(64 -> 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
)
(9): Activation(relu)
(10): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
(2): Activation(relu)
(3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
)
(11): Activation(relu)
(12): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 -> 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
)
(downsample): HybridSequential(
(0): Conv2D(128 -> 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
)
)
(13): Activation(relu)
(14): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
)
)
(15): Activation(relu)
(16): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
(2): Activation(relu)
(3): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
)
(downsample): HybridSequential(
(0): Conv2D(256 -> 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
)
)
(17): Activation(relu)
(18): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
(2): Activation(relu)
(3): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
)
)
(19): Activation(relu)
(20): GlobalAvgPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True)
)
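The tweak itself is essentially the following (a rough sketch of what I changed; in practice I edited a copy of gluon/model_zoo/vision/resnet.py, and this assumes the stock BasicBlockV1.hybrid_forward):

from mxnet.gluon.model_zoo.vision.resnet import BasicBlockV1

class BasicBlockV1NoRelu(BasicBlockV1):
    """BasicBlockV1 with the final relu removed, so a custom layer can sit
    between the residual addition and the activation."""
    def hybrid_forward(self, F, x):
        residual = x
        x = self.body(x)
        if self.downsample:
            residual = self.downsample(residual)
        return x + residual  # the stock block applies F.Activation(x + residual, act_type='relu') here

The modified builder then adds a separate nn.Activation('relu') to features right after each block, which is why the relus show up as their own top-level layers in the printout above.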
I want to insert a custom layer after each of the 12th, 14th, 16th, and 18th BasicBlockV1 blocks (before the relu that follows each of them), and add dropout after each activation:
cnn1 = resnet18_v1().features
self.net = nn.Sequential()
last_pos = 0
for pos in (13, 15, 17, 19):
    self.net.add(cnn1[last_pos:pos])
    self.net.add(myLayer())
    self.net.add(nn.Activation('relu'))
    self.net.add(nn.Dropout(0.5))
    last_pos = pos + 1  # plus 1 to skip the activation layer in cnn1
Everything looked fine, but I got a warning during training:
UserWarning: Gradient of Parameter `mynet_resnet180_resnetv10_conv0_weight` on context gpu(0) has not been updated by backward since last `step`. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intention
Then I made a small modification to the code that unpacks the sliced HybridSequential:
cnn1 = resnet18_v1().features
self.net = nn.Sequential()
last_pos = 0
for pos in (13, 15, 17, 19):
    self.net.add(*cnn1[last_pos:pos])  # unpacking
    self.net.add(myLayer())
    self.net.add(nn.Activation('relu'))
    self.net.add(nn.Dropout(0.5))
    last_pos = pos + 1  # plus 1 to skip the activation layer in cnn1
This time no warning appeared during training. But it still confuses me, because the networks with and without unpacking should behave the same.
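One way to compare the two versions is to check which parameters each container actually registers. Below is a minimal sketch of such a check (resnet18_v1 here means my modified builder with the flattened features shown above, MyLayer is a trivial stand-in for my custom layer, and it assumes a Gluon version where slicing a HybridSequential returns a new container):

from mxnet.gluon import nn

class MyLayer(nn.HybridBlock):
    # trivial stand-in for the real custom layer
    def hybrid_forward(self, F, x):
        return x

def build(unpack):
    cnn1 = resnet18_v1().features  # the modified resnet18_v1 from above
    net = nn.Sequential()
    last_pos = 0
    for pos in (13, 15, 17, 19):
        if unpack:
            net.add(*cnn1[last_pos:pos])   # children added one by one
        else:
            net.add(cnn1[last_pos:pos])    # sliced HybridSequential added as a single child
        net.add(MyLayer())
        net.add(nn.Activation('relu'))
        net.add(nn.Dropout(0.5))
        last_pos = pos + 1
    return net

packed, unpacked = build(False), build(True)
# compare the parameter names each container reports
print(sorted(packed.collect_params().keys()))
print(sorted(unpacked.collect_params().keys()))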
Did I miss anything, or is this a bug in Gluon?