My model training pipeline is as follows:
- MyNet has two branches: an object detection branch (Faster R-CNN) and an image retrieval branch.
- An image is fed into MyNet, passes through the backbone network, and goes to the detection branch first, which detects objects and outputs several bounding boxes (bboxes).
- These bboxes are used to crop regions from the last feature map of the backbone network, giving the ROI features.
- The ROI features are fed into the retrieval branch, which outputs a 128-dim embedding vector.
- The triplet loss is optimized.
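For reference, the triplet loss at the end of this pipeline is, in its usual formulation, max(0, d(a, p) - d(a, n) + margin) over anchor/positive/negative embeddings. A minimal NumPy sketch (the 0.2 margin and the toy 128-dim vectors are just illustrative, not the actual model outputs):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard triplet loss on embeddings:
    # max(0, ||a - p||^2 - ||a - n||^2 + margin)
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, d_ap - d_an + margin)

# Toy 128-dim embeddings (made-up values for illustration).
rng = np.random.default_rng(0)
a = rng.normal(size=128); a /= np.linalg.norm(a)
p = a + 0.01 * rng.normal(size=128); p /= np.linalg.norm(p)  # near the anchor
n = rng.normal(size=128); n /= np.linalg.norm(n)             # unrelated vector

print(triplet_loss(a, p, n))  # → 0.0 (anchor is far closer to positive than to negative)
```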
But the problem is:
- The Faster R-CNN batch size can only be 1, so I have to input the images one by one.
- But the triplet loss needs a batch of images (at least 3: anchor, positive, negative).
So my solution is:
- Run the forward pass through the detection branch multiple times to build up a batch embedding tensor.
- The retrieval branch takes this embedding tensor as input and calculates the triplet loss.
- Do the backward pass.
How can I achieve this pipeline? As far as I know, we can update the parameters (trainer.step()) after several forward + backward passes, like this:
for p in net.collect_params().values():
    p.grad_req = 'add'
for i in range(100):
    net.collect_params().zero_grad()
    for j in range(iter_size):
        with autograd.record():
            y = net(data)
        y.backward()
    trainer.step(iter_size)
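This accumulation trick is valid because the gradient of a summed loss is the sum of the per-sample gradients, so several single-image forward/backward passes followed by one step match a true batched update. A plain-NumPy sketch with a toy linear model and squared loss (all data here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)        # model parameters
X = rng.normal(size=(3, 4))   # a "batch" of 3 inputs
y = rng.normal(size=3)        # targets

def grad_single(w, x, t):
    # Gradient of the per-sample loss 0.5 * (w.x - t)^2 w.r.t. w.
    return (x @ w - t) * x

# Accumulate per-sample gradients (what grad_req='add' does).
acc = np.zeros_like(w)
for x, t in zip(X, y):
    acc += grad_single(w, x, t)

# Full-batch gradient of 0.5 * ||X w - y||^2.
full = X.T @ (X @ w - y)

print(np.allclose(acc, full))  # → True
```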
But what I want to do is something like:
for i in range(num_epochs):
    emb_batch = []
    for j in range(batch_size):
        emb = net(data)
        emb_batch.append(emb)
    loss = triplet_loss(emb_batch)
    loss.backward()
    trainer.step(batch_size)
    net.collect_params().zero_grad()
But this raises:
UserWarning: Gradient of Parameter `tripletrcnn0_tnet_weight` on context gpu(0) has not been
updated by backward since last `step`. This could mean a bug in your model that made it only use a
subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step
with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale
gradient.
Any suggestions?