This is straightforward to do with Gluon: set `grad_req` on your network's `Parameter` instances to `'add'`, and manually reset the gradients to zero with `zero_grad()` after each `Trainer.step()`. To set `grad_req` to `'add'`:

```python
for p in net.collect_params().values():
    p.grad_req = 'add'
```
Similarly, call `zero_grad()` on each parameter after calling `Trainer.step()`. Also remember to adjust the `batch_size` argument of `trainer.step()` accordingly: since the gradients are now summed over several batches, pass the total number of samples accumulated, not the size of a single batch.