This is straightforward to do with Gluon: set `grad_req` on your network's `Parameter` instances to `'add'`, and manually reset the gradients to zero with `zero_grad()` after each `Trainer.step()`. To set `grad_req` to `'add'`:

```python
for p in net.collect_params().values():
    p.grad_req = 'add'
```
Similarly, call `zero_grad()` on each parameter after calling `Trainer.step()`. Also remember to adjust the `batch_size` argument of `trainer.step()` accordingly: since the gradients are now summed over several batches, pass the total number of samples accumulated, not the size of a single batch.