Aggregate gradients manually over n batches

I really appreciate the suggestion and the help. It’s so good to have people like you in the community. :grinning:


Hi, bro, would you like to take a look at this issue: About stale gradient?

Thank you bro

Hi! I think it is a bit more complicated than this in practice, right?
For example, when I do what you propose on the GluonCV SSD I get this: “UserWarning: Gradient of Parameter ssd0_expand_trans_bn0_moving_mean on context gpu(0) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient”
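
The warning text itself points at ignore_stale_grad=True. A minimal sketch of that, assuming a gluon.Trainer named trainer and a per-batch sample count batch_size (both placeholder names, not code from this thread):

# Skip parameters whose gradient was not refreshed by backward since the
# last step (e.g. BatchNorm moving_mean), instead of warning about them.
trainer.step(batch_size, ignore_stale_grad=True)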

How should I handle this?

This fix from About stale gradient seems to work:

for p in net.collect_params().values():
    if p.grad_req != 'null':   # skip parameters that never receive gradients (e.g. BatchNorm running stats)
        p.grad_req = 'add'     # accumulate new gradients instead of overwriting them
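
For completeness, here is roughly how the whole accumulation loop can look with that fix. This is only a sketch: net, train_data, loss_fn, batch_size, accum, and the SGD hyperparameters are placeholders, not code from this thread.

from mxnet import autograd, gluon

# Assumed to be defined elsewhere: net (a gluon Block), train_data (an
# iterable of (data, label) batches), loss_fn (a gluon loss).
accum = 4          # number of batches to accumulate before each update
batch_size = 32    # placeholder per-batch sample count

# Accumulate gradients only for parameters that actually receive them.
for p in net.collect_params().values():
    if p.grad_req != 'null':
        p.grad_req = 'add'

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

for i, (data, label) in enumerate(train_data):
    with autograd.record():
        loss = loss_fn(net(data), label)
    loss.backward()                        # sums into the existing gradient buffers

    if (i + 1) % accum == 0:
        # Normalize by the total number of samples seen since the last update.
        trainer.step(accum * batch_size)
        # Reset the accumulated gradients before the next round.
        for p in net.collect_params().values():
            if p.grad_req != 'null':
                p.zero_grad()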

Actually it is not training anything: How to make gradient accumulation work in MXNet? If someone can help, that would be appreciated!

Hi, this error should not be related to setting grad_req to ‘add’; I’ve encountered it many times when I wasn’t calculating the loss properly. Please try the same code without the grad_req='add' trick to see if you get the same error.

How about the Symbolic API, @safrooze?