Aggregate gradients manually over n batches

I really appreciate the suggestion and the help. It’s so good to have people like you in the community. :grinning:


Hi, bro, would you like to take a look at this issue: About stale gradient?

Thank you bro

Hi! I think it is a bit more complicated than this in practice, right?
For example, when I do what you propose on the GluonCV SSD I get this: “UserWarning: Gradient of Parameter ssd0_expand_trans_bn0_moving_mean on context gpu(0) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient”
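
The warning text itself points at ignore_stale_grad=True. A minimal sketch of that, assuming a gluon.Trainer named trainer and a per-batch sample count batch_size (both placeholder names, not code from this thread):

# Skip parameters whose gradient was not refreshed by backward since the
# last step (e.g. BatchNorm moving_mean), instead of warning about them.
trainer.step(batch_size, ignore_stale_grad=True)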

How should I handle this?

This fix from About stale gradient seems to work:

for p in net.collect_params().values():
    if p.grad_req != 'null':   # skip parameters that never receive gradients (e.g. BatchNorm running stats)
        p.grad_req = 'add'     # accumulate new gradients instead of overwriting them
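
For completeness, here is roughly how the whole accumulation loop can look with that fix. This is only a sketch: net, train_data, loss_fn, batch_size, accum, and the SGD hyperparameters are placeholders, not code from this thread.

from mxnet import autograd, gluon

# Assumed to be defined elsewhere: net (a gluon Block), train_data (an
# iterable of (data, label) batches), loss_fn (a gluon loss).
accum = 4          # number of batches to accumulate before each update
batch_size = 32    # placeholder per-batch sample count

# Accumulate gradients only for parameters that actually receive them.
for p in net.collect_params().values():
    if p.grad_req != 'null':
        p.grad_req = 'add'

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

for i, (data, label) in enumerate(train_data):
    with autograd.record():
        loss = loss_fn(net(data), label)
    loss.backward()                        # sums into the existing gradient buffers

    if (i + 1) % accum == 0:
        # Normalize by the total number of samples seen since the last update.
        trainer.step(accum * batch_size)
        # Reset the accumulated gradients before the next round.
        for p in net.collect_params().values():
            if p.grad_req != 'null':
                p.zero_grad()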

Actually it is not training anything: How to make gradient accumulation work in MXNet? If someone can help, that would be appreciated!

Hi, this error should not be related to setting grad_req to ‘add’; I’ve encountered it many times when I wasn’t calculating the loss properly. Please try the same code without the grad_req='add' trick to see if you get the same error.

How about the Symbolic API, @safrooze?