Aggregate gradients manually over n batches

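This fix seems to work for the stale gradient problem: set grad_req to 'add' on every parameter that has a gradient, so each backward pass accumulates into the gradient buffer instead of overwriting it.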

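# Switch every trainable parameter to accumulate gradients across backward passes.
for p in net.collect_params().values():
    # Parameters with grad_req == 'null' have no gradient buffer, so leave them alone.
    if p.grad_req != 'null':
        p.grad_req = 'add'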
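
For anyone wondering what the accumulation buys you, here is a minimal framework-free sketch in plain Python, not MXNet code. The toy model `y = w * x`, the helper names `grad_w` and `train`, and the data are all made up for illustration. It assumes the usual pattern: add up gradients over n batches, do one update scaled by the total number of samples, then zero the buffer by hand, because 'add' never clears it for you. In Gluon those last two steps would presumably be `trainer.step(...)` with the combined batch size, followed by `zero_grad()` on the parameters.

```python
# Toy model: y = w * x, loss = sum of squared errors over the batch.

def grad_w(w, batch):
    """Gradient of sum((w * x - y) ** 2) with respect to w for one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)

def train(w, batches, n, lr):
    """Accumulate gradients over n batches, then apply one SGD update."""
    grad_buf = 0.0   # stands in for the parameter's gradient buffer
    seen = 0         # samples accumulated since the last update
    for i, batch in enumerate(batches, start=1):
        grad_buf += grad_w(w, batch)   # accumulate instead of overwrite ('add')
        seen += len(batch)
        if i % n == 0:
            w -= lr * grad_buf / seen  # one update for n batches, scaled by sample count
            grad_buf = 0.0             # manual reset; accumulation never clears itself
            seen = 0
    return w

# Hypothetical data lying exactly on y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Two batches of 2 samples, aggregated over n = 2 ...
w_accumulated = train(0.0, [data[:2], data[2:]], n=2, lr=0.01)
# ... should match a single batch of 4 samples.
w_full_batch = train(0.0, [data], n=1, lr=0.01)

print(w_accumulated, w_full_batch)
```

Both runs take the same single step, so aggregating over n small batches behaves like one batch n times larger, as long as the buffer is reset after every update.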