Should kv_store be initialized at the beginning of each iteration?

When update_on_kvstore = None, kv_store simply sums the gradients and passes them to the worker. But where does kv_store reset the stored value for each key to 0?
If kv_store is not reset to 0 every iteration, then the summed gradients will be wrong.
Where and how should it be initialized, if that is needed?

The problem has been solved: the default update operation of kv_store is Assign.

I found that the gradients are automatically summed during push, and the summed gradient is then assigned to the corresponding key in kv_store.
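To make that behaviour concrete, here is a toy, single-process model of the push/pull semantics described above. It is not MXNet code: `ToyKVStore`, `assign` and the list-of-floats values are hypothetical stand-ins for the real kvstore and its NDArrays. It illustrates why no reset is needed: each push sums the incoming gradients from scratch, and the default rule then overwrites the stored value.

```python
from typing import Callable, Dict, List

Vec = List[float]


def assign(key: int, merged: Vec, stored: Vec) -> None:
    # Hypothetical stand-in for the default rule: overwrite the stored value in place.
    stored[:] = merged


class ToyKVStore:
    """Toy model only; real kvstores hold NDArrays and run asynchronously."""

    def __init__(self, updater: Callable[[int, Vec, Vec], None] = assign) -> None:
        self._store: Dict[int, Vec] = {}
        self._updater = updater

    def init(self, key: int, value: Vec) -> None:
        self._store[key] = list(value)

    def push(self, key: int, grads: List[Vec]) -> None:
        # Sum the per-device gradients elementwise. This sum starts from zero
        # on every push, so nothing carries over from the previous iteration.
        merged = [sum(parts) for parts in zip(*grads)]
        self._updater(key, merged, self._store[key])

    def pull(self, key: int) -> Vec:
        return list(self._store[key])


kv = ToyKVStore()
kv.init(3, [0.0, 0.0])
kv.push(3, [[1.0, 2.0], [3.0, 4.0]])  # iteration 1, two devices
print(kv.pull(3))                     # [4.0, 6.0]
kv.push(3, [[0.5, 0.5], [0.5, 0.5]])  # iteration 2
print(kv.pull(3))                     # [1.0, 1.0], not [5.0, 7.0]
```

Because the stored value is replaced rather than added to, the second pull reflects only the second iteration's gradients.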

Can I change the merge process of push in kv_store?

Yes, you can change the merge process by calling kv._set_updater():

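import mxnet as mx
kv = mx.kv.create('local')
a = mx.nd.ones((2, 3))
kv.init(3, a)
def update(key, input, stored):
    print("update on key: %d" % key)
    stored += input * 2  # custom merge rule: add twice the pushed value in place
kv._set_updater(update)
kv.push(3, a)
kv.pull(3, out=a)  # a now holds 1 + 2 * 1 = 3 in every entry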
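Once the default Assign rule is replaced by an accumulating one, the reset question from the original post becomes relevant again. The pure-Python comparison below is a hypothetical illustration, not MXNet API: `simulate`, `assign` and `add_twice` are made-up names, and values are plain lists of floats. It runs the default overwrite rule and the `stored += input * 2` rule over the same two iterations.

```python
from typing import Callable, List

Vec = List[float]


def assign(key: int, merged: Vec, stored: Vec) -> None:
    # Hypothetical model of the default rule: overwrite in place.
    stored[:] = merged


def add_twice(key: int, merged: Vec, stored: Vec) -> None:
    # Hypothetical model of the custom rule above: stored += input * 2.
    for i, g in enumerate(merged):
        stored[i] += 2 * g


def simulate(updater: Callable[[int, Vec, Vec], None],
             init: Vec, pushes: List[List[Vec]]) -> Vec:
    stored = list(init)
    for grads in pushes:  # one entry per iteration
        merged = [sum(parts) for parts in zip(*grads)]  # sum across devices
        updater(0, merged, stored)
    return stored


pushes = [[[1.0], [1.0]], [[1.0], [1.0]]]  # two iterations, two devices each
print(simulate(assign, [0.0], pushes))     # [2.0]
print(simulate(add_twice, [0.0], pushes))  # [8.0]
```

With the accumulating rule the stored value carries over between iterations (4.0 after the first, 8.0 after the second), so if per-iteration sums are wanted under such a rule, resetting or re-initializing the key would presumably be up to the caller.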

You can also find more information in this tutorial: