Elastic Averaging SGD (EASGD) algorithm

Has anyone tried to implement the EASGD algorithm in MXNet? I tried to do so, but the training does not converge to the target accuracy, so I have come here for some help.

def _update_params_on_kvstore_easgd(param_arrays, grad_arrays, central_arrays,
                                    kvstore, param_names, easgd_beta,
                                    easgd_iter, learning_rate, updater):
    """
    :param param_arrays: per-parameter lists of NDArrays holding the local weights on each device
    :param grad_arrays: per-parameter lists of NDArrays holding the gradients on each device
    :param central_arrays: per-parameter lists of NDArrays used to stage the central weights
    :param kvstore: the KVStore used to communicate with the server node
    :param param_names: parameter names, used as KVStore keys
    :param easgd_beta: the beta value set for the EASGD algorithm
    :param easgd_iter: the communication period, i.e. how often the central weights are updated
    :param learning_rate: the learning rate for the local SGD step
    :param updater: the updater used for the worker update operation
    """
    for index, pair in enumerate(zip(param_arrays, grad_arrays, central_arrays)):
        arg_list, grad_list, central_list = pair
        if grad_list[0] is None:
            continue
        name = param_names[index]
        # Fetch the current central weights from the server.
        kvstore.pull(name, central_list, priority=-index)
        for w_local, w_central, w_grad in zip(arg_list, central_list, grad_list):
            # Elastic difference alpha * (w_local - w_central), with
            # alpha = beta / (tau * p); this overwrites w_central in place.
            w_central[:] = easgd_beta / (easgd_iter * kvstore.num_workers) * (w_local - w_central)
            # Local SGD step.
            w_local[:] = w_local - learning_rate * w_grad
        # Push the elastic differences so the server can update the central weights.
        kvstore.push(name, central_list, priority=-index)

Above is the worker-side implementation. The staged central arrays are pushed to the server node, which uses them to update the central weights it stores. easgd_beta is set to 0.9, following the NIPS'15 paper (Deep Learning with Elastic Averaging SGD). Does anyone have any suggestions? Thank you very much. I can provide more details if you are interested in this discussion.
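
For context, here is how I understand the server side is supposed to behave: each push delivers the elastic difference alpha * (w_local - w_central), and the server accumulates it into the stored central weight. Below is a minimal sketch of that accumulation, using a local KVStore and the internal _set_updater hook for illustration (in the dist setting the analogous update would have to run on the server process). This is my assumption about the intended behavior, not code from my actual setup.

import mxnet as mx

def easgd_server_updater(key, pushed, stored):
    # 'pushed' holds alpha * (w_local - w_central) from one worker;
    # the server-side rule is simply w_central += pushed.
    stored += pushed

kv = mx.kvstore.create('local')
kv._set_updater(easgd_server_updater)

shape = (2, 3)
kv.init('w_central', mx.nd.zeros(shape))
kv.push('w_central', mx.nd.ones(shape))   # a worker pushes its elastic difference
out = mx.nd.empty(shape)
kv.pull('w_central', out=out)             # central weight is now all ones
print(out.asnumpy())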

The w_local update should instead include the elastic term. Since w_central has already been overwritten with the elastic difference alpha * (w_local - w_central) on the previous line, the correct update is:
w_local[:] = w_local - learning_rate * w_grad - w_central
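
In other words, the inner loop with that fix applied would look like the sketch below (same variable names as above, with a temporary for the elastic difference so the two updates can be written in either order):

alpha = easgd_beta / (easgd_iter * kvstore.num_workers)
for w_local, w_central, w_grad in zip(arg_list, central_list, grad_list):
    # Elastic difference pulling w_local toward the central weights.
    elastic = alpha * (w_local - w_central)
    # Local SGD step plus the elastic penalty.
    w_local[:] = w_local - learning_rate * w_grad - elastic
    # Stage the elastic difference; pushing it lets the server do w_central += elastic.
    w_central[:] = elastic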