Training on gpu(1) and gpu(2) allocates some memory on gpu(0)

I am trying to train my model on gpu(1) and gpu(2), and I have ensured that all NDArrays and symbols are on the correct GPUs.
-> When I call trainer.step() while watching nvidia-smi, some memory gets allocated on gpu(0)
-> All subsequent calls to trainer.step() have no further effect on gpu(0)

Debugging this in mxnet/gluon/, I found that this happens in _init_kvstore(self), where the line kvstore.init(i, param_arrays[0]) allocates space on gpu(0), even though param_arrays[0] is on gpu(1).

Minimal reproducible example:

import mxnet as mx
kv = mx.kv.create('local')  # kvstore creation was missing from the original snippet
ones = mx.nd.ones((1, 1), ctx=mx.gpu(1))
kv.init(0, ones)  # ====> this line will allocate memory on gpu(0)
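In the meantime, one possible workaround (a sketch, not something confirmed in this thread) is to hide the physical gpu(0) from the process via CUDA_VISIBLE_DEVICES before mxnet is imported. The remaining devices are renumbered, so physical GPUs 1 and 2 then appear to MXNet as mx.gpu(0) and mx.gpu(1), and the kvstore cannot touch the hidden card. This assumes gpu(0) is not needed by the process at all.

```python
import os

# Hide physical gpu(0); must be set BEFORE mxnet (and hence CUDA) initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

# import mxnet as mx                 # import only after setting the variable
# ctx = [mx.gpu(0), mx.gpu(1)]       # these now map to physical GPUs 1 and 2
# trainer = mx.gluon.Trainer(net.collect_params(), 'sgd')  # kvstore can no
#                                    # longer allocate on the hidden gpu(0)

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The mxnet lines are left commented since they only make sense on a multi-GPU machine; the key point is the ordering of the environment variable relative to the import.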

I can confirm this odd behaviour. Would you mind creating a GitHub issue to report it? It looks like a bug to me. Thanks!

Someone is already looking into it.