How does MXNet balance the workload among different workers during distributed training?

Hi, I am running some experiments and trying to figure out how MXNet balances the workload among different workers during distributed training.

It seems to happen in the DMLC core. However, I cannot figure out where it is…

Can someone point it out to me?
