RCNN forward pass slow during distributed training (MXNet 0.12)

I am using the RCNN code from the examples. When I run the code with a distributed kvstore, even with just two workers on the same machine, the forward operation takes about 10 times longer to complete than when I run train_end2end.py with kvstore set to device. Has anyone else run into the same problem, or can anyone suggest where to start digging?



If you compile MXNet from source with the USE_PROFILER=1 flag, you can profile the code and see what's going on. That could provide some hints.

Keep in mind, though, that two workers on the same machine can be slower, because each worker doesn't get all of that machine's resources.
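If rebuilding with the profiler is not convenient, a coarse first step is to time each phase of the training loop by hand. The sketch below is a generic helper, not part of the RCNN example: `PhaseTimer` and its `sync` argument are made-up names, and the `time.sleep` calls stand in for the real forward and update calls. Since MXNet schedules operations asynchronously, time that shows up in forward may really be spent waiting on an earlier parameter push or pull, so it probably makes sense to pass a blocking call such as `mx.nd.waitall` as `sync` (an assumption about your setup, not tested here).

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase of a training loop."""

    def __init__(self, sync=None):
        # `sync` is an optional callable that blocks until pending work finishes.
        # Assumption: with MXNet you would pass mx.nd.waitall (not imported here).
        self.sync = sync
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
            if self.sync is not None:
                self.sync()  # make sure queued work is charged to this phase
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Mean seconds per call for each phase
        return {name: self.totals[name] / self.counts[name] for name in self.totals}


if __name__ == "__main__":
    timer = PhaseTimer()
    for _ in range(3):
        with timer.phase("forward"):
            time.sleep(0.01)  # stand-in for the real forward call
        with timer.phase("update"):
            time.sleep(0.02)  # stand-in for the real parameter update
    print(timer.report())
```

Comparing the report from a kvstore=device run against a distributed run should show whether the extra time is really in forward or in the update step.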

I ran into the same issue. The bottleneck seems to be updating the parameters, so it may be a network issue.
Have you made any progress on this?

No luck. We tried increasing the network bandwidth, which seems to help.
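For a rough sense of why bandwidth could matter, here is a back-of-the-envelope estimate of how long one worker needs to push its gradients and pull the updated weights once per iteration. Everything in it is an assumption for illustration: the function name `sync_seconds` is made up, the 100M parameter count is hypothetical rather than measured from the RCNN example, and it ignores latency, compression, and any overlap of communication with compute.

```python
def sync_seconds(num_params, bandwidth_gbit_per_s, bytes_per_param=4):
    """Rough lower bound on seconds for one push plus one pull of all parameters."""
    payload_bytes = num_params * bytes_per_param  # float32 by default
    bytes_per_second = bandwidth_gbit_per_s * 1e9 / 8
    # One push (gradients out) plus one pull (updated weights back)
    return 2 * payload_bytes / bytes_per_second


if __name__ == "__main__":
    num_params = 100_000_000  # hypothetical model size, not taken from the example
    for gbit in (1, 10):
        print(f"{gbit} Gbit/s: {sync_seconds(num_params, gbit):.2f} s per iteration")
```

With these assumed numbers the estimate is 6.4 s per iteration at 1 Gbit/s and 0.64 s at 10 Gbit/s, so a large model on a slow link could easily dominate the iteration time.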

What's your input? Is it a .rec file or JPEG files?