My DataLoader for my image-based dataset often crashes with num_workers > 0 due to Python multiprocessing. I get the following error:
IOError: [Errno 104] Connection reset by peer
With num_workers = 0 (the default) I have no issues, other than training being very slow. Is this issue related to OpenCV threading? I am using Python 2.7.
What’s the version of MXNet you’re using?
I am using 1.3 (master branch)
Are you using the mxnet.image library or OpenCV directly? If using OpenCV directly, does the problem go away if you only use mxnet.image calls?
I am using the following call:
data = mx.image.imread(image_path, flag=1)
Will reducing num_workers resolve the issue?
Even with a lower num_workers it hangs while receiving the data. My dataset returns a tuple of 4 NDArrays; maybe pickling is slow?
NDArray pickling uses shared memory when num_workers > 0, so that pickling doesn't copy the memory, for performance. I have, however, heard a few users claim that using NumPy to transfer data between processes is faster than using NDArrays with shared-memory pickling. I always assumed they were doing something wrong, because NumPy doesn't support shared memory AFAIK, but maybe there is something I'm missing.
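One way to check that claim on your own dataset is to wrap it so each sample comes back as NumPy arrays, then time both variants. A minimal sketch, assuming each sample is a tuple and that NDArray elements expose asnumpy(); NumpyOutputDataset is a hypothetical name, not part of MXNet:

```python
class NumpyOutputDataset(object):
    """Hypothetical wrapper: converts NDArray elements of each sample to NumPy."""

    def __init__(self, dataset):
        self._dataset = dataset

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, idx):
        sample = self._dataset[idx]
        # Anything with an asnumpy() method (e.g. an MXNet NDArray) is converted;
        # other elements (labels, plain ints, ...) pass through unchanged.
        return tuple(x.asnumpy() if hasattr(x, "asnumpy") else x for x in sample)
```

Pass the wrapped dataset to the DataLoader in place of the original and compare epoch times. Whether it is actually faster is something to measure rather than assume.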
Hi guys,
I have a related issue when I try to set num_workers to a value higher than 0:
mxnet.base.MXNetError:E:\pyjq\tp\opencv\opencv\modules\imgcodecs\src\loadsave.cpp:721: error: (-2) unable to remove temporary file in function cv::imdecode_
I am running the example for finetuning an object-detection network.
I am using MXNet 1.5.0 with CUDA 9.2, OpenCV 3.4.2 and Python 3.7.4 on Windows.
Hi @LauLauThom,
Multiple workers have limited support on Windows because of how process forking works there. Maybe try using thread_pool=True in your DataLoader.
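A minimal sketch of that suggestion, assuming an MXNet version (such as 1.5) where mxnet.gluon.data.DataLoader accepts thread_pool. The helper names loader_kwargs and make_loader are hypothetical, and switching the thread pool on only for Windows is my assumption, not something the library does for you:

```python
import sys


def loader_kwargs(num_workers, platform=sys.platform):
    """Hypothetical helper: build DataLoader keyword arguments.

    Requests thread-based workers on Windows, where process-based
    workers have limited support.
    """
    kwargs = {"num_workers": num_workers}
    if num_workers > 0 and platform.startswith("win"):
        kwargs["thread_pool"] = True
    return kwargs


def make_loader(dataset, batch_size, num_workers):
    """Hypothetical helper: create a Gluon DataLoader with those arguments."""
    # Imported lazily so loader_kwargs can be used without MXNet installed.
    from mxnet.gluon.data import DataLoader
    return DataLoader(dataset, batch_size=batch_size,
                      **loader_kwargs(num_workers))
```

Threads share the parent process, so nothing has to be forked or pickled across processes, but Python-heavy preprocessing may not speed up as much as with separate worker processes.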