Hi,
I’m using Python multiprocessing to speed up the data loader in a Gluon package. The contexts of the net, trainer, and ndarrays show up as @cpu_shared(0). The code runs well on my local machine. However, when I use an AWS EC2 instance with more cores, the multiprocessing part succeeds, but the MXNet part fails with an error about CPU shared storage.
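For context, here is a minimal sketch of my setup (the dataset, network, and hyperparameters below are dummy stand-ins, not my actual code):

```python
import mxnet as mx
from mxnet import autograd, gluon, nd
from mxnet.gluon.data import ArrayDataset, DataLoader

ctx = mx.cpu()

# Dummy data standing in for the real dataset.
X = nd.random.uniform(shape=(1000, 10))
y = nd.random.uniform(shape=(1000, 1))
dataset = ArrayDataset(X, y)

# With num_workers > 0, worker processes hand batches back through POSIX
# shared memory, so the loaded NDArrays report their context as
# cpu_shared(0).
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=8)

net = gluon.nn.Dense(1)
net.initialize(ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
loss_fn = gluon.loss.L2Loss()
metric = mx.metric.MSE()

for data, label in loader:
    with autograd.record():
        out = net(data)
        loss = loss_fn(out, label)
    loss.backward()
    trainer.step(data.shape[0])
    # metric.update() eventually calls .asscalar(), which is where the
    # traceback below originates on the EC2 instance.
    metric.update(label, out)
```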
I would like to figure out what’s wrong with the EC2 instance. The full error message is below. Thanks in advance!
File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/metric.py", line 1289, in update
File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1998, in asscalar
File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1980, in asnumpy
File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py", line 252, in check_call
mxnet.base.MXNetError: [22:47:24] src/storage/./cpu_shared_storage_manager.h:183: Failed to open shared memory. shm_open failed with error Too many open files
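In case it helps with diagnosing the instance, this is the snippet I can run on both machines to compare the per-process open-file limits (resource is from the Python standard library):

```python
import resource

# Per-process open-file limits; shm_open failing with "Too many open
# files" suggests the soft limit is being exhausted by the shared-memory
# files the loader workers create.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open files: soft=%d, hard=%d' % (soft, hard))
```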