nd.array() not scalable, fails on large arrays

I encountered the following issue after loading a very large numpy array into memory on one of the new P3 EC2 instances and attempting to convert it to an NDArray.

Using np.zeros to reproduce the issue:

nd.array(np.zeros((368884, 512, 13, 13)))
---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<ipython-input-41-f807bb25697e> in <module>()
----> 1 nd.array(np.zeros((368884, 512, 13, 13)))

/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/utils.pyc in array(source_array, ctx, dtype)
    143         return _sparse_array(source_array, ctx=ctx, dtype=dtype)
    144     else:
--> 145         return _array(source_array, ctx=ctx, dtype=dtype)
    146 
    147 

/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.pyc in array(source_array, ctx, dtype)
   1889                 raise TypeError('source_array must be array like object')
   1890     arr = empty(source_array.shape, ctx, dtype)
-> 1891     arr[:] = source_array
   1892     return arr
   1893 

/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.pyc in __setitem__(self, key, value)
    407                 _internal._set_value(float(value), out=self)
    408             elif isinstance(value, (np.ndarray, np.generic)):
--> 409                 self._sync_copyfrom(value)
    410             else:
    411                 raise TypeError(

/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.pyc in _sync_copyfrom(self, source_array)
    609             self.handle,
    610             source_array.ctypes.data_as(ctypes.c_void_p),
--> 611             ctypes.c_size_t(source_array.size)))
    612 
    613     def _slice(self, start, stop):

/usr/local/lib/python2.7/dist-packages/mxnet/base.pyc in check_call(ret)
    144     """
    145     if ret != 0:
--> 146         raise MXNetError(py_str(_LIB.MXGetLastError()))
    147 
    148 if sys.version_info[0] < 3:

MXNetError: [22:40:49] include/mxnet/././tensor_blob.h:275: Check failed: this->shape_.Size() == shape.Size() (31918794752 vs. 1854023680) TBlob.get_with_shape: new and old shape do not match total elements

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x272f3c) [0x7f51bacaff3c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2a4418) [0x7f51bace1418]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x3e08c0) [0x7f51bae1d8c0]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x22bc2dc) [0x7f51bccf92dc]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x227e4a1) [0x7f51bccbb4a1]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(MXNDArraySyncCopyFromCPU+0xa) [0x7f51bcaab04a]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f5295dafe40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f5295daf8ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7f5295fbf3df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7f5295fc3d82]

Am I missing something here? Why is there an upper limit on the allowed nd.array size? The array already fits into memory, so this is not a resource problem.

Any help would be appreciated. As a workaround, I am currently using an iterator wrapper around smaller NDArrayIter instances.

Are you sure this array fits in memory? 368884 * 512 * 13 * 13 * 4 bytes ≈ 127 GB, or roughly 255 GB if the dtype is float64. You will need a p3.16xlarge for this array.
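
Quick sanity check of that estimate in plain Python (no MXNet required):

# Rough memory footprint of the tensor in the repro above.
shape = (368884, 512, 13, 13)
n_elements = 1
for dim in shape:
    n_elements *= dim              # 31,918,794,752 elements
print(n_elements * 4 / 1e9)        # float32: ~127.7 GB
print(n_elements * 8 / 1e9)        # float64: ~255.4 GB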

Yes. I am using a p3.16xlarge. This problem has been mentioned before.

I realize that this is an unusual use case, but if the tagline is "MXNet: A Scalable Deep Learning Framework", then we should aim to have the most basic iterator be scalable, right?

Ah, I see. The size is probably stored as a 32-bit signed int, so your tensor has exceeded that limit. Are you able to disclose what kind of operation you are trying to do on the tensor? One hacky solution that does not require modifying the framework would be to split the tensor into multiple parts, work on them individually, and reduce the results later.
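
The numbers in your error message are consistent with that guess: truncating the true element count to the low 32 bits reproduces exactly the count the check reports. You can confirm with plain Python (the exact integer type used inside the library is an assumption on my part):

# Expected element count of the tensor vs. the count in the error message.
expected = 368884 * 512 * 13 * 13     # 31,918,794,752
reported = 1854023680                 # from the MXNetError check

# The low 32 bits of the expected count equal the reported count,
# which is what you would see if the size were stored in a 32-bit integer.
print(expected % 2**32 == reported)   # True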

I am feeding the tensor into a neural network. I've been using an iterator wrapper to manage smaller instances of NDArrayIter for the time being.
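
In case it helps anyone else, here is a minimal sketch of the kind of wrapper I mean. It slices the big numpy array into chunks that each stay well under the 32-bit element limit and builds a fresh NDArrayIter per chunk; the class name ChunkedNDArrayIter and the chunk_size default are just placeholders, and label/shuffling handling is simplified:

import numpy as np
import mxnet as mx

class ChunkedNDArrayIter(object):
    """Yield batches from a large numpy array by wrapping a sequence of
    smaller NDArrayIter instances, one per slice of the data."""

    def __init__(self, data, label, batch_size, chunk_size=100000):
        self.data = data
        self.label = label
        self.batch_size = batch_size
        self.chunk_size = chunk_size      # rows per NDArrayIter instance

    def __iter__(self):
        for start in range(0, len(self.data), self.chunk_size):
            end = start + self.chunk_size
            # Each chunk is small enough for nd.array / NDArrayIter to handle.
            chunk_iter = mx.io.NDArrayIter(data=self.data[start:end],
                                           label=self.label[start:end],
                                           batch_size=self.batch_size)
            for batch in chunk_iter:
                yield batch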

@piiswrong is probably better suited than I am to answer this question. How hard would it be to represent the size as an int64?

We currently don't support it. We are planning to upgrade to 64-bit, but the work hasn't started yet.