nd.array() not scalable, fails on large arrays

I encountered the following issue after loading a very large numpy array into memory on one of the new P3 EC2 instances and attempting to convert it to an NDArray.

Using np.zeros to reproduce the issue:

nd.array(np.zeros((368884, 512, 13, 13)))
---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<ipython-input-41-f807bb25697e> in <module>()
----> 1 nd.array(np.zeros((368884, 512, 13, 13)))

/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/utils.pyc in array(source_array, ctx, dtype)
    143         return _sparse_array(source_array, ctx=ctx, dtype=dtype)
    144     else:
--> 145         return _array(source_array, ctx=ctx, dtype=dtype)
    146 
    147 

/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.pyc in array(source_array, ctx, dtype)
   1889                 raise TypeError('source_array must be array like object')
   1890     arr = empty(source_array.shape, ctx, dtype)
-> 1891     arr[:] = source_array
   1892     return arr
   1893 

/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.pyc in __setitem__(self, key, value)
    407                 _internal._set_value(float(value), out=self)
    408             elif isinstance(value, (np.ndarray, np.generic)):
--> 409                 self._sync_copyfrom(value)
    410             else:
    411                 raise TypeError(

/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.pyc in _sync_copyfrom(self, source_array)
    609             self.handle,
    610             source_array.ctypes.data_as(ctypes.c_void_p),
--> 611             ctypes.c_size_t(source_array.size)))
    612 
    613     def _slice(self, start, stop):

/usr/local/lib/python2.7/dist-packages/mxnet/base.pyc in check_call(ret)
    144     """
    145     if ret != 0:
--> 146         raise MXNetError(py_str(_LIB.MXGetLastError()))
    147 
    148 if sys.version_info[0] < 3:

MXNetError: [22:40:49] include/mxnet/././tensor_blob.h:275: Check failed: this->shape_.Size() == shape.Size() (31918794752 vs. 1854023680) TBlob.get_with_shape: new and old shape do not match total elements

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x272f3c) [0x7f51bacaff3c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2a4418) [0x7f51bace1418]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x3e08c0) [0x7f51bae1d8c0]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x22bc2dc) [0x7f51bccf92dc]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x227e4a1) [0x7f51bccbb4a1]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(MXNDArraySyncCopyFromCPU+0xa) [0x7f51bcaab04a]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f5295dafe40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f5295daf8ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7f5295fbf3df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7f5295fc3d82]

Am I missing something here? Why is there an upper limit on the allowed nd.array size? The array already fits into memory, so this is not a resource problem.

Any help would be appreciated. As a workaround, I am currently using an iterator wrapper around smaller NDArrayIter instances.

Are you sure this array fits in memory? 368884 * 512 * 13 * 13 * 4 bytes ≈ 127 GB, or roughly 255 GB if the dtype is float64. You will need a p3.16xlarge for this array.
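
Quick sanity check of that estimate in plain Python (no MXNet required):

# Rough memory footprint of the tensor in the repro above.
shape = (368884, 512, 13, 13)
n_elements = 1
for dim in shape:
    n_elements *= dim              # 31,918,794,752 elements
print(n_elements * 4 / 1e9)        # float32: ~127.7 GB
print(n_elements * 8 / 1e9)        # float64: ~255.4 GB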

Yes. I am using a p3.16xlarge. This problem has been mentioned before.

I realize that this is an unusual use case, but if the tagline is "MXNet: A Scalable Deep Learning Framework", then we should aim to have the most basic iterator be scalable, right?

Ah, I see. The size is probably stored as a 32-bit signed int, so your tensor has exceeded that limit. Are you able to disclose what kind of operation you are trying to do on the tensor? One hacky solution that does not require modifying the framework would be to split the tensor into multiple parts, work on them individually, and reduce the results later.
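
The numbers in your error message are consistent with that guess: truncating the true element count to the low 32 bits reproduces exactly the count the check reports. You can confirm with plain Python (the exact integer type used inside the library is an assumption on my part):

# Expected element count of the tensor vs. the count in the error message.
expected = 368884 * 512 * 13 * 13     # 31,918,794,752
reported = 1854023680                 # from the MXNetError check

# The low 32 bits of the expected count equal the reported count,
# which is what you would see if the size were stored in a 32-bit integer.
print(expected % 2**32 == reported)   # True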

I am feeding the tensor into a neural network. I've been using an iterator wrapper to manage smaller instances of NDArrayIter for the time being.
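
In case it helps anyone else, here is a minimal sketch of the kind of wrapper I mean. It slices the big numpy array into chunks that each stay well under the 32-bit element limit and builds a fresh NDArrayIter per chunk; the class name ChunkedNDArrayIter and the chunk_size default are just placeholders, and label/shuffling handling is simplified:

import numpy as np
import mxnet as mx

class ChunkedNDArrayIter(object):
    """Yield batches from a large numpy array by wrapping a sequence of
    smaller NDArrayIter instances, one per slice of the data."""

    def __init__(self, data, label, batch_size, chunk_size=100000):
        self.data = data
        self.label = label
        self.batch_size = batch_size
        self.chunk_size = chunk_size      # rows per NDArrayIter instance

    def __iter__(self):
        for start in range(0, len(self.data), self.chunk_size):
            end = start + self.chunk_size
            # Each chunk is small enough for nd.array / NDArrayIter to handle.
            chunk_iter = mx.io.NDArrayIter(data=self.data[start:end],
                                           label=self.label[start:end],
                                           batch_size=self.batch_size)
            for batch in chunk_iter:
                yield batch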

@piiswrong is probably better suited than I am to answer this question. How hard would it be to represent the size as an int64?

We currently don't support it. We are planning to upgrade to 64-bit, but the work hasn't started yet.