NDarray splitting with overlap


I have another question about MXNet. I want to know how it would be possible to split an array into sub-windows.

I know that for NumPy the code is:

    def im2col(img, kernel_h, kernel_w, step_y, step_x):
        n, c, h, w = img.shape
        out_h = (h - kernel_h) // step_y + 1
        out_w = (w - kernel_w) // step_x + 1
        col = np.ndarray((n, c, out_h, out_w, kernel_h, kernel_w), dtype=img.dtype)
        for R in range(kernel_h):
            j_lim = R + step_y * out_h
            for H in range(kernel_w):
                i_lim = H + step_x * out_w
                col[:, :, :, :, R, H] = img[:, :, R:j_lim:step_y, H:i_lim:step_x]
        return col

But I don’t know if this can be done with the MXNet Python API. With the current functions, the only option seems to be a nested loop that stacks slices along the second-to-last axis and then stacks those results with another loop.

I don’t know whether there is a more efficient solution for this, or whether another approach would be possible.

Thanks again!

Not sure whether you’re using the NDArray or symbolic API, but as of MXNet 1.0.0 you can use the exact same syntax with NDArrays. Example:

>>> a = mx.nd.array(np.arange(0, 300).reshape(1, 3, 10, 10))
>>> a[:,:,2:8:2, 2:8:2]

[[[[ 22. 24. 26.]
[ 42. 44. 46.]
[ 62. 64. 66.]]

[[ 122. 124. 126.]
[ 142. 144. 146.]
[ 162. 164. 166.]]

[[ 222. 224. 226.]
[ 242. 244. 246.]
[ 262. 264. 266.]]]]


Greetings and thanks for the fast reply,

I tried to use the same code ( the one I posted) with ndarray but I get the following error:
mxnet.base.MXNetError: [08:19:25] c:\projects\mxnet-distro-win\mxnet-build\src\operator\tensor./matrix_op-inl.h:895: ndim=6too large

Splitting the array works fine; the issue is reassigning the data into an empty array of the proper shape.

Even if it worked, I’m unsure whether it could negatively affect backpropagation when used in a NN layer, which is why I wanted to know whether there are official solutions that avoid such potential errors.

I’ve tried another approach using the slice function on the main data and the stack function to gather everything. However, I get an error that shapes don’t match halfway through; yet when I remove the stack call and simply print the shape of the sliced array in each loop, the shape is always the same ((1, 3, 6, 6) in this case), and using a different axis for stacking doesn’t fix the problem.

Here is the error:
[14:39:14] C:\projects\mxnet-distro-win\mxnet-build\dmlc-core\include\dmlc/logging.h:308: [14:39:14] c:\projects\mxnet-distro-win\mxnet-build\include\mxnet./tensor_blob.h:276: Check failed: this->shape_.Size() == shape.Size() (108 vs. 216) TBlob.get_with_shape: new and old shape do not match total elements
[14:39:14] C:\projects\mxnet-distro-win\mxnet-build\dmlc-core\include\dmlc/logging.h:308: [14:39:14] c:\projects\mxnet-distro-win\mxnet-build\src\engine./threaded_engine.h:359: [14:39:14] c:\projects\mxnet-distro-win\mxnet-build\include\mxnet./tensor_blob.h:276: Check failed: this->shape_.Size() == shape.Size() (108 vs. 216) TBlob.get_with_shape: new and old shape do not match total elements

Interesting. I’m going to look into your stacking issue. Just so you are aware, there is an undocumented limitation to the maximum dimension of using mx.sym.slice() (which is internally used when you slice with brackets). See here for some discussion on removing the limitation. mx.sym.slice_axis, however, doesn’t have this limitation and you can chain multiple slice_axis calls to slice a 6D array.

Perhaps if you could explain what you’re trying to achieve with your slice+stack, I can be more effective.

Of course. What I’m trying to achieve is what is done in a convolution or pooling, i.e. separating an array/tensor into sub-arrays the size of a window for processing. For each value of the output I must have a corresponding sub-array the size of the kernel/filter. To make this scale, instead of doing it sequentially I want to do it with broadcasting. So for an input tensor of shape (1, 3, 12, 12) I would have an output tensor of shape (1, 3, 10, 10), which implies that my filter is 3 by 3 in that case. Thus from my (1, 3, 12, 12) original array I would need to create an array of shape (1, 3, 10, 10, 3, 3) for processing using broadcasting; this is how some other libraries handle these operations, with native support for this kind of thing.
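For reference, in plain NumPy that (1, 3, 12, 12) → (1, 3, 10, 10, 3, 3) expansion can even be built copy-free as a strided view; a sketch using numpy.lib.stride_tricks.as_strided:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

img = np.arange(1 * 3 * 12 * 12, dtype=np.float64).reshape(1, 3, 12, 12)
n, c, h, w = img.shape
kh, kw = 3, 3
out_h, out_w = h - kh + 1, w - kw + 1  # 10, 10
sn, sc, sh, sw = img.strides
# view of shape (n, c, out_h, out_w, kh, kw); no data is copied,
# the last two axes simply reuse the spatial strides
col = as_strided(img, shape=(n, c, out_h, out_w, kh, kw),
                 strides=(sn, sc, sh, sw, sh, sw))
# each col[..., i, j, :, :] is the kh x kw window anchored at (i, j)
assert np.array_equal(col[0, 0, 2, 5], img[0, 0, 2:5, 5:8])
```

Since as_strided returns a view, any reduction (e.g. a sum over the last two axes) reads directly from the original buffer.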

Since looping over each output cell takes longer than looping over the kernel size and reshaping into a larger dimension, I am using a method equivalent to the NumPy code I posted in my first post.

I will look into slice_axis, despite its deprecation, to see if it can be more helpful than the standard slice.

Thanks a lot for the support safrooze.

Edit: concat, on the other hand, seems to work properly for gathering arrays. Although I am aware that two levels of looping plus stacking, reshaping, and transposing is not optimal, it will do. But I’m still interested to know whether there is a more efficient way to do this, either with existing Python functions or with a compiled language to speed things up.

You can get around the issue of maximum dimension by simply reshaping your arrays before performing the indexing. Here is an example:

    img = mx.nd.ones((16, 3, 224, 224))
    n, c, h, w = img.shape
    kernel_h, kernel_w = (3, 3)
    step_x, step_y = (1, 1)
    out_w = (w - kernel_w) // step_x + 1
    out_h = (h - kernel_h) // step_y + 1
    col = mx.nd.empty((n, c, out_h, out_w, kernel_h, kernel_w), dtype=img.dtype)
    col = col.reshape((-1, kernel_h, kernel_w))

    for R in range(kernel_h):
        j_lim = R + step_y * out_h
        for H in range(kernel_w):
            i_lim = H + step_x * out_w
            col[:, R, H] = img[:, :, R:j_lim:step_y, H:i_lim:step_x].reshape((-1,))
    col = col.reshape((n, c, out_h, out_w, kernel_h, kernel_w))

However, you need to keep in mind that the indexing operation on NDArray is NOT copy-free. Therefore the statement where col is being assigned results in two copies: one as a result of indexing img and then the result is copied into col.

That’s an interesting option.

Here is what I have come up with:

import mxnet.ndarray as mnd

def mxwindow(mna, window):
    n, c, h, w = mna.shape
    out_h, out_w = h - window[0] + 1, w - window[1] + 1
    mnout = (n, window[0], window[1], c, out_h, out_w)  # layout before the final transpose
    mne2 = None
    for R in range(window[0]):
        j_lim = R + mnout[-2]
        for H in range(window[1]):
            tdata = mnd.slice(mna, begin=(None, None, R, H), end=(None, None, j_lim, H + mnout[-1]), step=(None, None, 1, 1))
            mne2 = tdata if mne2 is None else mnd.concat(mne2, tdata, dim=1)
    return mnd.expand_dims(mnd.transpose(mnd.reshape(mne2, shape=mnout), axes=(0, 4, 5, 3, 1, 2)), 3)

Though I wonder which version is the most efficient in terms of runtime and memory usage. I know that the last line is not very Pythonic, but it was the best option in every other way.

P.S.: I am totally aware that my variable names are rubbish.

Edit: I corrected the permutation; I had it wrong at first.