Potential Bug using nd.tile after Convolutional Layers

I’ve encountered an issue using mxnet.ndarray.tile to upscale CNN embeddings so that they can be used as additional feature channels in another convolutional network.

The issue is reproduced in the code below. First, embeddings are computed for a context set using a CNN; these embeddings are tiled back up to the original image size and then appended as extra channels to a query set. Finally, a second CNN is used to make predictions from the query set.

## Setup ##

import numpy as np
import mxnet as mx
import mxnet.gluon as gluon
import mxnet.gluon.nn as nn

embedding_block = nn.Sequential()
# make a small CNN to embed the "context"
embedding_block.add(nn.Conv2D(channels=6, kernel_size=5, strides=1, activation='relu', padding=(2,2)))
embedding_block.add(nn.AvgPool2D(pool_size=(2,2), strides=2))

# make a CNN classifier for the query set
query_block = nn.Sequential()
query_block.add(nn.Conv2D(channels=6, kernel_size=5, strides=1, activation='relu'))
query_block.add(nn.AvgPool2D(pool_size=(2,2), strides=2))
query_block.add(nn.Dense(units=1))

embedding_block.collect_params().initialize()
query_block.collect_params().initialize()

## Data Generation ##

# create a simple squared loss problem
w = np.random.normal(size=(28,28))


# features should be multi-channel images
features = np.random.normal(size=(200, 6, 28, 28))
temp = np.sum(w * features, axis=(1,2,3))
targets = np.sign(np.add(temp[:100], temp[100:]))
context_features = mx.nd.array(features[:100])
query_features = mx.nd.array(features[100:])
targets = mx.nd.array(targets)

## Forward and Backward Pass ##

loss = 0.
with mx.autograd.record():
    # embed the context set, sum over it, and tile the result back up to image size
    context_embedding = mx.nd.sum(embedding_block(context_features), axis=0)
    channel = context_embedding.tile((100, 1, 2, 2))
    # append new channel to image features
    task_features = mx.nd.concat(query_features, channel)

    preds = query_block(task_features)
    loss = loss + mx.nd.sum(mx.nd.square(mx.nd.squeeze(preds) - targets))
    
loss.backward()

loss.asscalar()

The following error is thrown when loss.asscalar() is called.

src/operator/nn/../tensor/broadcast_reduce_op.h:408: 
Too many reduction axes from [100,1,1,6,2,14,2,14] to [1,1,1,6,1,14,1,14]

As far as I know, this error is only thrown when context_embedding is computed using convolutional layers. I initially tried nd.tile on a pre-defined nd.array and was not able to replicate the issue. The error is also not thrown if the context embeddings are not tiled (e.g. when they are already the same size as the query image channels).

Can anyone shed some light on this issue?

The problem is the number of dimensions of your tensor (see the code below, which is where the error is thrown). Special operators like broadcast only support a limited number of dimensions, currently 5 (MXNET_SPECIAL_MAX_NDIM).

  if (j <= MXNET_SPECIAL_MAX_NDIM) {
    const int ndim = (j <= 2? 2 : MXNET_SPECIAL_MAX_NDIM);
    new_small->assign(new_small->begin(), new_small->begin() + ndim);
    new_big->assign(new_big->begin(), new_big->begin() + ndim);
  } else {
    LOG(FATAL) << "Too many reduction axes from " << big << " to " << small;
  }

None of the tensors that I manually create in the example above have more than five dimensions. I suppose the backward implementations of the operators produce intermediate tensors with eight dimensions (judging by the shapes in the error message), which causes the issue?
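For concreteness, here is a rough NumPy sketch of the reduction that the shapes in the error message seem to describe (this is only my reading of those shapes, not MXNet’s actual implementation):

    import numpy as np

    # Undoing a tile: split every output axis into a (repeat, original) pair and
    # sum over the repeat axes. For a 4-D input tiled with 4 reps, this intermediate
    # view has 8 axes, matching the [100,1,1,6,2,14,2,14] shape in the error.
    grad_out = np.ones((100, 6, 28, 28))         # gradient w.r.t. the tiled output
    reps, base = (100, 1, 2, 2), (1, 6, 14, 14)  # tile reps and pre-tile shape
    interleaved = [d for pair in zip(reps, base) for d in pair]
    grad_in = grad_out.reshape(interleaved).sum(axis=(0, 2, 4, 6))
    print(grad_in.shape)                         # (1, 6, 14, 14)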

How can I avoid reaching this ceiling?

I tried running your code and you are right: after mx.nd.tile the shape is (100, 6, 28, 28), but mx.nd.concat then throws an error. I will open a GitHub issue.

For the time being you could try using mx.nd.repeat instead of tile:

    channel = mx.nd.repeat(context_embedding, repeats=2, axis=1)  # (6, 28, 14)
    channel = mx.nd.repeat(channel, repeats=2, axis=2)            # (6, 28, 28)
    channel = channel.expand_dims(axis=0)                         # (1, 6, 28, 28)
    channel = mx.nd.repeat(channel, repeats=100, axis=0)          # (100, 6, 28, 28)

Unfortunately, repeat expands each axis by repeating entries adjacent to each other, which is not the behavior that I am looking for.
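To illustrate the distinction in plain NumPy (as far as I can tell, the mx.nd versions behave the same way):

    import numpy as np

    x = np.array([1, 2, 3])
    print(np.repeat(x, 2))  # [1 1 2 2 3 3] -- each entry duplicated in place
    print(np.tile(x, 2))    # [1 2 3 1 2 3] -- the whole array repeated as a block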

I’ve opened an issue on the GitHub page here. It would be excellent if we could follow up on this discussion there.