Exception calling ndarray.asscalar() on the GPU

I’m trying to generate predictions on the GPU, but am running into an issue when transferring data back to the CPU. The code below works when run on the CPU.

        for (outputs, nbatch, batch) in mod.iter_predict(pred_iter):
            predicted = outputs[0]
            topk = predicted.topk(k=10, ret_typ='both')
            for idx in range(predicted.shape[0]):
                predictions = [(topk[1][idx, j].asscalar(), topk[0][idx, j].asscalar()) for j in range(num_predictions)]
                yield predictions

But if I switch the context to GPU, there’s an exception in the asscalar() call above:
“Source and target must have the same data type when copying across devices. (0 vs 1)”.

On its own, the following snippet also works on the GPU:

    import numpy as np
    import mxnet as mx
    from mxnet import ndarray as nd

    arr = nd.array([5, 6, 7, 8], dtype=np.float32, ctx=mx.gpu(0))
    topk = arr.topk(k=2, ret_typ='both')
    print(topk[0][0].asscalar(), topk[1][0].asscalar())

I don’t see a difference between the two. Any thoughts on figuring this out?

The problem is that topk[0][0].asscalar() copies your data from the GPU to the CPU. As the documentation states, this call is equivalent to topk[0][0].asnumpy()[0]: http://mxnet.incubator.apache.org/test/versions/0.10/api/python/ndarray.html#mxnet.ndarray.NDArray.asscalar

That’s fine, though: it’s what I want to do, so that I can write the predictions out to disk.
It works in the smaller snippet, so I’m not sure the problem is just that the data starts out on the GPU. There must be a way to get it to the CPU.
ndarray.asscalar()/ndarray.asnumpy() copy the data over internally, but in this case the issue seems to be that the data types (dtype) differ.

Thanks for the clarification. I misunderstood your question: I thought you did not want to copy the data to the CPU. Could you send a small example so that I can reproduce your problem?

The fact that you receive an error when you call asscalar() doesn’t necessarily mean that the problem lies in that particular operation. Operations are executed in an asynchronous, non-blocking manner, and asscalar() acts as a synchronization point: if a problem happened somewhere earlier, it only becomes visible when asscalar() is called.
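To see how an error can surface at asscalar() even though it originated in an earlier step, here is a pure-Python analogy built on concurrent.futures. It is not MXNet’s dependency engine; load_input, forward and run_pipeline are hypothetical stand-ins. Each step returns a handle immediately, and an exception raised inside queued work is only re-raised when the caller blocks on a result that depends on it.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy analogy only: this is NOT MXNet's engine. Submitting work returns a
# handle immediately; an exception raised inside queued work is re-raised
# only when someone waits on a result that depends on it.
_engine = ThreadPoolExecutor(max_workers=1)  # one worker: tasks run in order


def load_input(value):
    # Hypothetical stand-in for the step that actually fails
    # (e.g. copying a batch into an input buffer on another device).
    raise TypeError("Source and target must have the same data type "
                    "when copying across devices.")


def forward(input_handle):
    # Depends on the earlier step; waiting on it re-raises its exception.
    return input_handle.result() * 2.0


def run_pipeline():
    """Queue two dependent steps and record where the error shows up."""
    log = []
    loaded = _engine.submit(load_input, 5.0)   # queued, no error raised here
    output = _engine.submit(forward, loaded)   # queued, no error raised here
    log.append("queued both steps")
    try:
        output.result()                        # sync point, like asscalar()
    except TypeError as err:
        log.append(f"error surfaced at sync point: {err}")
    return log


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

The failure belongs to load_input, but the caller only sees it at output.result(), which is the same pattern as an earlier failed copy being reported by asscalar().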

It seems that the data types of the NDArrays on the GPU and on the CPU differ, but without a full example it is hard to say what is going on.

I’m pretty sure I’ve figured it out now. I’m working with scipy.sparse.csr_matrix inputs that are converted to nd.sparse.csr_matrix for training and prediction. I was explicitly setting the dtype to np.float32 when setting up the training data, but not for the prediction data. It’s not obvious to me why prediction would work but asscalar() would fail; perhaps it is the asynchronicity that @Sergey mentioned above.
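The observations so far (works on the CPU, fails on the GPU, fixed by setting dtype=np.float32 on the prediction data) fit a simple copy rule that the error text itself hints at: a copy within one device may cast between dtypes, but a copy across devices needs matching dtypes. The sketch below is a toy model of that rule, not MXNet code. ToyArray, copy_into and copy_succeeds are hypothetical names, and it assumes the module copies each CPU batch into float32 input buffers on its own context. For reference, in mshadow’s type-flag numbering 0 is float32 and 1 is float64, so “(0 vs 1)” reports a float32/float64 mismatch.

```python
from dataclasses import dataclass, field

# Toy model only; this is NOT MXNet code. Assumed rule: a copy within one
# device may cast between dtypes, but a copy across devices requires the
# source and target to already share a dtype.


@dataclass
class ToyArray:
    device: str                                   # e.g. "cpu(0)" or "gpu(0)"
    dtype: str                                    # e.g. "float32" or "float64"
    values: list = field(default_factory=list)


def copy_into(src: ToyArray, dst: ToyArray) -> None:
    """Copy src into the preallocated dst, following the assumed rule."""
    if src.device != dst.device and src.dtype != dst.dtype:
        raise TypeError("Source and target must have the same data type "
                        "when copying across devices.")
    # Dtypes already match, or both arrays are on the same device, where the
    # toy rule allows an implicit cast (both toy dtypes are floating point).
    dst.values = [float(v) for v in src.values]


def copy_succeeds(src: ToyArray, dst: ToyArray) -> bool:
    try:
        copy_into(src, dst)
        return True
    except TypeError:
        return False


# Assumed: the module's input buffers are float32 on the module's context.
gpu_input = ToyArray("gpu(0)", "float32")
cpu_input = ToyArray("cpu(0)", "float32")

train_cpu = ToyArray("cpu(0)", "float32", [1, 0, 1])  # dtype set explicitly
test_cpu = ToyArray("cpu(0)", "float64", [1, 0, 1])   # dtype left unset
test_gpu = ToyArray("gpu(0)", "float64", [1, 0, 1])   # created on the GPU

print(copy_succeeds(train_cpu, gpu_input))  # True: same dtype across devices
print(copy_succeeds(test_cpu, gpu_input))   # False: dtype differs across devices
print(copy_succeeds(test_cpu, cpu_input))   # True: same device, cast allowed
print(copy_succeeds(test_gpu, gpu_input))   # True: same device, cast allowed
```

Under this model, an explicit float32 dtype makes the cross-device copy legal, while creating the prediction data on the GPU turns it into a same-device copy; both would avoid the error.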

Here’s a sample that runs into this issue (here the nd.sparse.csr_matrix is generated from a dense np.array, though):

    import numpy as np

    import mxnet as mx
    from mxnet import ndarray as nd

    # Set up some data
    x = np.array([
        [0, 0, 0],
        [0, 0, 1],
        [0, 1, 0],
        [0, 1, 1],
        [1, 0, 0],
        [1, 0, 1],
        [1, 1, 0],
        [1, 1, 1]
    ])
    y = np.array([
        [1, 0],
        [0, 1],
        [1, 0],
        [0, 1],
        [1, 0],
        [0, 1],
        [1, 0],
        [0, 1]
    ])

    # Training data has the dtype set explicitly
    train_x = nd.sparse.csr_matrix(x, dtype=np.float32, ctx=mx.cpu())
    train_y = nd.sparse.csr_matrix(y, dtype=np.float32, ctx=mx.cpu())

    # Set up the model
    norm_init = mx.initializer.Normal(sigma=0.1)
    bias_init = mx.initializer.Zero()

    X = mx.sym.Variable('X', stype='csr')
    Y = mx.sym.Variable('Y', stype='csr')

    W1 = mx.symbol.Variable('W1', stype='row_sparse', shape=(3, 5), init=norm_init)
    b1 = mx.symbol.Variable('b1', shape=5, init=bias_init)
    f1 = mx.sym.broadcast_add(mx.sym.sparse.dot(X, W1), b1)
    r1 = mx.sym.relu(f1)
    d1 = mx.sym.Dropout(r1, p=0.5)

    W2 = mx.symbol.Variable('W2', shape=(5, 2), init=norm_init)
    b2 = mx.symbol.Variable('b2', shape=2, init=bias_init)
    f2 = mx.sym.broadcast_add(mx.sym.sparse.dot(d1, W2), b2)

    output = mx.sym.LogisticRegressionOutput(f2, label=Y)

    # Create the module and initialize it for training on the GPU
    train_iter = mx.io.NDArrayIter(train_x, train_y, batch_size=2, last_batch_handle='discard', data_name='X', label_name='Y')

    mod = mx.mod.Module(symbol=output,
                        data_names=['X'],
                        label_names=['Y'],
                        context=mx.gpu())
    mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
    mod.init_params()  # parameters must be initialized before init_optimizer()
    mod.init_optimizer(optimizer='sgd', optimizer_params={'learning_rate': 0.5})

    # Set up the prediction data, without specifying the dtype. If specified, the rest works.
    test_x = nd.sparse.csr_matrix(x, ctx=mx.cpu())
    test_iter = mx.io.NDArrayIter(test_x, None, batch_size=2, last_batch_handle='discard', data_name='X', label_name='Y')

    for (outputs, nbatch, batch) in mod.iter_predict(test_iter):
        predicted = outputs[0]
        topk = predicted.topk(k=1, ret_typ='both')

        for idx in range(predicted.shape[0]):
            # asscalar() is the synchronization point where the exception appears
            top_predictions = [(topk[1][idx, j].asscalar(), topk[0][idx, j].asscalar()) for j in range(1)]
            print(top_predictions)

The exception message is confusing, but I found its source: your test data should be moved to the GPU context. If you change the line that creates test_x to:

    test_x = nd.sparse.csr_matrix(x, ctx=mx.gpu())

With this change, your code runs and prints:

    [(1.0, 0.50023055)]
    [(0.0, 0.50022453)]
    [(1.0, 0.500451)]
    [(1.0, 0.5000134)]
    [(0.0, 0.50022405)]
    [(0.0, 0.50142527)]
    [(1.0, 0.49987343)]
    [(0.0, 0.5002222)]

My explanation would be that during training the Module object is smart enough to copy the data to the GPU itself (your train_x and train_y are also defined on the CPU), but it is probably not smart enough to do the same during inference.

In a real-world situation, data loaded from disk would always be located on the CPU, so you would need to move it to the GPU using the as_in_context() method.