How to do multi-gpu training on public SageMaker gluon example?

Hi, I’m training this public gluon example on a p2.16xl notebook

I’m trying to adapt the notebook to run on multiple GPUs. To do this, I made the following changes:

  1. replaced `ctx = mx.gpu()` with `ctx = [mx.gpu(i) for i in range(8)]`

  2. replaced
    user = user.as_in_context(ctx).reshape((batch_size,))
    item = item.as_in_context(ctx).reshape((batch_size,))
    label = label.as_in_context(ctx).reshape((batch_size,))
    with
    user = gluon.utils.split_and_load(user, ctx)
    item = gluon.utils.split_and_load(item, ctx)
    label = gluon.utils.split_and_load(label, ctx)

This throws the following error: `AssertionError: HybridBlock requires the first argument to forward be either Symbol or NDArray, but got <class 'list'>`

What am I missing?

Just got the answer from a colleague:
“You are correct, the output of split_and_load is a list. You need to iterate over it normally and the asynchronous mxnet engine will take care of the parallelism in the background. For example:”

data_split = mx.gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0, even_split=False)
label_split = mx.gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0, even_split=False)
outputs = [(net(X), Y) for X, Y in zip(data_split, label_split)]
# loss = ...

Indeed, it is a list. There is a DataParallel model in gluoncv.utils.parallel that will hopefully make its way into the main gluon codebase and make this a lot simpler for the user.

@Hang_Zhang @zhreshold I can’t find it in the docs — but it’s in the code base?