CUDA malloc error when going distributed

I think I found the source of the error, so I'm leaving it here for reference. I believe it is related to issue #14136. To resolve it, I added these two lines to each gluon.data.DataLoader call:

```python
from mxnet import gluon

# dataset_train, dataset_val, batch_size, batch_size_per_gpu, store (the kvstore),
# SplitSampler and num_cpus are defined elsewhere in the training script.

# Load the training data
train_data = gluon.data.DataLoader(dataset_train,
                                   batch_size,
                                   sampler=SplitSampler(len(dataset_train), store.num_workers, store.rank),
                                   # *** added: return batches in pinned (page-locked) memory ***
                                   pin_memory=True,
                                   pin_device_id=store.rank,
                                   # ************************************************************
                                   last_batch='discard',
                                   num_workers=num_cpus)

# Load the test data
test_data = gluon.data.DataLoader(dataset_val,
                                  batch_size_per_gpu,
                                  shuffle=False,
                                  last_batch='discard',
                                  # *** added: same pinned-memory settings ***
                                  pin_memory=True,
                                  pin_device_id=store.rank,
                                  # ******************************************
                                  num_workers=num_cpus)
```
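`SplitSampler` is not defined in this post; it is assumed to resemble the sampler from MXNet's distributed training example, which gives each worker its own slice of the dataset. The following is a dependency-free sketch of that idea, assuming each worker takes a contiguous, equal-sized block of indices and shuffles only within it. In a real MXNet script it would normally subclass `gluon.data.sampler.Sampler`; here it is a plain class so it runs without MXNet.

```python
import random


class SplitSampler:
    """Hypothetical stand-in for the SplitSampler used in the DataLoader call.

    Worker `part_index` (out of `num_parts`) gets a contiguous, equal-sized
    slice of dataset indices. Examples left over after integer division
    are never sampled by any worker.
    """

    def __init__(self, length, num_parts=1, part_index=0):
        self.part_len = length // num_parts        # examples per worker
        self.start = self.part_len * part_index    # first index of this worker's slice
        self.end = self.start + self.part_len      # one past the last index

    def __iter__(self):
        indices = list(range(self.start, self.end))
        random.shuffle(indices)                    # shuffle only within this worker's slice
        return iter(indices)

    def __len__(self):
        return self.part_len


# Usage: 10 examples split across 3 workers (example 9 is dropped).
parts = [sorted(SplitSampler(10, 3, rank)) for rank in range(3)]
print(parts)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Note that `test_data` above has no such sampler, so every worker iterates over the full validation set.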

I pinned the memory and gave each worker a different pinned-memory device ID, namely its rank (I think!). I don't know yet how this will work in the validation phase; we'll see. But I can now train without the CUDA malloc error (without Horovod for the moment, but getting there …).
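One caveat on using the rank as the device ID: in MXNet, `pin_device_id` selects the GPU on the local machine used for allocating pinned memory, while `store.rank` is the worker's global rank, in the range `[0, store.num_workers)`. On a single machine the two coincide, but with several machines a rank can exceed the number of local GPUs. A minimal sketch of mapping a global rank to a local GPU index, assuming ranks are assigned node by node and every node runs the same number of workers (the helper `local_pin_device_id` and `gpus_per_node` are hypothetical names):

```python
def local_pin_device_id(global_rank, gpus_per_node):
    """Hypothetical helper: map a global worker rank to a GPU index on its own node.

    Assumes ranks 0..gpus_per_node-1 run on node 0, the next block on node 1,
    and so on, with exactly one worker per GPU.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return global_rank % gpus_per_node


# Usage: 2 nodes with 4 GPUs each, so ranks 0..7 map to local GPUs 0..3 twice.
print([local_pin_device_id(r, 4) for r in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

With Horovod, `hvd.local_rank()` provides this local index directly, so no such mapping is needed there.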
