Async updates with Gluon Trainer with multiple devices on one node

Is it possible to perform gradient updates asynchronously à la dist_async mode of kvstore but on a single machine with multiple GPUs?
My understanding is that a call to gluon.Trainer.step with kvstore='device' waits for the gradients from all devices to be available before performing the update.

There’s no way to do this currently. Are your devices of different types? If not, all gradients should be available at almost the same time.

All my devices are the same, so as you said the gradients are available at almost the same time.
This is less a wall-clock performance consideration and more a means of using async updates to converge to a potentially better solution in some cases.
In a sparse setting, for example a recommender system, the chance of two devices updating the same weights is so tiny that it is wasteful to aggregate and sync the gradients instead of performing the updates locally on each device as soon as they are available. A sync would only be needed periodically between the devices, for example to combine their gradients by averaging.
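The scheme described above (independent local updates plus periodic averaging) can be sketched framework-agnostically; the following toy simulation uses NumPy rather than Gluon, and all names (`n_devices`, `sync_every`, etc.) are illustrative, not part of any MXNet API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse linear regression: each "device" sees its own minibatches and
# updates a private copy of the weights immediately, without waiting for
# gradient aggregation. Every `sync_every` steps the copies are averaged.
n_devices, n_features, sync_every, lr = 2, 16, 10, 0.1
true_w = rng.normal(size=n_features)
weights = [np.zeros(n_features) for _ in range(n_devices)]

for step in range(1, 101):
    for d in range(n_devices):
        # Sparse example: only a few features are active, so two devices
        # rarely touch the same coordinates in the same step.
        idx = rng.choice(n_features, size=3, replace=False)
        x = np.zeros(n_features)
        x[idx] = rng.normal(size=3)
        y = true_w @ x
        err = weights[d] @ x - y
        weights[d] -= lr * err * x  # local SGD update, no sync with peers

    if step % sync_every == 0:
        # Periodic sync: average the per-device parameter copies.
        avg = np.mean(weights, axis=0)
        weights = [avg.copy() for _ in range(n_devices)]
```

Because the gradient is zero outside the active features, each local update only touches a handful of coordinates, which is why the periodic average loses so little compared to per-step aggregation in this regime.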
