Trainer.step doesn't block, is this safe, is this faster?

Hi,

This is the default execution mode of MXNet. All operations are pushed to an execution stream and scheduled to run asynchronously, and the graph dependencies between them are preserved.

Accessing the resulting NDArrays from Python (with mx.nd.waitall() or asnumpy()) just makes the frontend wait for the results. It does not affect the execution itself, other than the fact that you can't schedule further operations while Python is waiting.
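
To make the idea concrete, here is a rough analogy using only the Python standard library. It is not MXNet code and says nothing about how the MXNet engine is implemented: the thread pool stands in for the engine, and slow_square and run_demo are hypothetical names made up for this sketch. Submitting work returns immediately, and only reading the result blocks the caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x, delay=0.2):
    """Hypothetical stand-in for an expensive operation."""
    time.sleep(delay)
    return x * x


def run_demo():
    # A single worker plays the role of the backend engine (analogy only).
    with ThreadPoolExecutor(max_workers=1) as engine:
        t0 = time.perf_counter()

        # Scheduling returns right away; the work runs in the background.
        future = engine.submit(slow_square, 3)
        submit_time = time.perf_counter() - t0

        # Reading the result is the only point where the caller waits,
        # much like a blocking read on the frontend.
        value = future.result()
        total_time = time.perf_counter() - t0

    return value, submit_time, total_time


if __name__ == "__main__":
    value, submit_time, total_time = run_demo()
    print(f"value={value} submit={submit_time:.4f}s total={total_time:.4f}s")
```

The time spent in submit should be much smaller than the total time, since the caller is free to keep scheduling work until it actually asks for the value.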

You can also find info about it in this thread: