AFAIK nd.waitall should be used when you want to benchmark or evaluate performance. Say you have a block of code where you pass data through a model and you want to measure how long it takes to run. Because MXNet functions asynchronously queue operations to the engine and return immediately, if you put a timer around your block of code, you may only be measuring how long it takes to enqueue the operations rather than how long they take to execute. So you should call nd.waitall before stopping the timer to make sure the operations have actually finished inside the timed region. Outside of this use case, my understanding is that you don’t really need to use nd.waitall.
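To make the timing pitfall concrete, here is a minimal sketch that does not use MXNet itself. It assumes a one-worker thread pool as a stand-in for the asynchronous engine, and `timed`, `enqueue_work`, and `wait_all` are hypothetical helper names made up for this illustration. In real MXNet code, `mx.nd.waitall` would play the role of `wait_all`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def timed(fn, sync=None):
    """Run fn() and return the elapsed seconds.

    If sync is given, it is called before the clock stops, so the
    measurement includes work that fn() only enqueued.
    """
    start = time.perf_counter()
    fn()
    if sync is not None:
        sync()  # with MXNet, this would be mx.nd.waitall
    return time.perf_counter() - start


# Stand-in for an asynchronous engine: submit() returns immediately
# while the work runs on a background thread.
pool = ThreadPoolExecutor(max_workers=1)
pending = []


def enqueue_work():
    # Pretend this is a forward pass that takes about 0.2 s to execute.
    pending.append(pool.submit(time.sleep, 0.2))


def wait_all():
    # Block until everything enqueued so far has finished.
    wait(pending)
    pending.clear()


enqueue_only = timed(enqueue_work)       # measures only the enqueue call
wait_all()                               # drain the queue before the next run
full = timed(enqueue_work, sync=wait_all)  # measures the actual execution

print(f"without sync: {enqueue_only:.4f}s, with sync: {full:.4f}s")
pool.shutdown()
```

The first measurement is close to zero because it stops the clock as soon as the work is queued, and only the second one reflects the real run time. With MXNet, the analogous call would look something like `timed(lambda: net(x), sync=mx.nd.waitall)`, where `net` and `x` are placeholders for your own model and input.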
There is another case: when each enqueued operation needs a lot of memory, like passing a large batch of data through the network. If your code starts enqueuing the next batch while the MXNet engine has not finished processing the first one, the pending work can pile up and you can run out of memory. You can call nd.waitall after passing each batch of data to prevent this problem.
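A rough sketch of why syncing after each batch bounds the backlog, again using a one-worker thread pool as a stand-in for the engine rather than MXNet itself. `run_batches` is a hypothetical helper, and the number of unfinished batches is only a proxy for memory use. In MXNet, the per-batch `wait(pending)` would correspond to calling `mx.nd.waitall()` at the end of each iteration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_batches(num_batches, sync_each_batch):
    """Enqueue batches and return the peak number queued but unfinished."""
    pool = ThreadPoolExecutor(max_workers=1)
    pending = []
    peak = 0
    for _ in range(num_batches):
        # Pretend each batch takes 20 ms to process on the engine.
        pending.append(pool.submit(time.sleep, 0.02))
        in_flight = sum(1 for f in pending if not f.done())
        peak = max(peak, in_flight)
        if sync_each_batch:
            # Stand-in for mx.nd.waitall(): block until this batch is done.
            wait(pending)
            pending.clear()
    wait(pending)  # drain whatever is still queued
    pool.shutdown()
    return peak


peak_without_sync = run_batches(5, sync_each_batch=False)
peak_with_sync = run_batches(5, sync_each_batch=True)
print(f"peak in flight without sync: {peak_without_sync}, with sync: {peak_with_sync}")
```

Without the sync, the loop runs ahead of the worker and several batches are outstanding at once. With it, at most one batch is in flight at any time, at the cost of losing the overlap between enqueuing and execution.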