How does MXNet allocate computation tasks to the devices?

We know that computation tasks are executed by the dependency engine. However, for the code in base_module.py (shown in figure 01), how does MXNet allocate computation and communication tasks to the devices in each iteration? Does scheduling the tasks of the next iteration require all tasks of the previous iteration to have completed?

Operations from the frontend language (e.g. Python) are queued for processing by the backend engine. Computation location will depend on the location of the input data for a given operator (i.e. CPU or specific GPU device). Computation ordering will depend on the dependencies, and this also enables parallel processing (if certain operators are independent).
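To make the two rules above concrete (placement follows the input data, ordering follows the dependencies), here is a rough toy model in plain Python. It is not MXNet's engine or API: `ToyArray`, `ToyEngine`, `constant` and `push` are hypothetical names, and each "device" is just a single worker thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyArray:
    """Hypothetical stand-in for an array: a device tag plus a pending value."""

    def __init__(self, device, future):
        self.device = device
        self.future = future

    def wait_to_read(self):
        # Block the frontend until the backend has produced the value.
        return self.future.result()


class ToyEngine:
    """Toy scheduler (an assumption for illustration, not MXNet's implementation)."""

    def __init__(self, devices):
        # One worker thread per device, so each device runs its queue in order.
        self.workers = {d: ThreadPoolExecutor(max_workers=1) for d in devices}

    def constant(self, value, device):
        done = Future()
        done.set_result(value)
        return ToyArray(device, done)

    def push(self, fn, *inputs):
        # Placement: the operator runs where its inputs live.
        devices = {x.device for x in inputs}
        if len(devices) != 1:
            raise ValueError("all inputs must live on the same device")
        device = devices.pop()

        def run():
            # Ordering: wait for every input before computing.
            return fn(*(x.future.result() for x in inputs))

        # push() returns immediately; the frontend is not blocked.
        return ToyArray(device, self.workers[device].submit(run))


engine = ToyEngine(["cpu", "gpu0"])
a = engine.constant(2, "cpu")
b = engine.constant(3, "gpu0")
c = engine.push(lambda x: x * 10, a)       # queued on the cpu worker
d = engine.push(lambda x: x + 1, b)        # queued on gpu0, independent of c
e = engine.push(lambda x, y: x + y, c, c)  # must wait for c

print(c.device, d.device, e.wait_to_read(), d.wait_to_read())
```

Here `c` and `d` have no dependency on each other and sit on different workers, so they are free to run in parallel, while `e` can only run once `c` is done.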

It depends. If the operations in one iteration depend on all computation from the previous iteration, then yes. One example is training a neural network that updates its parameters after each batch. There are certainly cases where that doesn't hold. Either way, you can keep queuing operations from the frontend even though the backend hasn't finished processing the previous iterations. It's often a good idea to have a blocking operation in the frontend (such as updating a metric or logging the loss) after each iteration, to prevent too many operations from being queued (and running out of memory).
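A small sketch of that last point, again as a toy in plain Python rather than real MXNet code. The backend is a single worker thread, `step` and `train` are hypothetical helpers, and `sync_every` is an assumed knob for how often the frontend blocks. In MXNet itself the blocking call would be something like `asnumpy()` or `wait_to_read()` on the loss.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def step(prev, batch):
    # Depends on the previous iteration: needs its updated parameters.
    params, _ = prev.result()
    time.sleep(0.005)  # pretend this is the forward/backward/update work
    new_params = params + 0.5 * (batch - params)
    return new_params, abs(batch - new_params)


def train(batches, sync_every):
    """Queue one step per batch; block every `sync_every` steps (0 = never)."""
    prev = Future()
    prev.set_result((0.0, 0.0))  # initial (params, loss)
    queued_since_sync = 0
    max_queued = 0
    with ThreadPoolExecutor(max_workers=1) as backend:
        for i, batch in enumerate(batches, start=1):
            # submit() returns at once, so the frontend can run far ahead.
            prev = backend.submit(step, prev, batch)
            queued_since_sync += 1
            max_queued = max(max_queued, queued_since_sync)
            if sync_every and i % sync_every == 0:
                prev.result()  # blocking read, e.g. logging the loss
                queued_since_sync = 0
    return prev.result()[0], max_queued


params_sync, queued_sync = train([1.0] * 8, sync_every=1)
params_free, queued_free = train([1.0] * 8, sync_every=0)
print(queued_sync, queued_free)
```

Both runs end with the same parameters, because the dependency between iterations fixes the execution order either way. The only difference is how far the frontend is allowed to run ahead: with a blocking read after each step at most one step is queued at a time, without it all eight are queued before any result is read.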


Thank you, thomelane. After reading the code that implements the dependency engine in MXNet, I figured out the relationships between different iterations. It is exactly as you described. Thanks again for your kindness.