I am implementing knowledge distillation-based DNN model training, as illustrated in the figure below, to run the teacher and student models (blue and green blocks) in parallel with the same data batch.
My plan is to put a lightweight pre-trained teacher model on the CPU, where it only runs the forward pass with frozen parameters. The student model is a large model to be trained on GPU(s).
I expect that offloading the light task (the teacher's forward pass) to the CPU will let it overlap with the heavy training task on the GPU, making the pipeline faster than running the two models sequentially, as many knowledge distillation projects do (see below).
This task is not for model compression.
I've checked some popular repos like NervanaSystems/distiller and peterliht/knowledge-distillation-pytorch. They execute the forward passes of the student and teacher models in sequence (line by line), not in parallel on different devices (GPU and CPU).
I am trying to speed up training by running the two models at the same time on multiple devices, i.e., loading the small, inference-only teacher on the CPU without interrupting the GPU training of the heavy student model.
What is the proper way to run two models (with the Module() API of MXNet 1.x) in parallel? Should I use the Python multiprocessing library? Any recommendation on how to create a process that loads the small teacher model and runs forward() on the same data batch?
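Concretely, the overlap pattern I have in mind looks roughly like the sketch below. It is only a minimal illustration of the process/queue structure: `teacher_worker`, `train`, and the queue protocol are my own names, and the teacher/student forward passes and the distillation loss are replaced with placeholder arithmetic (in the real pipeline they would be MXNet Module calls bound to `mx.cpu()` and `mx.gpu()` respectively).

```python
import multiprocessing as mp

def teacher_worker(in_q, out_q):
    # In the real setup this process would load the frozen teacher once,
    # before the loop (e.g. an MXNet Module bound to mx.cpu()).
    while True:
        item = in_q.get()
        if item is None:            # sentinel: shut down cleanly
            break
        batch_id, batch = item
        # Placeholder for teacher.forward(batch): just double the inputs.
        logits = [2.0 * x for x in batch]
        out_q.put((batch_id, logits))

def train(num_batches=3):
    in_q, out_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=teacher_worker, args=(in_q, out_q))
    worker.start()

    losses = []
    for i in range(num_batches):
        batch = [float(i), float(i) + 1.0]
        in_q.put((i, batch))        # teacher starts on its own process...
        # ...while the student's forward/backward would run here on the GPU.
        student_out = [x + 0.5 for x in batch]   # placeholder student step
        batch_id, teacher_logits = out_q.get()   # sync before the distillation loss
        assert batch_id == i
        # Placeholder distillation loss: mean squared difference of outputs.
        loss = sum((t - s) ** 2 for t, s in zip(teacher_logits, student_out)) / len(batch)
        losses.append(loss)

    in_q.put(None)                  # stop the worker
    worker.join()
    return losses

if __name__ == "__main__":
    print(train())
```

The idea is that `in_q.put(...)` returns immediately, so the teacher's forward pass proceeds in the worker process while the main process runs the student's training step, and the two synchronize only at `out_q.get()` when the distillation loss needs the teacher's outputs. I am unsure whether this is the idiomatic way to do it with MXNet's Module API, or whether passing batches through a `multiprocessing.Queue` adds too much serialization overhead.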