How to group two models together as one?

I trained two models separately; they take the same input data. To minimize processing delay, how can I group the two models together for inference deployment? I tried the following code:

import mxnet as mx
from mxnet import gluon

class ConcateNetwork(gluon.HybridBlock):
    """Wrap two trained nets so one call returns both outputs."""
    def __init__(self, net1, net2):
        super(ConcateNetwork, self).__init__()
        self.net1 = net1
        self.net2 = net2

    def hybrid_forward(self, F, x):
        output_char1 = self.net1(x)
        output_char2 = self.net2(x)
        return output_char1, output_char2

net = ConcateNetwork(net1, net2)

However, testing shows that this code still executes the two models sequentially. Is there a way to make the two models execute in parallel, as if they were a single model?

I use Ray for this kind of task (parallel/distributed inference). You can declare a remote function, wrap your models inside it, and assign it specific resources (number of GPUs, number of CPU cores, etc.).

@zzmig Any update on this?

I am also working on this issue: loading two models at once (with the Module() class) and running their forward passes in parallel on different devices with the same data batch, rather than in sequence, to gain performance from the overlap.

I did some searching but didn't find a solution. This architecture should be common in knowledge-distillation projects, but the ones I found all run the models in sequence. I plan to try Python's multiprocessing.
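Before reaching for full multiprocessing, a stdlib sketch with `concurrent.futures` may be enough: most frameworks, MXNet included, release the GIL inside their C++ engine, so two forward passes on different devices can overlap even from threads. The `predict1`/`predict2` functions below are hypothetical stand-ins for the two nets.

```python
from concurrent.futures import ThreadPoolExecutor

def predict1(batch):
    # placeholder for net1(x) running on device 0
    return [v * 2 for v in batch]

def predict2(batch):
    # placeholder for net2(x) running on device 1
    return [v + 1 for v in batch]

batch = [1, 2, 3]
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(predict1, batch)  # both submitted before either blocks
    f2 = pool.submit(predict2, batch)
    out1, out2 = f1.result(), f2.result()
```

If the models are CPU-bound in Python itself, swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` gives true multiprocessing with the same interface, at the cost of loading each model in its own process.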