How to consolidate the split weight matrix after model parallelism?


The group context can be used to implement model parallelism. A common scenario is to split a weight matrix into sub-matrices and distribute them across different GPUs. However, the trained model is then also represented by sub-matrices placed on different GPUs, so prediction requires multiple GPUs as well. Is there a way to consolidate the model by merging the split matrices so that prediction can run on a single GPU?

You can set the context to the same GPU for all groups, i.e. map every group name to one device when you bind the model for prediction.
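To make that concrete, here is a minimal sketch of building such a mapping. It is not MXNet API: `single_device_group2ctx` is a hypothetical helper, the sample JSON is hand-written, and it assumes the symbol JSON stores the group name in a node attribute called `__ctx_group__` (or `ctx_group`) under `attrs` (or `attr` in older files). Check your own checkpoint's JSON before relying on those key names.

```python
import json


def single_device_group2ctx(symbol_json, device):
    """Hypothetical helper: map every context group found in a symbol JSON
    string to the same device object."""
    graph = json.loads(symbol_json)
    groups = set()
    for node in graph.get("nodes", []):
        # Assumption: attributes live under "attrs" (or "attr" in older files)
        # and the group name is stored as "__ctx_group__" or "ctx_group".
        attrs = node.get("attrs") or node.get("attr") or {}
        for key in ("__ctx_group__", "ctx_group"):
            if key in attrs:
                groups.add(attrs[key])
    # Every group points at the same device, so nothing is spread across GPUs.
    return {name: device for name in sorted(groups)}


# Hand-written stand-in for the symbol JSON of a model-parallel checkpoint.
sample_json = json.dumps({
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "null", "name": "fc_part0_weight",
         "attrs": {"__ctx_group__": "dev1"}, "inputs": []},
        {"op": "null", "name": "fc_part1_weight",
         "attrs": {"__ctx_group__": "dev2"}, "inputs": []},
    ]
})

# A plain string stands in for the device here; with MXNet you would pass
# a real context such as mx.gpu(0) instead.
group2ctx = single_device_group2ctx(sample_json, "gpu(0)")
print(group2ctx)  # {'dev1': 'gpu(0)', 'dev2': 'gpu(0)'}
```

With MXNet installed, the resulting dict (built with `mx.gpu(0)` as the device) would be passed as the `group2ctx` argument when binding the symbol. The sub-matrices stay separate arrays in this approach; they are just all placed on the same GPU.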

So I just need to modify the checkpoint JSON file, right? The params file doesn’t need to be modified?