Serving different Models in one MMS

Currently, it seems that MMS supports only one service file per model server. What is the preferred way to serve different types of models (types that differ in their pre/post-processing)?

  • Is it possible to apply conditional logic in the service file based on the queried endpoint?
  • Can multiple MMS instances coexist on a single node, separated by serving port?
  • Should one implement a MultiNode Service class?

Thanks for pointers and guidance …

To understand the requirement further, could you specify the use case for requiring multiple service files for the same model?

MMS supports loading and serving multiple models at the same time. Each of these models has its own endpoint for you to run queries against. If you are trying to run multiple models, you could use a different model service file for each model.

Also, how are you running MMS? Is it the container image or standalone MMS? You could run multiple model servers on multiple ports, but that's not advised…

My inquiry is exactly what you describe … That is, I would like to have a service file per model.
I have two different types of model:

  1. Classification
  2. Detection

I would like to use a different service file for each. So I do …

mxnet-model-export --model-name model1 --model-path <DIR-MODEL1> --service-file-path <DIR-MODEL1>/<Model1>.py

mxnet-model-export --model-name model2 --model-path <DIR-MODEL2> --service-file-path <DIR-MODEL2>/<Model2>.py

This generates two different model files, each containing its own model service file. In the __init__ method of each model's service file, I included a simple print("Model1") or print("Model2") statement.

Now, I start the server:

mxnet-model-server --models model1=<MODEL1>.model model2=<MODEL2>.model --host <MyHost>

The models get registered as Flask endpoints. When I look through the startup messages, I see that only the __init__ of the first registered model is executed (e.g., only “Model1” is printed). This means that the same __init__ method is run for both Model 1 and Model 2.

Later on, when I call the endpoints via curl, I see the behavior that …

CURL MODEL 2 - Exception with ... 

  File "/usr/local/lib/python2.7/dist-packages/mms/", line 468, in predict_callback
    response = modelservice.inference(input_data)
  File "/usr/local/lib/python2.7/dist-packages/mms/model_service/", line 105, in inference
    data = self._postprocess(data)
  File "/home/local/.../<MODEL1>.py", line 31, in _postprocess

That is, the service file for MODEL1 is hit.

My suspicion is probably confirmed by this statement:

Note that if you supply a custom service for pre or post-processing, both models will use that same pipeline. There is currently no support for using different pipelines per-model.

Consequently, my question is whether, and how, I can deal with the above scenario.
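For illustration only, here is a minimal sketch of what "conditional logic in one service file" could look like. It is plain Python and does not use the MMS API: the SharedService class, the POSTPROCESSORS registry, the register decorator, and the output layouts are all hypothetical names invented for this example. It also assumes that the shared service can find out which model name it is serving, which nothing in this thread confirms for this MMS version.

```python
# Hypothetical sketch: none of these names come from MMS itself.

# Registry mapping a model name to its post-processing function.
POSTPROCESSORS = {}


def register(model_name):
    """Decorator that stores a post-processing function under a model name."""
    def decorator(func):
        POSTPROCESSORS[model_name] = func
        return func
    return decorator


@register("model1")
def postprocess_classification(scores):
    """Return the index and score of the highest-scoring class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return {"class_id": best, "score": scores[best]}


@register("model2")
def postprocess_detection(rows, threshold=0.5):
    """Keep rows laid out as [class_id, score, x1, y1, x2, y2] (assumed layout)
    whose score is at least the threshold."""
    return {"boxes": [list(row) for row in rows if row[1] >= threshold]}


class SharedService(object):
    """Stand-in for a single service class that every model ends up using."""

    def __init__(self, model_name):
        # Assumption: the service is told which model it is serving.
        self.model_name = model_name

    def _postprocess(self, data):
        # Dispatch on the model name instead of hard-coding one pipeline.
        try:
            handler = POSTPROCESSORS[self.model_name]
        except KeyError:
            raise ValueError(
                "no post-processing registered for %r" % self.model_name)
        return handler(data)
```

With this layout, SharedService("model1")._postprocess(...) runs the classification branch and SharedService("model2")._postprocess(...) runs the detection branch, so one shared file can still hold two pipelines. Whether the dispatch key is actually available inside the real service class is the open question.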


We have a new version of MMS on our GitHub. You should definitely check this version out. It’s very flexible and highly scalable compared to the previous version. Do let us know if you would like to check it out.