Configuring the number of vCPUs used by MXNet prediction running in a Gunicorn worker process

I am running MXNet inference on CPU, using Gunicorn gevent workers that each load a copy of the model at startup. With 1 worker, inference uses only 1 of the 16 vCPUs (c4.4xl host). Changing the number of worker processes gives the same pattern: 2 worker processes use just 2 vCPUs, 4 worker processes use just 4 vCPUs, and so on.

  1. Is this behaviour expected? I have MKL installed and would expect MXNet to use more CPU cores, since they are available.
  2. What would it take for 2 workers to use 8 vCPUs each, or 4 workers to use 4 vCPUs each? Does this need some configuration of the Gunicorn workers, or a tweak to the MXNet/MKL variables?
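
To make question 2 concrete, the kind of configuration being asked about might look like the sketch below. It is a hypothetical `gunicorn.conf.py`, not a verified fix. It assumes the OpenMP/MKL thread pools respect `OMP_NUM_THREADS` and `MKL_NUM_THREADS`, and that setting them in Gunicorn's `post_fork` hook happens early enough, i.e. before the worker imports MXNet (so no `preload_app`). The names `TOTAL_VCPUS`, `threads_per_worker` and `thread_env` are made up for illustration.

```python
# gunicorn.conf.py (hypothetical sketch; untested against MXNet itself)
import os

TOTAL_VCPUS = 16          # assumption: the 16-vCPU host from the question
workers = 2               # Gunicorn setting: number of worker processes
worker_class = "gevent"   # Gunicorn setting: gevent workers, as in the question


def threads_per_worker(total_vcpus, num_workers):
    """Split the vCPUs evenly across workers, with at least 1 thread each."""
    return max(1, total_vcpus // num_workers)


def thread_env(total_vcpus, num_workers):
    """Build the thread-count environment variables for one worker."""
    n = str(threads_per_worker(total_vcpus, num_workers))
    return {
        "OMP_NUM_THREADS": n,  # OpenMP thread pool size
        "MKL_NUM_THREADS": n,  # MKL thread pool size
    }


def post_fork(server, worker):
    # Gunicorn server hook: runs inside each worker right after it is forked.
    # Assumption: MXNet has not been imported yet at this point.
    os.environ.update(thread_env(TOTAL_VCPUS, workers))
```

With `workers = 2` this would request 8 threads per worker, and with `workers = 4` it would request 4. Whether these variables are actually the ones limiting each worker to a single vCPU is the open question.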