Speed issue converting NDArray to np.array

Hi there,

I am using the Gluon package to run a Faster RCNN Coco-trained model (Link here - this is for ssd but I’m using a FRCNN model).

The issue I’m having is that converting the output arrays for bounding_boxes, scores & class_IDs takes a very long time. This is because the mxnet engine is asynchronous, and the system needs to finish its computations before it can convert the array. This time factor is killing me, as we have thousands of images to perform detections on, each taking about 1-2 seconds.

The FasterRCNN output gives us an 80000 element array, although only the first 10 are needed. Slicing the array to be the first 10 and using .asnumpy() still takes a long time though, because it is still computing the other elements…

I can only think of three solutions to speed up the code:

  1. Make the initial output of the FRCNN network shorter, e.g. a max output array length of 10
  2. Use environment variables to change the engine to a synchronous engine (Under “Engine Type”)
  3. Somehow find a way to make asnumpy() faster
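For option 2, a minimal sketch of what I mean. MXNET_ENGINE_TYPE and the NaiveEngine value are from the MXNet environment variable docs; setting the variable before importing mxnet is my assumption about the safest ordering. As far as I understand, a synchronous engine would only move the wait into the forward call rather than remove it, so this is probably more useful for profiling than as a fix.

```python
import os

# Select the synchronous engine. Set this before `import mxnet`
# (assumed to be the safe ordering, so the engine sees the value when it starts).
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

# import mxnet as mx  # import only after the variable is set
```

The same thing can be done from the shell by exporting the variable before launching the script.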

Can anyone help?

What is your current batch size? Why do you have an 80000 element array? How many classes are you trying to predict?

asnumpy() is a blocking call, so execution stops until the result can be retrieved. Try to avoid that call and use MXNet NDArray operations instead: https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html For instance, if you are only interested in the objects with the highest probabilities, you can use ndarray.argmax: https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html#mxnet.ndarray.NDArray.argmax
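Since asnumpy() blocks until the pending computation has finished, it can help to measure how much of the 1-2 seconds is the forward pass and how much is the copy itself. Below is a rough sketch of a timing helper; time_stages is a hypothetical name, and the commented usage assumes net and x from this thread plus mx.nd.waitall() as the synchronization point.

```python
import time


def time_stages(stages):
    """Run each (name, callable) pair in order and return {name: seconds}."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical usage with the detection model from this thread
# (assumes `net`, `x`, and `import mxnet as mx` are already in scope):
#
# out = {}
# timings = time_stages([
#     ("enqueue forward", lambda: out.update(res=net(x))),
#     ("wait for compute", mx.nd.waitall),
#     ("copy to NumPy", lambda: out["res"][1][0][:10].asnumpy()),
# ])
# print(timings)
```

If most of the time lands in the "wait for compute" stage, the bottleneck is the network itself and not the conversion to NumPy.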

There are several other options to speed up the inference:

  • run inference on large image batches to increase throughput
  • optimize your model with TensorRT


Thanks NRauschmayr,

Below is an example of the code from the gluon website.

from matplotlib import pyplot as plt
import gluoncv
from gluoncv import model_zoo, data, utils
net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
x, orig_img = data.transforms.presets.rcnn.load_test(im_fname)

box_ids, scores, bboxes = net(x)
ax = utils.viz.plot_bbox(orig_img, bboxes[0], scores[0], box_ids[0], class_names=net.classes)


box_ids, scores and bboxes are all 80000-element arrays. However, only the first 6 entries have valid scores; the rest have scores of -1.
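For reference, this is roughly how the padded entries can be dropped once the outputs are in NumPy. It is only a sketch: keep_valid is a hypothetical helper, the stand-in arrays replace the real network output, and the per-image shapes (N, 1), (N, 1) and (N, 4) are assumed. It does not remove the wait, since it runs after the conversion.

```python
import numpy as np


def keep_valid(ids, scores, bboxes, min_score=0.0):
    """Drop padded detections (score -1) from one image's NumPy outputs.

    Assumed shapes: ids (N, 1), scores (N, 1), bboxes (N, 4).
    """
    mask = scores[:, 0] >= min_score
    return ids[mask], scores[mask], bboxes[mask]


# Stand-in data: 2 real detections followed by 3 padded rows.
ids = np.array([[14.0], [1.0], [-1.0], [-1.0], [-1.0]])
scores = np.array([[0.99], [0.87], [-1.0], [-1.0], [-1.0]])
bboxes = np.vstack([
    np.array([[10, 20, 110, 220], [30, 40, 90, 160]], dtype=float),
    -np.ones((3, 4)),  # padded boxes are assumed to be filled with -1
])

valid_ids, valid_scores, valid_bboxes = keep_valid(ids, scores, bboxes)
print(valid_scores.ravel())
```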

Can I add a parameter (batch size) so that fewer elements are returned and fewer computations are needed?