Print detected objects in object detection algorithm

I am using Gluon’s pretrained object detection model. I was wondering if there is a way to gather the data for what objects were detected by the algorithm. I want to write a computer program that will do something based on what objects are detected in the photo, without me seeing the photo. I tried some print statements of class_IDs, scores, class_IDs[0] etc. but none give much useful information. Interestingly, printing class_IDs prints a nested list composed of numbers 0 or -1 inside individual brackets. Online, it says the class_IDs variable holds the predicted class IDs detected by the model… that doesn’t seem to align. If anyone has any advice please let me know! Thank you!

Hi @cngluon,

Yes, this is certainly possible. Can you link to the specific model you’re using here? I presume it’s from GluonCV. One of the most important factors when using pre-trained models (without fine-tuning) is the dataset that was used for pre-training, since that determines the number and type of classes detected by the model. I recommend models pre-trained on COCO.

When you run the network on an image, a tuple of 3 arrays will be returned which, as you correctly point out, will be #1 class ids, #2 scores, and #3 bounding boxes. You can use the class ids array to determine the class of each object. An image with 3 detected objects might return something like…

[5, 14, 3, -1, -1, -1, ...]

Here the model predicts objects with class indexes of 5, 14 and 3. -1 is just used to pad the array when no more objects have been found. You can use net.classes (i.e. the classes property of your network) to get a list of class labels and then index into it to find out which class has been detected by the model: e.g. print(net.classes[5]).
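As a rough sketch of that lookup, the snippet below skips the -1 padding and maps the remaining ids to labels. The `classes` list is a hypothetical stand-in for net.classes, and `detected_labels` is a helper name made up for illustration; it is not part of GluonCV.

```python
def detected_labels(class_ids, classes):
    """Return the label for every non-padding class id, in order."""
    labels = []
    for class_id in class_ids:
        class_id = int(class_id)
        if class_id == -1:  # -1 marks padding, not a detection
            continue
        labels.append(classes[class_id])
    return labels


# Hypothetical stand-in for net.classes; a real model supplies its own names.
classes = ["class_%d" % i for i in range(20)]

# The example output from above, padded with -1.
class_ids = [5, 14, 3, -1, -1, -1]
print(detected_labels(class_ids, classes))
```

With a real network you would pass net.classes instead of the placeholder list.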

Hi @thomelane! Thank you so much for your help. I fully agree with what you’re saying and would expect the same to happen. My confusion comes from running this GluonCV object detection tutorial: https://gluon-cv.mxnet.io/build/examples_detection/demo_faster_rcnn.html . When I add the line print(box_ids) as the second to last line (before plt.show()) of the code given in the tutorial, I get this array in return:
[[[-1.]
  [-1.]
  [-1.]
  ...
  [-1.]
  [-1.]
  [-1.]]]
<NDArray 1x6000x1 @cpu(0)>

This would imply that there are no detected objects, which is not the case. I also notice that this is a nested array, which does not seem logical either. Let me know your thoughts. I really appreciate your help!

Hi @cngluon

Please try this code.

import gluoncv
from gluoncv import model_zoo, data, utils

net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/detection/biking.jpg?raw=true',
                          path='biking.jpg')
x, orig_img = data.transforms.presets.rcnn.load_test(im_fname)

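# Run the detector; box_ids has shape (1, N, 1), one row per candidate box.
box_ids, scores, bboxes = net(x)
n_box = box_ids.shape[1]

# Rows with id -1 are padding, so print only the real detections.
for n in range(n_box):
    class_id = int(box_ids[0][n].asscalar())
    if class_id != -1:
        print('id = %d (%s), score = %f' % (class_id, net.classes[class_id], scores[0][n].asscalar()))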
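To act on the detections without looking at the image, one option is to collect the labels first and then branch on them. The sketch below is plain Python and assumes the ids and scores for one image have already been converted to nested lists (for example with .asnumpy().tolist() on box_ids[0] and scores[0]). The sample data, the `collect_detections` helper, and the 0.5 score cutoff are illustrative assumptions, not values from the tutorial. Note that the padding rows sit at both ends of the sample, which would explain why a print showing only the first and last rows of a 6000-row array can contain nothing but -1.

```python
def collect_detections(box_ids, scores, classes, min_score=0.5):
    """Pair each real detection with its label, dropping padding and weak scores.

    box_ids and scores are lists of one-element rows, mirroring the
    (N, 1) layout of a single image's output.
    """
    detections = []
    for (class_id,), (score,) in zip(box_ids, scores):
        class_id = int(class_id)
        if class_id == -1 or score < min_score:
            continue  # skip padding rows and low-confidence boxes
        detections.append((classes[class_id], float(score)))
    return detections


# Hypothetical stand-ins for net.classes and one image's outputs.
classes = ["bicycle", "dog", "person"]
box_ids = [[-1.0], [2.0], [0.0], [2.0], [-1.0]]
scores = [[-1.0], [0.98], [0.91], [0.12], [-1.0]]

detections = collect_detections(box_ids, scores, classes)
labels = {label for label, _ in detections}

# Branch on what was found, without ever displaying the image.
if "person" in labels:
    print("person detected")
```

The cutoff is worth tuning for your model: a lower value keeps more boxes but lets in more false positives.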