Object Detection image collection and bounding box best practices?

Hi all,

I’ve built a few object detection models using SSD on Amazon SageMaker, building on pre-trained models. I’ve learned a lot, but I’m still just guessing when it comes to image size and resolution, as well as bounding box placement.

For example, say I wanted to detect solar panels in an image shot from a drone. What would be the best approach: lots of shots all from a similar height, shots from varying heights, directly overhead, many different angles, etc.?

Then once I have the shots, what about bounding box placement? Should the boxes run as close to the edges of the panels as possible, or should there be a slight border? Is it better to avoid overlapping boxes, or does that not matter?

Is anyone aware of any resources that cover these topics in detail? I’d love to get a better understanding of the recommended best practices rather than continue down a path that may be generating sub-optimal results.

Many thanks!

Generally it is very important in deep learning to have training data that is representative of your given problem. So if you want to detect solar panels, your input data should contain images from all the positions, heights and weather conditions you expect to see later. Otherwise your model will be subject to bias.
Regarding image size and resolution: the smaller the image, the faster your training will be, but accuracy will likely drop because the image loses information. So depending on how high the resolution of your input images is, you should crop and/or resize them. It also depends on how large the objects you want to detect are (a few pixels or hundreds of pixels). Overlapping boxes should not matter.
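
If it helps, here is a minimal sketch of that kind of downscaling with the bounding boxes rescaled to match. The file name, box coordinates and output size are made up for illustration, and boxes are assumed to be in pixel `[xmin, ymin, xmax, ymax]` format:

```python
import mxnet as mx
from mxnet import image, nd

def resize_with_boxes(img, boxes, out_w, out_h):
    """Resize an HWC image and rescale its pixel-coordinate boxes to match."""
    h, w = img.shape[0], img.shape[1]
    resized = image.imresize(img, out_w, out_h)              # bilinear resize by default
    scale = nd.array([out_w / w, out_h / h, out_w / w, out_h / h])
    return resized, boxes * scale                            # scale x and y independently

img = image.imread('drone_shot.jpg')                         # hypothetical drone image
boxes = nd.array([[1200.0, 800.0, 1650.0, 1100.0]])          # one made-up panel box, in pixels
small_img, small_boxes = resize_with_boxes(img, boxes, 512, 512)
print(small_img.shape, small_boxes)                          # the panel now covers far fewer pixels
```

Printing the rescaled boxes is a quick way to see how few pixels a panel ends up occupying after downscaling, which is exactly when detection accuracy starts to suffer.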
The following article explains quite nicely how object detection with SSD and Faster R-CNN works: https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/


Thank you for the reply.

So let’s say I have a large number of training images, and they are quite high resolution, but they are all shot from a similar height. If after training I then receive a shot from a higher altitude, will the model still be capable of recognizing the panels, even though they will appear smaller than in the training set? Or will the model go, “hey, I know they’re smaller than what I’ve seen before, but that still looks like a solar panel to me”?

It depends on the problem and the training data. I worked on a project related to fault detection some time ago where I used Faster R-CNN, and I experienced exactly such problems once I changed the resolution of my input images.
But it is also important to understand which features your model bases its decisions on. Here is an interesting article: https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html where they use CNNs to identify fish, but what the model actually learned was to focus on features related to the boat rather than the fish (see the heatmap images in that article).

So for instance, in your case the model may learn that solar panels are always rectangular, but what happens if you have panels of different shapes?

MXNet has a gradcam module which allows you to visualize which parts of an image drive the predictions of a convolutional neural network, using Gradient-weighted Class Activation Mapping (Grad-CAM).

You can check this tutorial: https://mxnet.incubator.apache.org/tutorials/vision/cnn_visualization.html
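
In case it is useful, here is a rough sketch of the Grad-CAM idea itself (not the tutorial’s own gradcam helper), assuming the Gluon model-zoo `resnet18_v1` layout where `net.features` ends with a global-average-pooling layer and `net.output` is the final Dense classifier; the random input stands in for a real preprocessed image:

```python
import mxnet as mx
from mxnet import nd, autograd
from mxnet.gluon.model_zoo import vision

ctx = mx.cpu()
net = vision.resnet18_v1(pretrained=True, ctx=ctx)

def grad_cam(net, x, class_idx):
    """Return an (h, w) heatmap showing where the network looks for class_idx."""
    conv_maps = net.features[:-1](x)            # last conv feature maps, shape (1, C, h, w)
    conv_maps.attach_grad()                     # make them a leaf so their gradient is kept
    with autograd.record():
        pooled = net.features[-1](conv_maps)    # global average pooling
        logits = net.output(pooled)             # class scores, shape (1, num_classes)
        score = logits[0, class_idx]
    score.backward()
    weights = conv_maps.grad.mean(axis=(2, 3), keepdims=True)   # per-channel importance
    cam = nd.relu((weights * conv_maps).sum(axis=1))            # weighted sum over channels
    return cam[0] / (cam.max() + 1e-8)                          # normalise to [0, 1]

x = nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)   # stand-in for a real image
heatmap = grad_cam(net, x, class_idx=0)
print(heatmap.shape)   # (7, 7) for a 224x224 input through resnet18
```

Upsampling that small heatmap back to the input size and overlaying it on the drone shot makes it easy to see whether the model is actually looking at the panels or at something in the background.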
