Object Detection and Non-annotated data in RecordIO


We have data with annotated bounding boxes (‘sick’) and data that has no bounding boxes at all (‘healthy’).

Our objective is to train the algorithm with both as the ‘healthy’ data would allow for a more robust model.

We have followed the convention in mxnet’s documentation; examples of a ‘healthy’ and a ‘sick’ entry in our list file currently look like this:

2 2 0 whole/healthy/R40IMG30.jpeg
3 2 5 0 0.0004060000000000036 0.097402 0.22037400000000001 0.22849000000000003 whole/sick/R117IMG245.jpeg

Here the first line represents a healthy example with no bounding boxes, and the second an example with a bounding box around the sick area.

In the first line the label width is 0 (we tried 5 as well), while in the second line the label width is 5.
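For reference, the way the fields in those two lines are assembled can be sketched as follows. This is only a sketch of the im2rec-style list format as we understand it (index, header width, label width, zero or more 5-number annotations, then the path); the assumption that fields are tab-separated follows mxnet’s im2rec convention, while the forum rendering above shows plain spaces.

```python
# Sketch: emit im2rec-style .lst lines for the two cases above.
# Assumed field order: index, header_width, label_width,
# [class_id, xmin, ymin, xmax, ymax]*, image_path.

def make_lst_line(index, path, boxes):
    """boxes: list of (class_id, xmin, ymin, xmax, ymax); may be empty."""
    header_width = 2                      # fields after the index, before labels
    label_width = 5 if boxes else 0       # reproduces the two lines quoted above
    fields = [index, header_width, label_width]
    for box in boxes:
        fields.extend(box)                # class_id, xmin, ymin, xmax, ymax
    fields.append(path)
    return "\t".join(str(f) for f in fields)

# ‘sick’ image with one annotated box:
sick = make_lst_line(3, "whole/sick/R117IMG245.jpeg",
                     [(0, 0.000406, 0.097402, 0.220374, 0.228490)])

# ‘healthy’ image with no boxes — exactly the case that trips the
# "not enough annotation" check in the detector's data loader:
healthy = make_lst_line(2, "whole/healthy/R40IMG30.jpeg", [])
```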

We’re getting this error when starting the training instance: “Not enough annotation packed in the list file or the RecordIO file. Each object requires at least five numbers [label_id, xmin, ymin, xmax, ymax] for annotation.”

We understand what the error means, and are wondering how we’d go about inserting data with no annotations via RecordIO format.

Object detection models require that every image has at least one bounding box. As I understand it, you want to perform object detection on images that contain ‘sick’ parts, but some images contain no annotated regions at all and are therefore labelled ‘healthy’. That means your object detection model can only be trained on the ‘sick’ data.

You could, for instance, train a classifier that distinguishes whether an image is healthy, and if it is not, run a detection model to find where the ‘sick’ part is located. Whether this approach gives better results depends mainly on what your data looks like, e.g. how large the annotations typically are, and how difficult it is to distinguish ‘healthy’ from ‘sick’.
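The two-step idea can be wired up roughly as below. This is a minimal sketch of the control flow only: `classify_image` and `detect_boxes` are hypothetical stand-ins for your trained mxnet classifier and detector, not real API calls.

```python
# Sketch of the suggested two-stage pipeline: a cheap binary classifier
# gates the more expensive detector, so ‘healthy’ images never reach it.

def classify_image(path):
    # Hypothetical stand-in: pretend any path containing "sick" is positive.
    return "sick" in path

def detect_boxes(path):
    # Hypothetical stand-in: a real detector would return
    # (class_id, xmin, ymin, xmax, ymax, score) tuples.
    return [(0, 0.1, 0.1, 0.3, 0.3, 0.9)]

def predict(path):
    """Return a list of boxes; an empty list means ‘healthy’."""
    if not classify_image(path):
        return []              # healthy: skip detection entirely
    return detect_boxes(path)
```

One design point worth noting: the classifier only ever sees a healthy/sick label, so it can be trained on all of your data, including the non-annotated images the detector cannot use.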

Thank you for your reply. Although having a two-step process might work, it doubles the development and debugging time.

I was looking for something similar to what is discussed in this issue, but for mxnet:

The main reason we want to add negative photos (non-annotated data) is to increase the robustness of our model against false positives: currently the model is learning the wrong features from our annotations (it is learning features that are common to both positive and negative photos). By introducing negative photos, we hope to make the model learn the features unique to the ‘sick’ regions.