Question regarding the SSD algorithm

On the Gluon website, in the "Default anchor boxes" part on the SSD algorithm, it says:

Since an anchor box can have arbitrary shape, we sample a set of anchor boxes as the candidate. In particular, for each pixel, we sample multiple boxes centered at this pixel but have various sizes and ratios. Assume the input size is w×h:

  • for size s∈(0,1], the generated box shape will be ws×hs
  • for ratio r>0, the generated box shape will be w√r × h/√r…

Can anyone explain this to me, maybe with an example please?

Another question: when we detect something in a smaller feature map, how do we then project it back to the original input image?

The sizes argument of MultiBoxPrior defines a set of square bounding boxes. Each size is given as a fraction of the total w and h.

The ratios argument defines a set of ratios to apply to the first sizes element (sizes[0]). As far as I can tell, the first ratio element is ignored, so it is a good idea to always set it to 1.

Ratios > 1 define horizontal rectangles; ratios < 1 define vertical rectangles.

For example if you have

sizes = [0.2, 0.5]
ratios = [1, 4, 0.25]

You will get 4 anchor boxes.

  • A square anchor box of [0.2, 0.2] size
  • A square anchor box of [0.5, 0.5] size
  • A rectangular anchor box of [0.1, 0.4] size
  • A rectangular anchor box of [0.4, 0.1] size
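
The arithmetic behind this list can be checked with a small pure-Python sketch. It is not the MultiBoxPrior implementation: `anchor_shapes` is a hypothetical helper that follows the description above (one square per size, then sizes[0] combined with every ratio after the first), and it assumes shapes are reported as (width, height) with width = s·√r and height = s/√r.

```python
import math

def anchor_shapes(sizes, ratios):
    """Return (width, height) pairs as fractions of the input size.

    Hypothetical helper, not a library function: it mirrors the
    behaviour described in this answer, where the first ratio is skipped.
    """
    # One square box per entry in sizes.
    shapes = [(s, s) for s in sizes]
    # Remaining ratios are applied to the first size only.
    s0 = sizes[0]
    for r in ratios[1:]:
        shapes.append((s0 * math.sqrt(r), s0 / math.sqrt(r)))
    return shapes

shapes = anchor_shapes([0.2, 0.5], [1, 4, 0.25])
for w, h in shapes:
    print(f"w={w:.2f}, h={h:.2f}")
```

With these inputs the sketch yields len(sizes) + len(ratios) - 1 = 4 shapes, one per bullet above, in the order the ratios were given.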

(Using the code provided in chapter 8 on object detection with the above values, you get the following result.)


To answer your second question: for each anchor box, we predict the real w and h in [0, 1], and the offsets dw, dh from the center of the anchor box, also in [0, 1]. Because these values are fractions of the original image rather than of the feature map, even with only a 16x16 feature map we can predict a bounding box such as (h, w, dw, dh) = (0.123, 0.432, 0.12, 0.23) in the original image.
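
As a rough illustration of that projection, here is a minimal sketch under simplifying assumptions. The function names, the 300x300 image size, and the chosen feature-map cell are all hypothetical, and the offsets are simply added to the anchor center as described above; real SSD implementations usually encode offsets relative to the anchor size, so check the code you are using.

```python
def cell_center(row, col, fmap_h, fmap_w):
    """Normalized (cx, cy) of a feature-map cell, each in [0, 1]."""
    return (col + 0.5) / fmap_w, (row + 0.5) / fmap_h

def to_pixel_corners(anchor_cx, anchor_cy, w, h, dw, dh, img_w, img_h):
    """Turn a normalized prediction into pixel corner coordinates.

    Simplified assumption: the predicted center is the anchor center
    plus the raw offsets (dw, dh), all as fractions of the image.
    """
    cx = anchor_cx + dw
    cy = anchor_cy + dh
    xmin = (cx - w / 2) * img_w
    xmax = (cx + w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    ymax = (cy + h / 2) * img_h
    return xmin, ymin, xmax, ymax

# Cell (row 8, col 8) of a 16x16 feature map, projected onto a 300x300 image.
acx, acy = cell_center(8, 8, 16, 16)
box = to_pixel_corners(acx, acy, w=0.432, h=0.123, dw=0.12, dh=0.23,
                       img_w=300, img_h=300)
print(box)
```

The feature-map resolution only determines where the anchor centers sit; the final box is scaled by the image width and height, which is why a coarse feature map can still describe a box at pixel precision.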

Alternatively, there is another implementation of SSD and Faster-RCNN in pure Gluon, available here:
Note that they follow different conventions for ratios etc., so have a look at the code directly to understand how the anchor boxes are calculated.