http://en.diveintodeeplearning.org/chapter_computer-vision/anchor.html

After that, we only need to traverse the remaining anchor boxes of A2, A5, A7, A9 and determine whether to assign ground-truth bounding boxes to the remaining anchor boxes according to the threshold.

Why A2, A5, A7, A9?

I think it should be the anchors other than A2, A5, A7, A9. Also, only the row should be eliminated from the matrix A, while the column values are kept. If you try to reimplement the code, you will find out.
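To make this reading concrete, here is a minimal pure-Python sketch, assuming an IoU matrix with anchors as rows and ground-truth boxes as columns. The function name `assign_anchors`, the toy IoU values, and the 0.5 threshold are illustrative assumptions, not the book's code. The greedy step discards both the row and the column of each match, but the threshold pass only visits the anchors that were *not* matched, and it compares each of them against every ground-truth column, including columns discarded during the greedy step.

```python
def assign_anchors(iou, threshold=0.5):
    """Greedy max-IoU matching followed by a threshold pass.

    iou: list of rows, iou[i][j] = IoU(anchor i, ground truth j).
    Returns a list whose entry i is the ground-truth index assigned to
    anchor i, or None if anchor i stays background.
    """
    n_a, n_b = len(iou), len(iou[0])
    assigned = [None] * n_a
    free_rows, free_cols = set(range(n_a)), set(range(n_b))

    # Step 1: repeatedly take the largest IoU among rows/columns still available,
    # then discard that row and that column.
    for _ in range(min(n_a, n_b)):
        i, j = max(((r, c) for r in free_rows for c in free_cols),
                   key=lambda rc: iou[rc[0]][rc[1]])
        assigned[i] = j
        free_rows.discard(i)
        free_cols.discard(j)

    # Step 2: traverse only the anchors NOT matched in step 1. Each one uses its
    # full row, so every ground-truth column is still a candidate here.
    for i in free_rows:
        j = max(range(n_b), key=lambda c: iou[i][c])
        if iou[i][j] >= threshold:
            assigned[i] = j
    return assigned


# Toy example: 4 anchors, 2 ground-truth boxes.
iou = [
    [0.10, 0.60],
    [0.70, 0.20],
    [0.55, 0.10],
    [0.05, 0.30],
]
print(assign_anchors(iou))
```

With these values, anchors 1 and 0 are matched greedily, anchor 2 is then assigned ground truth 0 even though column 0 was discarded in step 1, and anchor 3 stays background because its best IoU is below the threshold.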

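```
from mxnet import contrib, nd

# Third argument: class predictions of shape (batch, num_classes + background, num_anchors)
labels = contrib.nd.MultiBoxTarget(anchors.expand_dims(axis=0),
                                   ground_truth.expand_dims(axis=0),
                                   nd.zeros((1, 3, 5)))
```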

I do not understand how the third parameter is used. We are only labeling the training anchor boxes with ground-truth bounding boxes, correct? So what is the third parameter for?

"construct random predicted results with a shape of (batch size, number of categories including background, number of anchor boxes)"

Quoting the text above: we are just adding labels, so why do we need to construct random predicted results?
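One plausible reason, stated as an assumption: SSD-style training often uses hard negative mining, which ranks the background anchors by how confidently the model mistakes them for objects, and that ranking needs class predictions. As far as I can tell, MXNet's `MultiBoxTarget` only uses its class-prediction input for this when its `negative_mining_ratio` option is enabled, so with default settings an all-zero tensor would just be a shape placeholder (worth checking against the MXNet API docs). The sketch below is a generic, hypothetical illustration of hard negative mining, not MXNet's implementation; the name `hard_negative_mask`, the label convention (0 = background), and the default ratio of 3 (the SSD paper keeps roughly 3 negatives per positive) are assumptions. It takes predictions in the quoted layout (classes × anchors, batch dimension dropped).

```python
import math


def hard_negative_mask(cls_preds, labels, ratio=3):
    """Decide which anchors contribute to the classification loss.

    cls_preds: cls_preds[c][a] = raw score of class c for anchor a,
               with c = 0 meaning background (batch dimension dropped).
    labels:    matched label per anchor, 0 = background, >0 = object class.
    ratio:     keep at most `ratio` negatives per positive (assumed default).
    Returns a list of booleans, True if the anchor is used in the loss.
    """
    num_classes, num_anchors = len(cls_preds), len(labels)

    def background_prob(a):
        # Numerically stable softmax over anchor a's class scores; return P(background).
        scores = [cls_preds[c][a] for c in range(num_classes)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        return exps[0] / sum(exps)

    num_pos = sum(1 for y in labels if y > 0)
    negatives = [a for a in range(num_anchors) if labels[a] == 0]
    # "Hard" negatives are background anchors the model is least sure are background.
    negatives.sort(key=background_prob)
    keep = set(negatives[:ratio * num_pos])
    # Positives are always used; only the hardest negatives are kept.
    return [labels[a] > 0 or a in keep for a in range(num_anchors)]


# 3 classes (background, dog, cat) x 5 anchors, matching the quoted shape.
cls_preds = [
    [0.0, 0.0, 5.0, -1.0, 0.0],  # background scores
    [0.0, 2.0, 0.0, 2.0, 0.0],   # dog scores
    [0.0, 0.0, 0.0, 0.0, 2.0],   # cat scores
]
labels = [0, 1, 0, 0, 2]
print(hard_negative_mask(cls_preds, labels, ratio=1))
```

Here anchor 3 is background but scores highly as "dog", so it is kept as a hard negative, while anchor 2 (confidently background) is dropped. With all-zero predictions every anchor gets the same background probability, so the ranking carries no information, which fits the idea that the zeros are only a placeholder when mining is off.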

We sort the prediction bounding boxes with predicted categories other than background by confidence level from high to low, and obtain the list L. Select the prediction bounding box B1 with the highest confidence level from L as a baseline and remove all non-benchmark prediction bounding boxes with an IoU with B1 greater than a certain threshold from L.

What happens if the cat in the example is in front of the dog in such a way that its IoU with the dog bbox is greater than the threshold? Does the cat bbox get removed from L? Or does this removal only apply to objects of the same category? (But this would fail if we have one dog in front of another…)
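The quoted text doesn't say whether suppression crosses categories, and both variants exist in practice (I believe MXNet's `contrib.nd.MultiBoxDetection` exposes this as a `force_suppress` flag, but check the API docs). The hypothetical sketch below implements the greedy procedure from the quote with a `class_aware` switch so the two behaviors can be compared on a cat box sitting inside a dog box. The names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and all coordinates and scores are illustrative assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, classes, threshold=0.5, class_aware=True):
    """Greedy NMS: returns indices of kept boxes, highest score first.

    class_aware=True:  a box is only suppressed by a higher-scoring box
                       of the same predicted class.
    class_aware=False: a box is suppressed by any higher-scoring box.
    """
    # The list L: candidates sorted by confidence, high to low.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # baseline box B1
        keep.append(best)
        # Drop boxes whose IoU with the baseline exceeds the threshold,
        # unless class-aware mode spares boxes of a different class.
        order = [i for i in order
                 if (class_aware and classes[i] != classes[best])
                 or iou(boxes[i], boxes[best]) <= threshold]
    return keep


boxes = [(0.10, 0.10, 0.90, 0.90),   # dog
         (0.15, 0.15, 0.85, 0.85),   # cat, inside the dog box (IoU ~0.77)
         (0.12, 0.10, 0.90, 0.88)]   # duplicate dog detection (IoU ~0.95)
scores = [0.9, 0.8, 0.7]
classes = ["dog", "cat", "dog"]
print(nms(boxes, scores, classes, class_aware=True))   # [0, 1]
print(nms(boxes, scores, classes, class_aware=False))  # [0]
```

In class-aware mode the cat survives and only the duplicate dog is removed; in class-agnostic mode the cat is suppressed as well. In either mode, two genuinely distinct dogs overlapping above the threshold would still collapse to one box, which is the limitation raised in the question.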