What is wrong with this neural network? I have visualizations of the error over epochs and of the MAE. Am I using the wrong error function?

Network Type: Loss functions

SSD_ANCHOR_SIZE = [[0.03], [0.04], [0.05], [0.06], [0.07], [0.08], [0.085], [0.09], [0.095], [0.1], [0.11], [0.12],
                   [0.13], [0.14], [0.15]]
SSD_ANCHOR_RATIO = [[1, 1.1, 1.2], [1, 1.5, 1.75, 2], [1, 2.5, 3], [1, 3.5, 4, 4.5], [1, 4.5, 5, 5.5, 6], [1, 6.5, 7, 8],
                    [1, 7, 7.5, 8], [1, 8, 9], [1, 8, 8.5, 9], [1, 9, 9.5, 10], [1, 10, 10.5, 11, 12],
                    [1, 10, 11, 12, 13, 14], [1, 12, 13, 14, 15, 16, 17], [1, 16, 17, 18, 19], [1, 17, 17.5, 18, 19]]

These sizes and ratios form bounding boxes with widths ranging from 40 to 1000 px and a height of 50 px.
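One way to check that claim is to print the pixel shape each size/ratio pair produces. The sketch below assumes the common SSD convention w = s * sqrt(r) and h = s / sqrt(r), with s as a fraction of the image side; the image size and the helper name anchor_shapes are hypothetical, since the post states neither, and the actual anchor layer may compute shapes differently.

```python
import math

# Hypothetical image size in pixels; the post does not state it.
IMAGE_W, IMAGE_H = 1000, 1000


def anchor_shapes(sizes, ratios, image_w=IMAGE_W, image_h=IMAGE_H):
    """Return (width_px, height_px) for every size/ratio pair.

    Assumes w = s * sqrt(r) and h = s / sqrt(r), with s given as a
    fraction of the image side. This is an assumption, not the
    confirmed behaviour of the layer used in the post.
    """
    shapes = []
    for s in sizes:
        for r in ratios:
            w = s * math.sqrt(r) * image_w
            h = s / math.sqrt(r) * image_h
            shapes.append((round(w, 1), round(h, 1)))
    return shapes


# First and last anchor levels listed above.
first = anchor_shapes([0.03], [1, 1.1, 1.2])
last = anchor_shapes([0.15], [1, 17, 17.5, 18, 19])

for w, h in first + last:
    print(f"width={w:7.1f}px  height={h:6.1f}px")
```

Under this convention the height changes with the ratio rather than staying fixed, so comparing the printed shapes against the intended 50 px height may reveal a mismatch between the intended and the generated anchors.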

Loss Chart:
Loss = L1 loss + class loss
where the L1 loss is computed on the bounding boxes and the class loss is the softmax cross entropy.
MAE = Mean Absolute Error
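
For reference, the combined loss described above can be sketched in plain Python for a single box. The function names are hypothetical, and averaging the L1 term over the four coordinates is an assumption (the post does not say whether it is summed or averaged, or how the two terms are weighted).

```python
import math


def l1_loss(pred_box, true_box):
    """Mean absolute difference over the box coordinates (assumed averaging)."""
    return sum(abs(p - t) for p, t in zip(pred_box, true_box)) / len(true_box)


def softmax_cross_entropy(logits, label):
    """Cross entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def total_loss(pred_box, true_box, logits, label):
    """Unweighted sum of the box term and the class term."""
    return l1_loss(pred_box, true_box) + softmax_cross_entropy(logits, label)


# Example: one predicted box against its target, two classes.
loss = total_loss([0.1, 0.2, 0.6, 0.9], [0.0, 0.3, 0.5, 0.9], [2.0, 0.5], 0)
print(loss)
```

Logging the two terms separately during training shows whether the box term is decreasing at all, or whether the total is dominated by the class term.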


Results Visualized After 100 Epochs

Green boxes are the predicted boxes and red boxes are the actual ones. My default bounding boxes also have width as one of their elements, so I don't understand why I don't see any bounding boxes with small height and width, and only see rectangular ones.

All bounding boxes also seem to start from the top-left corner, so I am wondering what the issue could be here.


It is hard to tell what went wrong without looking at the code.

It seems like you want to build a printed-text detector. A colleague of mine did something similar, but for handwritten text. Take a look at how he does it, and maybe you will spot the error in your own code: https://github.com/ThomasDelteil/HandwrittenTextRecognition_MXNet

Hi Sergey,

I followed a similar approach, but with different anchor sizes, because the line height of text in printed documents is considerably smaller than in handwritten documents. So I created my own anchor sizes and ratios, with heights ranging from 45.0 to 55.0 px and widths ranging from 50 px to 900 px.

Each generated box is either horizontally stretched or a square. But the proposed boxes here are mostly vertically stretched, all starting from more or less the same position. That is a bit strange to me.

Also, the boxes are not moving away from that one point. Hence, I have a few hypotheses: either the L1 loss is not working, or I have messed up one of the training parameters so that the gradient does not flow back properly.
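
A small way to test the first hypothesis in isolation is to compare the analytic gradient of an L1 loss with a finite-difference estimate. The sketch below is framework-independent plain Python with hypothetical function names; it only checks the loss itself and assumes a mean over coordinates, so it says nothing about the actual training code.

```python
def l1_loss(pred, target):
    """Mean absolute error over the coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def numeric_grad(pred, target, eps=1e-6):
    """Central finite-difference gradient of l1_loss with respect to pred."""
    grads = []
    for i in range(len(pred)):
        up = list(pred)
        up[i] += eps
        down = list(pred)
        down[i] -= eps
        grads.append((l1_loss(up, target) - l1_loss(down, target)) / (2 * eps))
    return grads


def analytic_grad(pred, target):
    """sign(pred - target) / n, the gradient of the mean absolute error."""
    n = len(target)
    return [((p > t) - (p < t)) / n for p, t in zip(pred, target)]


pred = [0.1, 0.2, 0.6, 0.9]
target = [0.0, 0.3, 0.5, 0.8]
for g_num, g_ana in zip(numeric_grad(pred, target), analytic_grad(pred, target)):
    print(f"numeric={g_num:+.4f}  analytic={g_ana:+.4f}")
```

If the two agree here but the predicted boxes still do not move, the problem is more likely in how targets are matched to anchors or in which parameters receive gradients than in the loss function itself.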

I was wondering if you could give me some direction on how to debug this more systematically, or point me to some good practices. For example, what would be the first few things you would try in such a situation? Something like that would be very helpful.