A strange bug when loading an image record file

This code raises an error at iteration 69, and it is the same iteration on every run.

I have run similar code several times before and never hit this error until recently, when I started using a new “.rec” file that is larger than the original ones.

At first, I thought the problem lay in the data, either the raw data or the generated rec file. But when I tried some different parameters, I found something really strange.

When I change the batch size, the program fails at a different iteration (and not on the same image).

When I change the "data_shape" parameter to (3, 501, 501), it also raises an error, at yet another iteration.

If I let it sleep for one second in each iteration, it fails at yet another iteration.

Now I believe the problem must lie in mx.io.ImageRecordIter itself. Does anyone have an idea what is causing this?

without sleep:

with sleep:

Hi @igloooo,

I suspect the same image is causing the problem in every case, and it looks like a smaller image has been included in your RecordIO file. What’s different about this RecordIO file compared with the ones you were using before? Some new images? A different kind of pre-processing? You should check the dimensions of your input images before they are packaged in RecordIO format.
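As a rough sketch of that check (not your actual pipeline): the helper names below are hypothetical, Pillow is assumed to be installed for reading image sizes, and the assumption is that an image fails when its height or width is smaller than the height or width in data_shape.

```python
def find_undersized(sizes, data_shape):
    """Return the (name, height, width) entries smaller than data_shape.

    sizes      -- iterable of (name, height, width) tuples
    data_shape -- (channels, height, width), as passed to ImageRecordIter
    """
    _, min_h, min_w = data_shape
    return [(name, h, w) for name, h, w in sizes if h < min_h or w < min_w]


def image_sizes(paths):
    """Yield (path, height, width) for each image file (assumes Pillow)."""
    from PIL import Image  # assumption: Pillow is available

    for path in paths:
        with Image.open(path) as img:
            width, height = img.size  # Pillow reports (width, height)
        yield path, height, width


# Usage with made-up sizes; in practice pass image_sizes(list_of_paths).
suspects = find_undersized(
    [("a.jpg", 600, 600), ("b.jpg", 400, 700)],
    data_shape=(3, 501, 501),
)
print(suspects)
```

Running this over the raw images before generating the .rec file should list any image that is too small for the data_shape you request.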

Since ImageRecordIter runs in the background in C++, there won’t be a direct correspondence between the iteration of your Python loop and the record actually being processed. You could set the environment variable MXNET_ENGINE_TYPE=NaiveEngine (for debugging purposes only) to get a better idea of which image is causing the issue. The batch_size also changes, to some extent, the iteration at which you’d expect the error to occur.
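A minimal sketch of both points, under a few assumptions: the variable is set before mxnet is imported so that it is picked up at load time, iterations are counted from zero, shuffling is off, and records_in_iteration is a hypothetical helper, not part of MXNet. Even then the iterator may still read ahead, so treat the range as a starting point for the search rather than an exact answer.

```python
import os

# Set this before importing mxnet (debugging only; it slows things down).
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

# import mxnet as mx  # import only after the variable is set


def records_in_iteration(iteration, batch_size):
    """Record indices consumed by a zero-based iteration, without shuffling."""
    start = iteration * batch_size
    return range(start, start + batch_size)


# Hypothetical batch size of 32: which records does iteration 69 cover?
batch = records_in_iteration(69, 32)
print(batch[0], batch[-1])
```

This also shows why changing batch_size moves the failing iteration: the same bad record falls into a different batch.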