Hi,
I’m using mxnet.io.ImageRecordIter, which is fast yet not very flexible…
During training I shuffle the batches, but for evaluation I want to fetch a specific batch (or specific images) for visualization.
The docs at https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.io.ImageRecordIter
only list the instantiation params, and there is no reference to the source code.
Any pointers?
-Oron
ImageRecordIter is implemented in C++; you can find the source code here.
I would suggest using the Dataset and DataLoader APIs, as they are much more flexible. Instead of multi-threading, they use a multi-processing paradigm. Check this tutorial out to learn more, or the API docs.
Overall I would suggest splitting your training and evaluation data by creating two Datasets and two DataLoaders (or, in your case, two ImageRecordIter instances), one for training and one for evaluation. That way you can specify shuffle=True on the training one but not on the evaluation one, and you control what goes into the evaluation dataset in the first place.
Does that answer your question?
Thanks!
P.S. I compared it against the Gluon DataLoader, and the DataLoader turned out to be about 4x slower…
Have you set your DataLoader's num_workers parameter, e.g. num_workers=multiprocessing.cpu_count()?
The 4x slowdown seems on point: by default ImageRecordIter uses 4 preprocessing threads, whereas the DataLoader uses no workers at all (num_workers=0), so all the processing is done in your current process.