Using multiple DataLoaders slows down training

I have multiple record files, and I use ImageRecordDataset and DataLoader to load them, so I end up with multiple DataLoaders.

In each training epoch, I iterate over these DataLoaders one after another. But I found that after I finish iterating over one DataLoader, starting to load from the next one takes a lot of time.

Could you create a small reproducible example to show where exactly it is so slow?
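
As a starting point for such an example, here is a minimal sketch of a timing helper. It separates the time until the first batch arrives from the time spent on the remaining batches, which should show whether the delay is in the switch between loaders. The name `time_loaders` is hypothetical and not part of any library, and the plain lists below are stand-ins for the real DataLoader objects.

```python
import time


def time_loaders(loaders):
    """For each loader, return (seconds until first batch, seconds for the rest)."""
    timings = []
    for loader in loaders:
        start = time.perf_counter()
        batches = iter(loader)        # any worker startup cost would land here
        next(batches, None)           # wait for the first batch (None if empty)
        first = time.perf_counter() - start
        for _ in batches:             # drain the remaining batches
            pass
        total = time.perf_counter() - start
        timings.append((first, total - first))
    return timings


# Stand-in loaders; replace with the real DataLoader objects.
for i, (first, rest) in enumerate(time_loaders([[1, 2, 3], [4, 5]])):
    print(f"loader {i}: first batch {first:.4f}s, remaining batches {rest:.4f}s")
```

If the first-batch time is large for every loader after the first, the cost is likely in starting each new iteration rather than in reading the data itself.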

Also, did you set the num_workers parameter to a larger value? If you are on Mac or Linux, try setting it to something like multiprocessing.cpu_count() - 3. This should increase loading speed a lot.
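
A minimal sketch of that suggestion: the helper name `pick_num_workers` and its `reserve` argument are hypothetical, and the value of 3 reserved cores is just the one mentioned above, not a tuned recommendation. The result is clamped at 0 so that machines with few cores do not end up with a negative worker count.

```python
import multiprocessing


def pick_num_workers(reserve=3):
    """Leave `reserve` cores free for the main process; never go below 0."""
    return max(0, multiprocessing.cpu_count() - reserve)


num_workers = pick_num_workers()
print("num_workers =", num_workers)
```

The resulting value can then be passed as the num_workers argument when constructing each DataLoader.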