Gluon Dataloader hangs

I am having an issue with the Gluon DataLoader: most of the time it hangs for a long time at the start of iteration on an Amazon EC2 DL AMI (Ubuntu).

I assume it needs to copy the whole dataset object to each worker process, which makes it slow. My dataset contains a list of 200k strings (image paths). Is there a way to speed this up so I don't have to wait 10-20 minutes at startup?
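One way to check the assumption that copying the dataset is the slow part is to measure how large the path list is when serialized. The sketch below uses made-up paths as a stand-in for the real list. Whether the DataLoader actually pickles the dataset or the workers inherit it through fork depends on the platform and MXNet version, so treat the result as a rough estimate of the copy cost, not a measurement of the DataLoader itself.

```python
import pickle
import time

# Hypothetical stand-in for the real dataset: 200k image-path strings.
paths = [f"/data/images/{i:06d}.jpg" for i in range(200_000)]

# Time how long it takes to serialize the whole list once.
start = time.perf_counter()
payload = pickle.dumps(paths, protocol=pickle.HIGHEST_PROTOCOL)
elapsed = time.perf_counter() - start

print(f"pickled size: {len(payload) / 1e6:.1f} MB, took {elapsed:.3f} s")
```

If the size comes out at a few megabytes and the time is well under a second, copying the list to each worker is unlikely to account for a wait of several minutes, and the cause is probably elsewhere.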

If you’re using a large num_workers, the issue may be that all workers start creating batches at the same time, which can create I/O contention. How long does the startup take if you use num_workers=2?
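To compare startup times across worker counts, a small timing helper can be useful. This is a minimal sketch: `time_to_first_batch` and `make_loader` are hypothetical names, and the Gluon usage in the comments assumes `dataset` is your existing Gluon dataset and that a batch size of 32 is acceptable.

```python
import time

def time_to_first_batch(make_loader):
    """Return (seconds, first_batch), timed from loader construction
    until the first batch is available."""
    start = time.perf_counter()
    loader = make_loader()
    first_batch = next(iter(loader))
    return time.perf_counter() - start, first_batch

# Usage with Gluon (assumes `dataset` is already defined):
# from mxnet.gluon.data import DataLoader
# for n in (0, 2, 8):
#     secs, _ = time_to_first_batch(
#         lambda: DataLoader(dataset, batch_size=32, num_workers=n))
#     print(f"num_workers={n}: {secs:.1f} s to first batch")
```

Including num_workers=0 in the comparison gives a baseline with no worker processes at all. If that run is also slow, the delay is more likely in reading the data than in starting the workers.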

A bit less, but still considerable.

Is your dataset part of the AMI? Just wondering if this is related to an AMI cold start (i.e. the AMI is stored on S3 and blocks are fetched on a read miss).
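A quick way to test the cold-start theory is to read one of the image files twice and compare the timings. The sketch below assumes the file has not been read since the instance booted, since otherwise both reads will be served from the page cache. `timed_read` and `cold_vs_warm` are hypothetical helper names.

```python
import time

def timed_read(path, chunk_size=1 << 20):
    """Read a file end to end and return (seconds, bytes_read)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return time.perf_counter() - start, total

def cold_vs_warm(path):
    """Read the same file twice; the second read is normally cached."""
    cold, size = timed_read(path)
    warm, _ = timed_read(path)
    return cold, warm, size
```

A first read that is much slower than the second, on a file that has not been touched since boot, would be consistent with blocks being fetched lazily on first access.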

It is just a txt file with a bunch of rows in it. Each row corresponds to one entry in my dataset.