Reading images fast: list comprehension vs for loop

Hi,

I’m looking for the fastest way to read a folder of same-size images into a single NDArray. Surprisingly, concatenating inside a for loop is about 3x faster than building a list with a comprehension and concatenating once. Any idea why? Any suggestions for a faster technique?

Idea 1: For Loop of concats (100ms)

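# Assumed imports: the aliases mxim and nd are not defined in the snippet itself
import mxnet.image as mxim
from mxnet import nd

# Read the first image and add a leading batch axis for the concat
ims = (mxim.imread(batch_path + '/' + piclist[0])
       .expand_dims(0)
       .as_in_context(ctx))

# Append the remaining images one at a time, each already on the target context
for picname in piclist[1:]:
    pic = mxim.imread(batch_path + '/' + picname).expand_dims(0)
    ims = nd.concat(ims, pic.as_in_context(ctx), dim=0)

nd.waitall()  # block until all queued operations have finished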

Idea 2: list comprehension (320ms)

ims = nd.concat(
    *[mxim.imread(batch_path + '/' + pic).expand_dims(0) for pic in piclist],
    dim=0).as_in_context(ctx)

nd.waitall()
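
To make the "surprisingly" concrete, here is a rough count of how many image-sized copies each idea performs. It is a sketch that assumes every concat allocates a fresh output and copies all of its inputs into it. The function names are made up for illustration, and it ignores decoding and host-to-device transfer costs.

```python
def copies_incremental(n):
    """Image copies made when the batch grows by one concat per image (Idea 1)."""
    total = 0
    # The k-th concat joins k - 1 stored images with 1 new image,
    # so it writes k images into a new array.
    for k in range(2, n + 1):
        total += k
    return total


def copies_single(n):
    """Image copies made by one concat over the whole list (Idea 2)."""
    return n


if __name__ == "__main__":
    for n in (10, 100):
        print(n, copies_incremental(n), copies_single(n))
```

By this count Idea 1 does far more copying than Idea 2, so the measured difference probably comes from something other than the concat itself, for example where the copies happen (CPU vs `ctx`).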

It turns out the list comprehension is even faster when each image is moved to the GPU inside the comprehension and the concat runs there, rather than concatenating on the CPU and sending the whole result over afterwards.

85ms:

ims = [mxim.imread(batch_path + '/' + pic).expand_dims(0).as_in_context(ctx) for pic in piclist]
ims = nd.concat(*ims, dim=0)
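
To compare the three variants on equal footing, a small timing helper could look like the sketch below. MXNet queues operations asynchronously, so a timing only reflects completed work if a synchronization call such as `nd.waitall()` runs before the clock stops. The helper name `time_ms` and its parameters are hypothetical. It uses only the standard library, and the synchronization function is passed in by the caller.

```python
import time


def time_ms(fn, sync=None, repeats=5):
    """Return the best wall-clock time in milliseconds over `repeats` runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # e.g. nd.waitall, so that queued asynchronous work is included
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best
```

Assuming each idea is wrapped in its own function (say `load_with_loop`, a hypothetical name), it would be called as `time_ms(load_with_loop, sync=nd.waitall)`. Taking the best of several runs reduces the influence of one-time costs such as the first read of the files from disk.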