How to read images into gluoncv from zip file?

I had to read images directly from a zip archive because having that many images in a single folder runs into filesystem limits on Linux. I used code like this:

import zipfile
import numpy as np
from PIL import Image
import io
import mxnet as mx

z = zipfile.ZipFile("train.zip", "r")
data = z.read(z.namelist()[0])        # raw bytes of the first archive member
im = Image.open(io.BytesIO(data))     # decode with PIL
im = mx.nd.array(np.asarray(im))      # convert to an MXNet NDArray via NumPy

It seems that I get an RGB image, similar to what mx.image.imread(img_path, 1) returns in GluonCV. However, I have two questions:

  1. In Python 3, using multiple GPUs causes this error:
    zipfile.BadZipFile: Bad CRC-32 for file
    It seems this problem has something to do with multiprocessing: a zipfile.ZipFile handle is not safe to share across worker processes, so I have to create a new zipfile.ZipFile for each image to avoid it. Is there a better solution?
  2. I am also curious whether this reading method is slow. Is there a better way to handle this in MXNet?
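For the first question, a common workaround is to open the archive lazily in each worker process instead of sharing one handle across forks. The sketch below uses only the standard library; the class name `ZipImageReader` is hypothetical, not an MXNet API:

```python
import os
import zipfile


class ZipImageReader:
    """Reads members from a zip archive, reopening the archive in each
    worker process. A ZipFile handle shared across fork-based
    multiprocessing workers has a shared file position, which is what
    typically produces the 'Bad CRC-32' errors."""

    def __init__(self, path):
        self.path = path
        self._pid = None
        self._zf = None
        # Read the member list once, up front, in the parent process.
        with zipfile.ZipFile(path, "r") as zf:
            self.names = zf.namelist()

    def _handle(self):
        # Reopen lazily whenever we are in a different process than
        # the one that created the current handle.
        if self._zf is None or self._pid != os.getpid():
            self._zf = zipfile.ZipFile(self.path, "r")
            self._pid = os.getpid()
        return self._zf

    def read(self, index):
        # Returns the raw bytes of one member; decode them afterwards
        # (e.g. with PIL or mx.image.imdecode).
        return self._handle().read(self.names[index])
```

In a Gluon `Dataset.__getitem__` you would call `read(idx)` and then decode the bytes, so each DataLoader worker ends up with its own `ZipFile` handle without reopening the archive for every single image.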

Hi,

if you organize your images in one folder per class, you can use the sample im2rec.py script to pack those images into two (huge) .rec files, one for training and one for validation.

You can read a .rec file during training with the mxnet.io.ImageRecordIter class.
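As a minimal sketch of that iterator (the file names and the 224×224 shape are assumptions; adjust them to whatever im2rec.py produced for your data):

```python
import mxnet as mx

# Hypothetical paths; the .rec/.idx pair is produced earlier by im2rec.py.
train_iter = mx.io.ImageRecordIter(
    path_imgrec="train.rec",
    path_imgidx="train.idx",   # index file, enables random access for shuffling
    data_shape=(3, 224, 224),  # channels, height, width after decoding/resizing
    batch_size=32,
    shuffle=True,
)

for batch in train_iter:
    data = batch.data[0]    # NDArray of shape (batch_size, 3, 224, 224)
    label = batch.label[0]  # class labels packed into the record file
    break
```

The iterator decodes and batches the images in C++ threads, so it is usually much faster than decoding zip members one by one in Python.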

This is explained quite well in this example here:
https://gluon-cv.mxnet.io/build/examples_datasets/recordio.html#sphx-glr-build-examples-datasets-recordio-py

This should solve your original problem of not storing too many images in one folder, and the .rec file will replace your .zip file (which doesn’t compress the images anyway).

regards,

Lieven

I use the COCO 2017 dataset for object detection. Are there any examples of how to build an ImageRecordIter for a detection problem?

Hi @zhoulukuan, I’m surprised you’re having issues with the number of files. GluonCV has a tutorial for preparing MS COCO data. Give that a try and see if you hit the same issues. Otherwise you could package the data up as a RecordIO file as suggested by @Igo; there seems to be a good tutorial for that here.