How to wrap a dataloader/iterator around gluoncv.data.pascal_voc.detection.VOCDetection?

https://gluon-cv.mxnet.io/build/examples_datasets/pascal_voc.html#sphx-glr-build-examples-datasets-pascal-voc-py

I am referring to the above link to create the Pascal VOC dataset and have copied the code below:

from gluoncv import data, utils
from matplotlib import pyplot as plt
from mxnet.gluon import data as gdata

train_dataset = data.VOCDetection(splits=[(2007, 'trainval')])
print('Num of training images:', len(train_dataset))

I am certain that train_dataset is built correctly, since I can plot the images and bounding boxes using the plotting code from the link above.
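The plotting check I ran is roughly the snippet from the tutorial (the sample index is arbitrary):

from gluoncv import utils
from matplotlib import pyplot as plt

train_image, train_label = train_dataset[5]
bboxes = train_label[:, :4]        # xmin, ymin, xmax, ymax
class_ids = train_label[:, 4:5]    # class id column
ax = utils.viz.plot_bbox(train_image.asnumpy(), bboxes,
                         labels=class_ids,
                         class_names=train_dataset.classes)
plt.show()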

The type of train_dataset is gluoncv.data.pascal_voc.detection.VOCDetection.

I am trying to build an iterator out of train_dataset:

train_dataset = data.VOCDetection(splits=[(2007, 'trainval')])
print('Num of training images:', len(train_dataset))
print(type(train_dataset))
batch_size = 32
num_workers=1
train_loader = gdata.DataLoader(train_dataset,
                                batch_size=batch_size,
                                num_workers=num_workers, 
                                shuffle=True,
                                last_batch="rollover")

for img, label in train_loader:
    print(f'img: {img.shape}')
    print(f'label: {label.shape}')
    break

I am getting the error below:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 400, in _worker_fn
    batch = batchify_fn([_worker_dataset[i] for i in samples])
  File "/opt/conda/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 147, in default_mp_batchify_fn
    return [default_mp_batchify_fn(i) for i in data]
  File "/opt/conda/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 147, in <listcomp>
    return [default_mp_batchify_fn(i) for i in data]
  File "/opt/conda/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 151, in default_mp_batchify_fn
    ctx=context.Context('cpu_shared', 0))
  File "/opt/conda/lib/python3.6/site-packages/mxnet/ndarray/utils.py", line 146, in array
    return _array(source_array, ctx=ctx, dtype=dtype)
  File "/opt/conda/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2504, in array
    arr = empty(source_array.shape, ctx, dtype)
  File "/opt/conda/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 3955, in empty
    return NDArray(handle=_new_alloc_handle(shape, ctx, False, dtype))
  File "/opt/conda/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 140, in _new_alloc_handle
    ctypes.c_int(int(_DTYPE_NP_TO_MX[np.dtype(dtype).type])),
KeyError: <class 'numpy.object_'>
"""

The above exception was the direct cause of the following exception:

Hi @NakedKoala,

Have a look at how it’s done in the training script here.

Relevant snippet:

def get_dataloader(net, train_dataset, val_dataset, data_shape, batch_size, num_workers, args):
    """Get dataloader."""
    width, height = data_shape, data_shape
    batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1) for _ in range(1)]))  # stack image, all targets generated
    if args.no_random_shape:
        train_loader = gluon.data.DataLoader(
            train_dataset.transform(YOLO3DefaultTrainTransform(width, height, net, mixup=args.mixup)),
            batch_size, True, batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)

The key is that the VOC dataset contains images of different sizes, so you need to resize, pad, or crop them if you want to stack them into a batch in your dataloader. The other issue is that the labels come back as numpy arrays, and they have different shapes as well (one row per object in the image), so you need to handle that too.
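You can see both issues by inspecting two raw samples (the exact shapes below are just illustrative):

img0, label0 = train_dataset[0]
img1, label1 = train_dataset[1]
print(type(img0), img0.shape)      # NDArray, e.g. (375, 500, 3)
print(type(label0), label0.shape)  # numpy array, e.g. (n0, 6)
print(type(img1), img1.shape)      # e.g. (500, 353, 3) -- different image size
print(type(label1), label1.shape)  # e.g. (n1, 6) -- different number of objects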

Here’s your code, modified, though I’m not sure it will work with your network:

from gluoncv import data, utils
from matplotlib import pyplot as plt
from mxnet.gluon import data as gdata
import mxnet as mx
from gluoncv.data.batchify import Tuple, Stack, Pad

train_dataset = data.VOCDetection(splits=[(2007, 'trainval')])
print('Num of training images:', len(train_dataset))

# Tell it to Pad the labels
batchify_fn = Tuple(Stack(), Pad(axis=0, pad_val=-1))

# Transform the label to ndarray
def transform(data, label):
    return data, mx.nd.array(label)

# Crop the images so that they are the same size
resize_transform = gdata.vision.transforms.CenterCrop((375,500))

print(type(train_dataset))
batch_size = 32
num_workers=1

train_loader = gdata.DataLoader(train_dataset.transform(transform).transform_first(resize_transform),
                                batch_size=batch_size,
                                num_workers=num_workers, 
                                batchify_fn=batchify_fn,
                                shuffle=True,
                                last_batch="rollover")

for img, label in train_loader:
    print(label.shape)
    print(f'img: {img.shape}')
    print(f'label: {label.shape}')
    break
Output:

Num of training images: 5011
<class 'gluoncv.data.pascal_voc.detection.VOCDetection'>
(32, 17, 6)
img: (32, 500, 375, 3)
label: (32, 17, 6)
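The label batch comes out as (32, 17, 6) because Pad(axis=0, pad_val=-1) pads each image’s (num_objects, 6) label array (xmin, ymin, xmax, ymax, class id, difficult flag) with rows of -1 up to the largest object count in the batch (17 here).

For actual training you would normally use the preset detection transforms rather than a plain CenterCrop, as in the training-script snippet above. A minimal sketch, assuming the standard GluonCV YOLOv3 preset (without passing a net, so no YOLO targets are generated, only resized images and variable-length boxes):

from gluoncv.data.transforms.presets.yolo import YOLO3DefaultTrainTransform
from gluoncv.data.batchify import Tuple, Stack, Pad
from mxnet.gluon import data as gdata

width, height = 416, 416  # typical YOLOv3 input size
# Without a net, the transform only augments/resizes and returns (image, bbox)
train_transform = YOLO3DefaultTrainTransform(width, height)

batchify_fn = Tuple(Stack(), Pad(axis=0, pad_val=-1))
train_loader = gdata.DataLoader(
    train_dataset.transform(train_transform),
    batch_size=batch_size,
    shuffle=True,
    batchify_fn=batchify_fn,
    last_batch='rollover',
    num_workers=num_workers)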