Need help writing a custom iterator, libsvm headaches

I could use some help writing a custom iterator for the Python API. I have a list of libsvm files that I want to read in and extract certain column blocks from, to set up private/public spaces in a mixed-data, multi-task procedure. I can do this one file at a time with scikit-learn's libsvm loader (`load_svmlight_file`), but I'm tangled up wrapping the extraction procedure around a list of files. Is it possible to do this?

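```python
import mxnet as mx
from sklearn.datasets import load_svmlight_file

x, y = load_svmlight_file(input_file, n_features=55182)  # one file from the list

# slices are half-open: 1:212 is columns 1..211, so the next block starts at 212
a = x[:, 0].toarray().flatten()
b = x[:, 1:212]
c = x[:, 212:9569]
d = x[:, 9569:55182]

data = {'a': mx.nd.array(a), 'b': mx.nd.sparse.array(b),
        'c': mx.nd.sparse.array(c), 'd': mx.nd.sparse.array(d)}
label = {'autoencoder_label': mx.nd.sparse.array(b), 'softmax_label': mx.nd.array(y)}

train_iter = mx.io.NDArrayIter(data=data, label=label, batch_size=64,
                               shuffle=True, last_batch_handle='discard')
```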

You could use Gluon's `ArrayDataset` and simply concatenate the parts of the arrays you want to read:

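```python
from mxnet import gluon

data = []

train_data       = gluon.data.ArrayDataset(...)
train_dataloader = gluon.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
```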
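For the original question (running the same extraction over a whole list of files), one option is scikit-learn's `load_svmlight_files` (plural), which loads several files with a common `n_features` and returns `[X1, y1, X2, y2, ...]`. The sketch below stacks the rows of all files and then cuts the column blocks. `load_blocks` and `BOUNDS` are hypothetical names, and the boundaries assume the four blocks are meant to be contiguous (column 0, columns 1 to 211, 212 to 9568, and 9569 to 55181):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import load_svmlight_files

# Hypothetical block layout: block i covers columns BOUNDS[i] .. BOUNDS[i + 1] - 1
BOUNDS = [0, 1, 212, 9569, 55182]


def load_blocks(files, bounds=BOUNDS):
    """Load several libsvm files and return (column_blocks, labels)."""
    # load_svmlight_files returns [X1, y1, X2, y2, ...], all with the same n_features
    loaded = load_svmlight_files(files, n_features=bounds[-1])
    x = sp.vstack(loaded[0::2]).tocsr()  # stack the rows of every file
    y = np.concatenate(loaded[1::2])
    # consecutive boundaries give half-open, non-overlapping column slices
    blocks = [x[:, lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
    return blocks, y
```

`blocks[0].toarray().flatten()` then takes the place of `a`, and `blocks[1:]` the place of `b`, `c` and `d`, so the `data`/`label` dicts and the `NDArrayIter` call from the question stay the same. Stacking everything assumes the combined data fits in memory.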

You can also implement a custom dataset, e.g. `SVMFolderDataset`, that inherits from Gluon's `Dataset`, and then override `__getitem__` to return only items with specific indexes, e.g.:

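```python
    # inside class SVMFolderDataset; needs `import random` at module level.
    # MyIndexes: your own list of the row indexes that may be returned.
    def __getitem__(self, index):
        row_index = random.choice(MyIndexes)
        item = super().__getitem__(row_index)  # the parent class's lookup
        return item[0]
```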

Thanks! But if I go the Gluon route… can I still use the Module API?

Unfortunately, the Module API doesn't work with Gluon `Dataset`/`DataLoader` objects directly: `Module.fit` expects an `mx.io.DataIter`.