Need help writing a custom iterator, libsvm headaches

I could use some help working on a custom iterator for the Python API. I have a list of libsvm files that I want to read in and extract certain elements from, to facilitate setting up private/public spaces in a mixed-data, multi-task procedure. I can do this one file at a time with scikit-learn’s load_svmlight_file, but I’m tangled up wrapping the data extraction procedure around a list of files. Is it possible to do this?

import mxnet as mx
from sklearn.datasets import load_svmlight_file

x, y = load_svmlight_file(input_file, n_features=55182)

a = x[:,0].toarray().flatten()
b = x[:,1:211]
c = x[:,212:9568]
d = x[:,9569:55182]

data = {'a': mx.nd.array(a), 'b': mx.nd.sparse.array(b), 'c': mx.nd.sparse.array(c), 'd': mx.nd.sparse.array(d)}
label = {'autoencoder_label': mx.nd.sparse.array(b), 'softmax_label': mx.nd.array(y)}

train_iter = mx.io.NDArrayIter(data=data, label=label, batch_size=64, shuffle=True, last_batch_handle='discard')

You could use ArrayDataset and just concatenate the parts of the arrays you want to read:

data = []
data.append(x[:,1:211])
data.append(x[:,212:9568])

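# x is a SciPy sparse matrix, so join the column blocks with scipy.sparse.hstack
# (requires "import scipy.sparse") and densify before handing them to Gluon
features          = mx.nd.array(scipy.sparse.hstack(data).toarray())
train_data        = mx.gluon.data.ArrayDataset(features)
train_dataloader  = mx.gluon.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)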

You can also implement a custom dataset, say SVMFolderDataset, that inherits from Gluon’s Dataset, and then override __getitem__ to return only items with specific indexes, e.g.:

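    def __getitem__(self, index):
        items_with_index = list(enumerate(self.items))
        # MyIndexes is a placeholder for the subset of (index, item) pairs
        # you want to keep, e.g. a filtered copy of items_with_index
        image_index, image_tuple = random.choice(MyIndexes)
        image = super().__getitem__(image_index)

        return image[0]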
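
For the list-of-files part of the question, one option is a plain generator that loads one file at a time and applies the same column slices to each. The following is only a minimal sketch: iter_file_blocks, COLUMN_BLOCKS and the load_fn callback are hypothetical names introduced here, and the column ranges are copied from the slices in the question.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

# Column ranges taken from the question's slices x[:, 1:211], x[:, 212:9568]
# and x[:, 9569:55182]; the names "b", "c", "d" follow the question.
COLUMN_BLOCKS: Dict[str, Tuple[int, int]] = {
    "b": (1, 211),
    "c": (212, 9568),
    "d": (9569, 55182),
}


def iter_file_blocks(
    paths: Iterable[str],
    load_fn: Callable[[str], Tuple[Any, Any]],
    blocks: Dict[str, Tuple[int, int]] = COLUMN_BLOCKS,
) -> Iterator[Tuple[str, Dict[str, Any], Any]]:
    """Lazily load each file and split its feature matrix into named column blocks.

    load_fn is a hypothetical callback returning (x, y), where x supports
    2-D slicing such as x[:, start:stop] (a SciPy sparse matrix does).
    """
    for path in paths:
        x, y = load_fn(path)  # only one file is loaded per iteration step
        data = {name: x[:, start:stop] for name, (start, stop) in blocks.items()}
        yield path, data, y
```

With scikit-learn, load_fn could be something like lambda p: load_svmlight_file(p, n_features=55182). Each yielded dict can then be converted to NDArrays and wrapped in an NDArrayIter the same way as in the question, one file at a time.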
        

Thanks! But if I go the Gluon route… can I still use the Module API?

Unfortunately, the Module API doesn’t really work with Dataset/DataLoaders; it expects a DataIter, such as the NDArrayIter in your original snippet.