Need help writing a custom iterator, libsvm headaches

dscantle · May 1, 2019, 12:13am

I could use some help writing a custom iterator for the Python API. I have a list of libsvm files that I want to read in and extract certain column ranges from, to set up private/public spaces in a mixed-data, multi-task procedure. I can do this one file at a time with scikit-learn's load_svmlight_file loader, but I'm getting tangled up wrapping the extraction procedure around a list of files. Is it possible to do this?

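import mxnet as mx
from sklearn.datasets import load_svmlight_file

x, y = load_svmlight_file(input_file, n_features=55182)

# Column blocks (Python slices exclude the end index, so columns 211 and 9568 are not used)
a = x[:, 0].toarray().flatten()
b = x[:, 1:211]
c = x[:, 212:9568]
d = x[:, 9569:55182]

data = {'a': mx.nd.array(a), 'b': mx.nd.sparse.array(b), 'c': mx.nd.sparse.array(c), 'd': mx.nd.sparse.array(d)}
label = {'autoencoder_label': mx.nd.sparse.array(b), 'softmax_label': mx.nd.array(y)}

# NDArrayIter only accepts CSR inputs together with last_batch_handle='discard'
train_iter = mx.io.NDArrayIter(data=data, label=label, batch_size=64, shuffle=True, last_batch_handle='discard')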
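A minimal sketch of wrapping the column-slicing step around a list of files, assuming every file uses the same 55182-column layout and that all files fit in memory together. The helper names extract_blocks and load_file_list are hypothetical. load_svmlight_files is scikit-learn's multi-file loader; it applies one n_features value and one zero_based decision to all files, so the column blocks line up across files.

```python
import numpy as np
import scipy.sparse as sp

N_FEATURES = 55182  # column count from the question; all files must share it


def extract_blocks(x):
    """Split one CSR feature matrix into the column blocks from the question.

    Slice bounds are copied as-is; Python slices exclude the end index, so
    columns 211 and 9568 do not land in any block.
    """
    x = sp.csr_matrix(x)  # accept any scipy sparse input
    return {
        "a": x[:, 0].toarray().ravel(),  # dense 1-D column
        "b": x[:, 1:211],
        "c": x[:, 212:9568],
        "d": x[:, 9569:N_FEATURES],
    }


def load_file_list(paths, n_features=N_FEATURES):
    """Load several libsvm files and stack them row-wise into one (x, y)."""
    # Imported here so extract_blocks can be used without scikit-learn
    from sklearn.datasets import load_svmlight_files

    # Returns [x0, y0, x1, y1, ...] with one shared n_features/zero_based choice
    arrays = load_svmlight_files(list(paths), n_features=n_features)
    xs, ys = arrays[0::2], arrays[1::2]
    return sp.vstack(xs, format="csr"), np.concatenate(ys)
```

With hypothetical file names, x, y = load_file_list(["part-0.libsvm", "part-1.libsvm"]) followed by blocks = extract_blocks(x) yields the same a/b/c/d pieces as the single-file code. The data and label dicts for NDArrayIter can then be built from them unchanged, so the Module API path keeps working.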

NRauschmayr · May 2, 2019, 6:07am

You could use Gluon's ArrayDataset and just concatenate the column ranges you want to keep:

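import mxnet as mx
import scipy.sparse as sp

# Join the selected column blocks side by side (axis 1); concatenating
# along axis 0 would stack rows instead of features
data = [x[:, 1:211], x[:, 212:9568]]
features = sp.hstack(data).toarray()  # densify: ArrayDataset needs len() and row indexing

batch_size = 64
train_data = mx.gluon.data.ArrayDataset(mx.nd.array(features), mx.nd.array(y))
train_dataloader = mx.gluon.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)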
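The concatenation axis matters when selecting features: column blocks have to be joined side by side (axis 1) so each sample keeps one row. A small scipy-only sketch on a toy 4x12 matrix (the data is made up for illustration); with MXNet NDArrays the equivalent would be mx.nd.concatenate(arrays, axis=1).

```python
import numpy as np
import scipy.sparse as sp

# Toy stand-in for a loaded libsvm matrix: 4 samples x 12 features
x = sp.csr_matrix(np.arange(48, dtype=np.float32).reshape(4, 12))

# Two column blocks, 3 and 4 features wide (end index excluded)
parts = [x[:, 1:4], x[:, 5:9]]

# Join along axis 1: still 4 samples, each with 3 + 4 = 7 features
joined = sp.hstack(parts, format="csr")
print(joined.shape)  # (4, 7)

# The result lines up with slicing the dense rows directly
dense = x.toarray()
expected = np.hstack([dense[:, 1:4], dense[:, 5:9]])
print(np.array_equal(joined.toarray(), expected))  # True
```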

You can also implement a custom dataset, e.g. SVMFolderDataset, that inherits from Gluon's Dataset, and then override __getitem__ so it only returns items with specific indexes, e.g.:

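    def __len__(self):
        return len(self._indexes)

    def __getitem__(self, index):
        # self._indexes: the row positions this dataset should expose (set in __init__);
        # assumes the class inherits from a concrete dataset such as ArrayDataset
        row = self._indexes[index]
        return super().__getitem__(row)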
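A fuller sketch of the SVMFolderDataset idea, assuming one libsvm file has already been loaded into (x, y). The constructor arguments indexes (rows to expose) and column_blocks (half-open column ranges to keep) are assumptions, not part of Gluon. The fallback to object as a base class exists only so the sketch runs without MXNet; with MXNet installed it subclasses mxnet.gluon.data.Dataset and can be handed to DataLoader.

```python
import numpy as np
import scipy.sparse as sp

try:  # use Gluon's base class when MXNet is installed
    from mxnet.gluon.data import Dataset
except ImportError:  # fallback only so this sketch runs without MXNet
    Dataset = object


class SVMFolderDataset(Dataset):
    """Hypothetical dataset over one loaded libsvm matrix.

    Exposes only the rows listed in `indexes` and, for each row, only the
    half-open column ranges (start, stop) listed in `column_blocks`.
    """

    def __init__(self, x, y, indexes, column_blocks):
        self._x = sp.csr_matrix(x)  # CSR makes single-row slicing cheap
        self._y = np.asarray(y)
        self._indexes = list(indexes)
        self._blocks = list(column_blocks)

    def __len__(self):
        return len(self._indexes)

    def __getitem__(self, index):
        row = self._indexes[index]  # position in this dataset -> row in the file
        dense = self._x[row].toarray().ravel()  # densify just this one sample
        features = np.concatenate([dense[start:stop] for start, stop in self._blocks])
        return features.astype(np.float32), np.float32(self._y[row])
```

For example, SVMFolderDataset(x, y, indexes=range(1000), column_blocks=[(1, 211), (212, 9568)]) could be wrapped in mx.gluon.data.DataLoader(ds, batch_size=64, shuffle=True). Densifying one row per call keeps memory proportional to a batch rather than the whole file, at the cost of some Python work per sample.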

dscantle · May 13, 2019, 5:06pm

Thanks! But if I go the Gluon route… can I still use the module api?

Sergey · May 13, 2019, 9:34pm

Unfortunately, the Module API doesn’t really work with Gluon Dataset/DataLoader objects; Module.fit expects an mx.io.DataIter such as NDArrayIter.
