Load 2 data-sets in a same order

729009982 · October 25, 2018, 2:58pm

Hi.
I am trying to load 2 datasets with VOCDetection from gluoncv, but I want to read the 2 datasets (same number of samples, same size) in the same order. Because of the shuffle strategy, I cannot ensure that.
I would like to know whether fixing the random seed would work.
If it does, would this solution hurt performance?

What should I do? Thank you for your time.

safrooze · October 25, 2018, 9:34pm

If you’re using the gluon DataLoader, it has a shuffle parameter; passing shuffle=False disables the shuffling.
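A minimal pure-Python sketch of the two options from the question (no MXNet needed to run it). With shuffling disabled, both loaders visit the indices in the same order. With shuffling on, a fixed seed only gives matching orders if each loader shuffles with its own, identically seeded generator. If both shuffles draw from one shared global generator (as, to my understanding, MXNet 1.x's RandomSampler does via np.random.shuffle), seeding once is not enough, because the second shuffle continues the random stream. The helper epoch_order is hypothetical.

```python
import random


def epoch_order(n, shuffle, rng=None):
    """Index order a loader would visit in one epoch (hypothetical helper)."""
    indices = list(range(n))
    if shuffle:
        rng.shuffle(indices)  # in-place permutation drawn from rng
    return indices


n = 20

# Option 1: no shuffling -> both datasets are read in index order.
order_a = epoch_order(n, shuffle=False)
order_b = epoch_order(n, shuffle=False)

# Option 2: one generator per loader, both seeded identically -> same permutation.
perm_a = epoch_order(n, shuffle=True, rng=random.Random(42))
perm_b = epoch_order(n, shuffle=True, rng=random.Random(42))

# Pitfall: one shared generator seeded once. The second shuffle continues the
# random stream, so the two loaders end up with different orders.
shared = random.Random(42)
shared_a = epoch_order(n, shuffle=True, rng=shared)
shared_b = epoch_order(n, shuffle=True, rng=shared)

print(order_a == order_b, perm_a == perm_b, shared_a == shared_b)
```

Fixing a seed costs nothing at runtime; the more robust route, though, is to avoid relying on two separate shuffles at all and to read both datasets through a single index, as in the reply below.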

feevos · October 26, 2018, 3:47am

Hi @729009982,

What is the format of your dataset? How do you read the corresponding datasets into memory? There are many solutions. One is to read both datasets and bind them together in a common subclass of gluon.data.Dataset, so that a single index, idx, gets the corresponding pair of data out. E.g.

```python
import numpy as np
from mxnet import gluon


class MyDoubleDataset(gluon.data.Dataset):
    def __init__(self, dataset1, dataset2):
        # In here I ASSUME that the way you read your data keeps them in the
        # corresponding order, e.g. two numpy arrays of shape (1000, 5).
        # Replace these arguments with your own read statements.
        assert len(dataset1) == len(dataset2), "datasets must have the same length"
        self.dataset1 = dataset1
        self.dataset2 = dataset2

    def __len__(self):
        return self.dataset1.shape[0]

    def __getitem__(self, idx):
        # A single idx selects the matching pair, so the order between the
        # datasets is preserved inside a DataLoader object, even with shuffling
        return self.dataset1[idx], self.dataset2[idx]


# Example: two arrays of shape (1000, 5) standing in for the real data
dataset = MyDoubleDataset(np.zeros((1000, 5)), np.ones((1000, 5)))
```
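Why a shared index keeps the pairs aligned even with shuffle=True: the loader's sampler produces one index per sample and the dataset is called once with that index, so both halves of a pair always come from the same row. The pure-Python sketch below simulates this with a hypothetical PairedDataset (same __len__/__getitem__ interface as a gluon Dataset) and a shuffled index list standing in for the sampler; it is an illustration, not MXNet's DataLoader.

```python
import random


class PairedDataset:
    """Simplified stand-in with the gluon Dataset interface: __len__ and __getitem__."""

    def __init__(self, data1, data2):
        assert len(data1) == len(data2), "both datasets must have the same length"
        self.data1 = data1
        self.data2 = data2

    def __len__(self):
        return len(self.data1)

    def __getitem__(self, idx):
        # One index selects the matching entry from both datasets.
        return self.data1[idx], self.data2[idx]


clean = [f"clean_{i}" for i in range(8)]
noisy = [f"noisy_{i}" for i in range(8)]
dataset = PairedDataset(clean, noisy)

# Stand-in for a shuffling sampler: a random permutation of the indices.
indices = list(range(len(dataset)))
random.shuffle(indices)

batch_size = 4
for start in range(0, len(indices), batch_size):
    batch = [dataset[i] for i in indices[start:start + batch_size]]
    batch_clean, batch_noisy = zip(*batch)  # split the pairs into two batches
    # Position k in both batches still refers to the same sample.
    assert all(c.split("_")[1] == z.split("_")[1] for c, z in zip(batch_clean, batch_noisy))
```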

Hope this helps.

729009982 · October 26, 2018, 7:01am

Hi @feevos,
I tried to write it following your instructions, which are very inspiring. My datasets are exactly like the VOC dataset, except that the image data is doubled: one copy is the original (clean) image, the other has added noise. I want the function to return clean_img, noise_img and label (bbox and cls_id, just like VOCDetection()), so that I can get:

```python
clean_data = batch[0]
noise_data = batch[1]
cls_targets = batch[2]
box_targets = batch[3]
```

I tried to modify it following your approach and VOCDetection:

```python
def getitem(self, idx):
    img_id = self._items[idx]
    clean_img_path = self.clean_image_path.format(*img_id)
    noise_img_path = self.noise_image_path.format(*img_id)
    label = self._label_cache[idx] if self._label_cache else self._load_label(idx)
    clean_img = mx.image.imread(clean_img_path, 1)
    noise_img = mx.image.imread(noise_img_path, 1)
    if self._transform is not None:
        return self._transform(img, label)
    return clean_img, noise_img, label
```

but I get:

```
    return self._fn(*item)
TypeError: __call__() takes 3 positional arguments but 4 were given
```

(_fn is a data transformation function.) I am still trying to solve it, possibly in another way, and would appreciate your reply.
Thank you for your time and consideration.
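The last frame of the traceback, return self._fn(*item), shows the mechanism: the transformed dataset unpacks every tuple returned by __getitem__ into the transform function. A transform written for (img, label) samples has a __call__(self, img, label) signature, i.e. three positional arguments counting self, so once the dataset yields (clean_img, noise_img, label) it receives four. The classes below are a simplified, hypothetical reproduction of that behaviour, not MXNet's actual source:

```python
class LazyTransformSketch:
    """Simplified stand-in for a lazily transformed dataset: each tuple
    returned by the underlying dataset is unpacked into the transform."""

    def __init__(self, data, fn):
        self._data = data
        self._fn = fn

    def __getitem__(self, idx):
        item = self._data[idx]
        if isinstance(item, tuple):
            return self._fn(*item)  # same call as in the traceback
        return self._fn(item)


class TwoInputTransform:
    """Hypothetical transform written for (img, label) samples."""

    def __call__(self, img, label):
        return img, label


class ThreeInputTransform:
    """Hypothetical transform written for (clean_img, noise_img, label) samples."""

    def __call__(self, clean_img, noise_img, label):
        return clean_img, noise_img, label


samples = [("clean_0", "noisy_0", "label_0")]

try:
    LazyTransformSketch(samples, TwoInputTransform())[0]
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)  # ...__call__() takes 3 positional arguments but 4 were given

result = LazyTransformSketch(samples, ThreeInputTransform())[0]
print(result)
```

So the transform itself has to accept (and consistently transform) all three items.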

feevos · October 26, 2018, 7:31am

Hi @729009982,

I am sorry, I’ve only worked on semantic segmentation so far (I'm getting to instance segmentation as well). I understand your problem also includes bounding boxes (hence the transform on the label)? If so, you’ll probably need to write custom transformations that are deterministic, i.e. you provide the random number manually, the same one for all three objects (img_clean, img_noisy, label). Or does label mean something different here, like BBox information too?
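A minimal sketch of that idea, using a random horizontal flip as the augmentation: the coin flip is drawn once and the same decision is applied to the clean image, the noisy image and the bounding boxes. To keep it dependency-free, images are assumed to be nested lists (rows of pixels) and boxes [xmin, ymin, xmax, ymax, cls_id] lists; paired_random_flip and its helpers are hypothetical names, not gluoncv functions.

```python
import random


def flip_image(img):
    """Mirror an image stored as a list of rows."""
    return [row[::-1] for row in img]


def flip_boxes(boxes, width):
    """Mirror [xmin, ymin, xmax, ymax, cls_id] boxes horizontally."""
    return [[width - xmax, ymin, width - xmin, ymax, cls_id]
            for xmin, ymin, xmax, ymax, cls_id in boxes]


def paired_random_flip(clean_img, noise_img, boxes, rng, p=0.5):
    """Apply ONE random decision to all three objects."""
    width = len(clean_img[0])
    do_flip = rng.random() < p  # drawn once, shared by everything below
    if do_flip:
        return flip_image(clean_img), flip_image(noise_img), flip_boxes(boxes, width)
    return clean_img, noise_img, boxes


clean = [[1, 2, 3, 4], [5, 6, 7, 8]]
noisy = [[10, 20, 30, 40], [50, 60, 70, 80]]
boxes = [[0, 0, 1, 2, 7]]  # one box covering the leftmost column, class 7

# p=1.0 forces the flip so the effect is visible
out_clean, out_noisy, out_boxes = paired_random_flip(clean, noisy, boxes, random.Random(0), p=1.0)
print(out_clean, out_noisy, out_boxes)
```

The same pattern extends to crops or resizes: sample the parameters once per sample, then apply them to every object that must stay aligned.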

One thing I notice in the code is that the transform is applied to img and label, but img is not defined anywhere in the __getitem__(self, idx) function (note the double underscores). I don’t know which transform you are using, but I assume you should apply a deterministic transform to both the noisy and the clean images. Something like:

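```python
import random

import mxnet as mx
from mxnet import gluon


class MyDoubleDataset(gluon.data.Dataset):
    def __init__(self, items, clean_image_path, noise_image_path, load_label,
                 transform_img=None, transform_label=None):
        # items: image ids used to fill the path templates (read them, and create
        # the corresponding noisy images, in any way you want)
        self._items = items
        self.clean_image_path = clean_image_path
        self.noise_image_path = noise_image_path
        self._load_label = load_label  # hypothetical callable: idx -> label (bboxes?)
        # hypothetical deterministic transforms: (data, random_number) -> data
        self._transform_img = transform_img
        self._transform_label = transform_label

    def __len__(self):
        return len(self._items)

    # don't forget the double underscores!
    def __getitem__(self, idx):
        img_id = self._items[idx]
        clean_img_path = self.clean_image_path.format(*img_id)
        noise_img_path = self.noise_image_path.format(*img_id)
        label = self._load_label(idx)
        clean_img = mx.image.imread(clean_img_path, 1)
        noise_img = mx.image.imread(noise_img_path, 1)

        # The label (bboxes?) must be transformed together with the images
        if self._transform_img is not None and self._transform_label is not None:
            # Draw the random number ONCE and reuse it, so all three objects
            # receive exactly the same augmentation
            random_number = random.random()
            trans_img_clean = self._transform_img(clean_img, random_number)
            trans_img_noise = self._transform_img(noise_img, random_number)
            trans_label = self._transform_label(label, random_number)
            return trans_img_clean, trans_img_noise, trans_label

        return clean_img, noise_img, label
```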

Also, since you are now returning 3 outputs, make sure you change how you unpack each batch where you feed your network:

```python
datagen = gluon.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)  # plus any other arguments you need
for i, data in enumerate(datagen):
    batch_clean, batch_noise, batch_label = data
    # do stuff with the network; I don't know how you feed images into it,
    # something like?
    out_clean = net(batch_clean)
    out_noise = net(batch_noise)  # ?
    break
```
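One practical detail when unpacking three outputs: detection labels usually have a different number of boxes per image, so the label part of a batch needs padding before it can be stacked. In gluoncv's training scripts this is done by passing a batchify_fn to the DataLoader built from gluoncv.data.batchify helpers; for three outputs it would be something along the lines of Tuple(Stack(), Stack(), Pad(pad_val=-1)) (check the gluoncv documentation for your version). The pure-Python sketch below only illustrates what such a function does with (clean, noise, label) samples; batchify_three is a hypothetical name and the images are stand-in strings rather than arrays.

```python
def batchify_three(samples, pad_val=-1, box_size=5):
    """Turn a list of (clean, noise, label) samples into three batches.

    Labels are lists of [xmin, ymin, xmax, ymax, cls_id] boxes; shorter ones
    are padded with rows of pad_val up to the largest box count in the batch.
    """
    clean, noise, labels = zip(*samples)
    max_boxes = max(len(lbl) for lbl in labels)
    padded = [list(lbl) + [[pad_val] * box_size for _ in range(max_boxes - len(lbl))]
              for lbl in labels]
    # A real batchify function would stack the image arrays here.
    return list(clean), list(noise), padded


samples = [
    ("clean_0", "noisy_0", [[0, 0, 10, 10, 1]]),
    ("clean_1", "noisy_1", [[5, 5, 20, 20, 2], [1, 1, 4, 4, 3]]),
]
batch_clean, batch_noise, batch_label = batchify_three(samples)
print(batch_label)
```

Padding with -1 keeps the boxes of each image together while marking the filler rows as invalid, so the loss code can skip them.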

Does this help?

729009982 · October 26, 2018, 7:53am

I got your idea! Thanks a lot!
