Dataset .transform mapping over tuples conventions

Hi @lambdaofgod,

I think the confusion comes from how a Dataset is meant to be used: it retrieves a single sample at a time, not a range of indexes. So the following usage is correct:

transformed_dataset = dummy_dataset.transform(lambda src, tgt: tgt)
print(transformed_dataset[0])

DataLoader is the class that consumes the Dataset, and it only ever retrieves one sample at a time when constructing a batch. So although dummy_dataset[:1] appears to work, it’s not an intended use, and it breaks once you add transform into the mix.
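To make the failure mode concrete, here is a rough pure-Python sketch. TinyDataset and _Transformed are hypothetical stand-ins I wrote for illustration, not the library's actual classes; I'm assuming the transform unpacks a tuple sample into positional arguments, which is what the lambda src, tgt signature relies on.

```python
class TinyDataset:
    """Hypothetical stand-in for a Dataset; not the library's implementation."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        # An int returns one sample; a slice returns a list of samples.
        return self._samples[idx]

    def transform(self, fn):
        return _Transformed(self, fn)


class _Transformed:
    """Applies fn lazily, one lookup at a time (assumed behaviour)."""

    def __init__(self, base, fn):
        self._base = base
        self._fn = fn

    def __len__(self):
        return len(self._base)

    def __getitem__(self, idx):
        item = self._base[idx]
        # A tuple sample is unpacked into positional arguments.
        if isinstance(item, tuple):
            return self._fn(*item)
        return self._fn(item)


dummy_dataset = TinyDataset([("src0", "tgt0"), ("src1", "tgt1")])
transformed_dataset = dummy_dataset.transform(lambda src, tgt: tgt)

# Single-sample access: the (src, tgt) tuple is unpacked for the lambda.
single = transformed_dataset[0]

# Slice access: a list of samples reaches the lambda as ONE argument.
try:
    transformed_dataset[:1]
    slice_error = None
except TypeError as exc:
    slice_error = exc

# What a batching loop does: fetch samples one index at a time.
batch = [transformed_dataset[i] for i in range(len(transformed_dataset))]
```

In this sketch the slice hands the transform a list rather than a (src, tgt) tuple, so the two-argument lambda fails, while the index-by-index loop works fine. The real classes may differ in detail, but the same mismatch is the kind of thing that goes wrong.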

I’ve added more information on your other related question.
