I’m looking for the functionality of Keras’ TimeDistributed layer in MXNet’s symbol workflow. For example, running data of shape (batch_size, seq_length, channels, height, width) through a stack of 2D convolutional layers (the same layers with shared weights at every timestep), then pooling the results over seq_length at the end.
MXNet does not have a time-distributed wrapper like Keras, but you can use a Dense layer with flatten=False. The Dense layer then behaves the same as Keras’ TimeDistributed(Dense).
Thanks for the response. How does this work in the sense of a wrapper? Could you give a brief example of its usage in the case where you have a symbol that is wrapped by this Dense layer?
I think I have hacked something together, for reference in case anyone is interested:
from mxnet.gluon import HybridBlock

class TimeDistributed(HybridBlock):
    def __init__(self, model, **kwargs):
        super(TimeDistributed, self).__init__(**kwargs)
        with self.name_scope():
            self.model = model

    def apply_model(self, x, _):
        return self.model(x), []

    def hybrid_forward(self, F, x):
        x = F.swapaxes(x, 0, 1)  # swap batch and seq_length axes
        # foreach iterates over the first axis, which is now seq_length
        x, _ = F.contrib.foreach(self.apply_model, x, [])
        x = F.swapaxes(x, 0, 1)  # swap seq_length and batch back
        return x
It seems to fail on the backward pass when hybridize() is used, though, and I haven’t figured out why.