Inconsistent shape following an LSTM fed with variable length inputs

cosmincatalin · April 9, 2020, 7:07pm

I have the following Decoder which is fed with captions of various sizes (captions in a batch are padded to the same length).

The problem is that on the first forward pass the linear layer locks in its input dimensions, which then don't match the inputs of subsequent batches. This throws: Error in operator dense1_fwd: Shape inconsistent, Provided = [9956,9728], inferred shape=(9956,9216).

What are the possible solutions I can employ to overcome this issue, preferably without changing much in the network design?

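from mxnet.gluon import HybridBlock
from mxnet.gluon.nn import Dense, Embedding
from mxnet.gluon.rnn import LSTM
from overrides import overrides

class DecoderRNN(HybridBlock):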

    @overrides
    def hybrid_forward(self, F, features, captions, *args, **kwargs):
        embeddings = self.embed(captions)
        features_and_embeddings = F.concat(features.expand_dims(axis=1), embeddings, dim=1)
        output = self.lstm(features_and_embeddings)
        result = self.linear(output)
        return result

    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int, num_layers: int):
        super(DecoderRNN, self).__init__()
        self.embed = Embedding(input_dim=vocab_size, output_dim=embed_size)
        self.lstm = LSTM(hidden_size, num_layers, layout="NTC")
        self.linear = Dense(vocab_size, flatten=True)

    @overrides
    def initialize(self, **kwargs):
        self.embed.initialize(**kwargs)
        self.lstm.initialize(**kwargs)
        self.linear.initialize(**kwargs)

keerthanvasist · April 21, 2020, 7:36am

Hello!

I believe when flatten is set to true on a Dense block, the expectation is that input will be of the form NC where N can be variable and C must be fixed. From the error message Shape inconsistent, Provided = [9956,9728], inferred shape=(9956,9216) you have posted, it looks like your batch size has remained constant but your channel size has changed.

So, to make sure that your channel size remains the same, you must use flatten=False in the Dense layer. That should solve your problem.
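
To see why flatten matters here, the following is a minimal pure-Python sketch of the shape rule documented for Gluon's Dense layer: with flatten=True every axis after the first is collapsed into one, and with flatten=False only the last axis is transformed. The helper dense_in_units is hypothetical (it is not an MXNet function), and the batch size of 32, hidden size of 512, and padded lengths of 19 and 18 are assumptions chosen to be consistent with the numbers in the error message.

```python
from math import prod

def dense_in_units(input_shape, flatten):
    """Input width a Dense layer would infer for its weight (hypothetical model)."""
    if flatten:
        # flatten=True: all axes after the first are collapsed together
        return prod(input_shape[1:])
    # flatten=False: only the last axis is transformed
    return input_shape[-1]

# Assumed LSTM outputs in NTC layout: (batch, padded_length, hidden_size)
first_batch = (32, 19, 512)
later_batch = (32, 18, 512)

for flatten in (True, False):
    widths = [dense_in_units(s, flatten) for s in (first_batch, later_batch)]
    print(f"flatten={flatten}: inferred input widths {widths}")
```

With flatten=True the two batches ask for different input widths, which is the kind of mismatch the error reports. With flatten=False the width is the same for both, and the output keeps one row of vocabulary scores per time step.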

cosmincatalin · April 22, 2020, 8:58am

That worked, superb.

VishaalKapoor · April 25, 2020, 5:24am

The suggestion to set flatten=False will fix your scenario, but it's not the channel length that varies; it's the sequence length, T, that varies between batches. With flatten=True, the Dense layer collapses the LSTM output of shape (N, T, hidden_size) into (N, T * hidden_size), so its input width changes with T. Looking at the two dimensions, 9728 and 9216, it's likely your hidden_size is the gcd, which is 512 in this case. This means one batch has a padded sequence length of 18, and the other has a padded sequence length of 19. Setting flatten=False will allow for the varying padded sequence length, as @keerthanvasist mentions.
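
The arithmetic above can be checked with a short sketch using only the standard library. The variable names are illustrative, and the two widths are the ones from the error message; the loop simply lists every hidden_size that would be consistent with both widths.

```python
from math import gcd

provided, inferred = 9728, 9216  # flattened widths from the error message

g = gcd(provided, inferred)
print(g, provided // g, inferred // g)

# Every common divisor h of both widths is a possible hidden_size,
# paired with the two padded sequence lengths it would imply.
candidates = [(h, provided // h, inferred // h)
              for h in range(1, g + 1) if g % h == 0]
for hidden_size, len_a, len_b in candidates:
    print(hidden_size, len_a, len_b)
```

Only the largest candidate, the gcd itself, leaves the two implied sequence lengths with no common factor; smaller hidden sizes would imply much longer padded captions.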

Trivia: There's approximately a 60.79…% chance that the hidden_size is 512 given the information above. That is the probability that the two sequence lengths, a and b, are relatively prime, which is 6/\pi^2. See: Probability that gcd(n, m) = 1. Cheers.

Vishaal
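
The 6/\pi^2 figure in the trivia above is the limiting fraction of integer pairs that are relatively prime. A rough empirical check, assuming pairs drawn uniformly from 1..limit (an idealization; real caption lengths need not behave this way), might look like this. The function name coprime_fraction is made up for the illustration.

```python
from math import gcd, pi

def coprime_fraction(limit):
    """Fraction of ordered pairs (a, b) in 1..limit with gcd(a, b) == 1."""
    coprime = sum(1 for a in range(1, limit + 1)
                    for b in range(1, limit + 1)
                    if gcd(a, b) == 1)
    return coprime / limit ** 2

estimate = coprime_fraction(200)
print(f"empirical: {estimate:.4f}, 6/pi^2: {6 / pi ** 2:.4f}")
```

The empirical fraction approaches 6/\pi^2 as limit grows.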
