How to use hybridization for Language modelling (RNN)

I’ll keep it light~
Using an RNN layer like LSTM or GRU gives the ‘non-hybrid children’ error

How can I overcome this to create an RNN language model capable of hybridization?

Thanks

Currently, RNN blocks are not hybridizable because the fused RNN operator is not implemented for CPU. The work in progress is happening here (https://github.com/apache/incubator-mxnet/pull/10104) and here (https://github.com/apache/incubator-mxnet/pull/10311).

1 Like

Hi, thanks for the answer, but I have a query (this may sound really stupid): if we were to train the RNN layer on GPUs, would its hybridization work?

Or is there a way to make a language model (using RNN cells) that is hybridizable?

Yes, you can certainly create a hybridizable RNN by chaining cells together. However, the performance is going to be significantly worse than the fusedRNN operator. I have implemented a hybridizable LSTM block that uses LSTM cells. If you really need hybridization, I can share the code (see the sketch below for the general idea).
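
For reference, a rough (non-hybridized) sketch of what chaining cells means, with made-up sizes: an LSTMCell is called once per time step and the states are carried forward. The hybridizable block shared later in this thread does the same thing inside hybrid_forward.

import mxnet as mx
from mxnet import gluon

cell = gluon.rnn.LSTMCell(hidden_size=128)
cell.initialize()

x = mx.nd.random.uniform(shape=(8, 35, 50))   # NTC: batch=8, seq_len=35, features=50
states = cell.begin_state(batch_size=8)       # zero [h, c] states
outputs = []
for t in range(35):
    out, states = cell(x[:, t, :], states)    # one time step at a time
    outputs.append(out)
seq_out = mx.nd.stack(*outputs, axis=1)       # back to NTC: (8, 35, 128)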

1 Like

In case there is a misunderstanding, I want to clarify one thing (sorry if you already know it).

Hybridization and working on GPU are not related. You can still train your model on GPU and gain from all its glory without hybridization.

The non-hybridizable version is still slower than the hybridizable one, but the effect is much smaller than the difference between executing your model on CPU vs. GPU.
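
To make the independence concrete, here is a tiny sketch (the layer and shapes are only illustrative): the context is chosen at initialization, and hybridize() is an optional, separate step.

import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)
net.initialize(ctx=mx.gpu())   # runs on GPU without any hybridization
out = net(mx.nd.random.uniform(shape=(8, 20), ctx=mx.gpu()))

net.hybridize()                # optional, independent speed-up
out = net(mx.nd.random.uniform(shape=(8, 20), ctx=mx.gpu()))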

2 Likes

Thanks for the response. So once the fusedRNN operator for CPU project (https://github.com/apache/incubator-mxnet/pull/10104) is complete, will one also be able to run it on the GPU?

The thing that is slightly confusing to me is the ‘for CPU’ part.

I am pretty new to this, so thank you for all the help :smiley:

I see. Thanks for clearing that up! And thanks for the offer to share the code, but I would like to give it a try myself first :smiley:

If I end up failing, I hope the offer still stands :grin:

Well, I gave it a go, and I doubt my attempt is close to a solution :sweat:
I would really appreciate it if you could share that code :sweat_smile:

There you go. Keep in mind that even though this block is hybridizable, it is significantly less efficient than gluon.rnn.LSTM on GPU. So I would only use this if you need an end-to-end hybridizable network (for example, if you want to save your model in JSON format and run it in C++). Either way, I’d recommend waiting for the hybridizable version of gluon.rnn.LSTM to be released.

from mxnet import gluon


class LstmHybrid(gluon.HybridBlock):
    def __init__(self, hidden_dim, seq_len, layout='NTC'):
        """
        :param int hidden_dim: hidden size of the LSTM
        :param int seq_len: sequence length of the unrolled LSTM
        :param str layout: valid options: NTC, TNC, or NCT
        """
        super(LstmHybrid, self).__init__()

        with self.name_scope():
            # T=sequence_length, N=batch_size, C=feature dimension
            self.seq_len = seq_len
            self.layout = layout
            self.lstmcell = gluon.rnn.LSTMCell(hidden_size=hidden_dim)
            # Zero begin states stored as parameters so they are available as
            # symbols in hybrid_forward; the batch dimension (0) is inferred
            # through deferred initialization on the first forward pass.
            self.begin_state_h = self.params.get(
                'begin_state_h', shape=(0, hidden_dim), init='zeros', allow_deferred_init=True)
            self.begin_state_c = self.params.get(
                'begin_state_c', shape=(0, hidden_dim), init='zeros', allow_deferred_init=True)

    def hybrid_forward(self, F, x, begin_state_c, begin_state_h):
        """
        :param F: mx.nd (imperative mode) or mx.sym (hybridized mode)
        :param mx.nd.NDArray or mx.sym.Symbol x: data in the given layout (N must come before C)
        :param mx.nd.NDArray or mx.sym.Symbol begin_state_c: initial cell state parameter
        :param mx.nd.NDArray or mx.sym.Symbol begin_state_h: initial hidden state parameter
        :return: stacked per-step outputs in the same layout as the input
        """
        t_axis = self.layout.index('T')
        # LSTMCell expects its state list in [hidden, cell] order.
        states = [begin_state_h, begin_state_c]
        outputs = []
        # Split along the time axis and feed the cell one step at a time.
        x = F.split(x, num_outputs=self.seq_len, axis=t_axis)
        for i in range(self.seq_len):
            output, states = self.lstmcell(F.squeeze(x[i], axis=t_axis), states)
            outputs.append(output)
        return F.stack(*outputs, axis=t_axis)
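
Here is a minimal usage sketch of the block above, with made-up sizes (batch 8, sequence length 35, 50 input features). Because the begin-state parameters are shaped through deferred initialization, their batch dimension is fixed by the first batch that goes through.

import mxnet as mx

net = LstmHybrid(hidden_dim=128, seq_len=35)
net.initialize()
net.hybridize()

x = mx.nd.random.uniform(shape=(8, 35, 50))  # NTC: batch=8, seq_len=35, features=50
out = net(x)                                 # shape (8, 35, 128)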
1 Like

Just to add to the above response, Gluon LSTM blocks are now fully hybridizable, and the above solution is no longer necessary.
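
For example, with a sufficiently recent MXNet release, something along these lines should hybridize end to end (the layers, sizes, and shapes here are only illustrative):

import mxnet as mx
from mxnet import gluon

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.rnn.LSTM(hidden_size=128, num_layers=2, layout='NTC'))
    net.add(gluon.nn.Dense(10))
net.initialize()
net.hybridize()

out = net(mx.nd.random.uniform(shape=(8, 35, 50)))  # NTC input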

2 Likes

Hi @safrooze, I’m unable to load a HybridSequential model that contains gluon.rnn.LSTM layers (details here). Any clues about what I might be doing incorrectly? Thanks!