Language models and BeamSearch/Sequence Generation

I have been trying to use beam search and/or sequence generation (BeamSearchSampler / SequenceSampler) with language models such as standard_lstm_lm_200. I have tried everything: saving the weights leads to an error about being initialized on the GPU, and even going straight from training to Beam/Seq immediately gives an assertion error regarding children and rnn_conv_cell. It seems related to the fact that the LM models require the context to be in a list even if it is just one GPU, while Beam/Seq produce an error if it is in a list. Can they be used together? If so, a simple clue would be nice; if not, knowing that would help too, so at least I don't waste time trying to find a way. I have even read the model files from the model zoo.
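
To illustrate what I mean about the context (just a minimal sketch, not my actual training code, and the list-unwrapping at the end is only my guess at what the sampler side wants, not something I have gotten to work):

import mxnet as mx
import gluonnlp as nlp

# The LM tutorial keeps the context in a list, even for a single GPU.
context = [mx.gpu(0)] if mx.context.num_gpus() > 0 else [mx.cpu()]

model, vocab = nlp.model.get_model('standard_lstm_lm_200',
                                   dataset_name='wikitext-2',
                                   pretrained=True,
                                   ctx=context)

# My guess is that the Beam/Seq samplers want a single context, so the list
# would have to be unwrapped before building the initial states:
states = model.begin_state(batch_size=1, ctx=context[0])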

Hi @hskramer, can you share your code so that I might be able to help you?

Here is a snippet for language generation:

import gluonnlp as nlp
import mxnet as mx

# Use a GPU when one is available, otherwise fall back to the CPU.
ctx = mx.gpu() if mx.context.num_gpus() > 0 else mx.cpu()

# Pre-trained big RNN language model trained on the Google Billion Words corpus.
lm_model, vocab = nlp.model.get_model(name='big_rnn_lm_2048_512',
                                      dataset_name='gbw',
                                      pretrained=True,
                                      ctx=ctx)

class Decoder:
    """Wrap the language model as a one-step decoder for the sampler:
    (inputs, states) -> (scores over the vocabulary, new states)."""
    def __init__(self, model):
        self.model = model

    def __call__(self, inputs, states):
        outputs, states = self.model(mx.nd.expand_dims(inputs, axis=0), states)
        return outputs[0], states

decoder = Decoder(lm_model)

# Sample one sequence per call; the low temperature keeps samples close to greedy decoding.
sampler = nlp.model.SequenceSampler(beam_size=1,
                                    decoder=decoder,
                                    eos_id=vocab['<eos>'],
                                    max_length=30,
                                    temperature=0.30)

prompt = "Kim Jong Il has been"
prompt_tokenized = prompt.split(' ')
bos_ids = [vocab[ele] for ele in prompt_tokenized]

# Run all but the last prompt token through the model to build up the RNN state.
init_states = lm_model.begin_state(batch_size=1, ctx=ctx)
_, sampling_states = lm_model(
    mx.nd.expand_dims(mx.nd.array(bos_ids[:-1], ctx=ctx), axis=1), init_states)

# The last prompt token is the first input to the sampler.
inputs = mx.nd.full(shape=(1,), ctx=ctx, val=bos_ids[-1])

# Draw 20 independent continuations of the prompt.
for i in range(20):
    samples, _, valid_lengths = sampler(inputs, sampling_states)
    for sample, valid_length in zip(samples[0].asnumpy(), valid_lengths[0].asnumpy()):
        sentence = prompt_tokenized[:-1] + [vocab.idx_to_token[ele] for ele in sample[:valid_length]]
        print(' '.join(sentence))

Kim Jong Il has been named the first woman to head a government that has been under fire for more than a year . <eos>
Kim Jong Il has been named as the successor to Kim Jong Il , the state-run news agency reported . <eos>
Kim Jong Il has been ruled out of the race . <eos>
Kim Jong Il has been the target of a barrage of criticism from the media . <eos>
Kim Jong Il has been invited to attend the opening ceremony of the Beijing Olympics . <eos>
Kim Jong Il has been chosen to replace Kim Jong Il . <eos>
Kim Jong Il has been named the country 's most influential man . <eos>
Kim Jong Il has been a guest of honor for the country 's first lady . <eos>
Kim Jong Il has been named the country 's most powerful politician . <eos>
Kim Jong Il has been named the country 's most influential politician . <eos>
Kim Jong Il has been named the top female official in the country . <eos>
Kim Jong Il has been a top contender for the presidency , the newspaper said . <eos>
Kim Jong Il has been a regular visitor to the country . <eos>
Kim Jong Il has been named the most influential person in the country . <eos>
Kim Jong Il has been named the country 's first female president . <eos>
Kim Jong Il has been named the top official in the country 's political system . <eos>
Kim Jong Il has been given a rare chance to shine . <eos>
Kim Jong Il has been seen in the media . <eos>
Kim Jong Il has been named the country 's most popular politician . <eos>
Kim Jong Il has been a guest at the White House , a spokeswoman said . <eos>

This comes straight out of the tutorial on LM models; at the end it covers using your own dataset, and after training on my own dataset I would like to produce some text based on it. The simplest way to put it: after training on sherlockholmes at the end of the tutorial, how would you produce text based on sherlockholmes using one of the methods from my original post? I tried what you posted and it worked, and I know the flaw in my approach: these models require very large amounts of data and lots of training. What I need to research now is a next word/character RNN predictor like the one in d2l ch. 8, but with a better embedding, so that it can write in the style of a given author. I still like using what you posted; it is more interesting and makes perplexity come alive. I like being able to make math accessible to everybody.
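
Here is a rough sketch of how I picture plugging a self-trained model into the same sampler setup (a minimal sketch, not something I have fully worked through). To keep it runnable it loads the pretrained standard_lstm_lm_200 / wikitext-2 model, but the idea is that the model and vocab produced by training on sherlockholmes would be dropped in instead; the prompt, temperature, and max_length are just values I picked.

import mxnet as mx
import gluonnlp as nlp

ctx = mx.gpu() if mx.context.num_gpus() > 0 else mx.cpu()

# Stand-in for a self-trained model: the pretrained wikitext-2 LSTM LM.
# After training on sherlockholmes, the trained model and its vocab would go here instead.
model, vocab = nlp.model.get_model('standard_lstm_lm_200',
                                   dataset_name='wikitext-2',
                                   pretrained=True,
                                   ctx=ctx)

class Decoder:
    """Same one-step decoder wrapper as in the snippet above."""
    def __init__(self, model):
        self.model = model

    def __call__(self, inputs, states):
        outputs, states = self.model(mx.nd.expand_dims(inputs, axis=0), states)
        return outputs[0], states

sampler = nlp.model.SequenceSampler(beam_size=1,
                                    decoder=Decoder(model),
                                    eos_id=vocab['<eos>'],
                                    max_length=50,
                                    temperature=0.7)

prompt = ['when', 'he', 'entered', 'the', 'room']
ids = [vocab[token] for token in prompt]

# Warm up the RNN state on all but the last prompt token.
states = model.begin_state(batch_size=1, ctx=ctx)
_, states = model(mx.nd.expand_dims(mx.nd.array(ids[:-1], ctx=ctx), axis=1), states)
inputs = mx.nd.full(shape=(1,), ctx=ctx, val=ids[-1])

# Draw a few continuations of the prompt.
for i in range(5):
    samples, _, valid_lengths = sampler(inputs, states)
    for sample, valid_length in zip(samples[0].asnumpy(), valid_lengths[0].asnumpy()):
        print(' '.join(prompt[:-1] + [vocab.idx_to_token[ele] for ele in sample[:valid_length]]))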