I am following the steps in
https://gluon-nlp.mxnet.io/examples/word_embedding/word_embedding_training.html
which returns similar results until the following step:
example_token = “vector”
get_k_closest_tokens(vocab, embedding, 10, example_token)
which does not return similar tokens to “vector”.
closest tokens to “vector”: is, in, zero, a, one, two, of, the, and, to
It appears that word vector for the example token are all zeroes and presumable all words in the vocabulary are likewise. I thought that the intialization is based on its ngrams?
When I set the model to train , the result is the same at the end of training