In class Vocab, we sort tokens 1) in decreasing order of count and 2) in lexicographical order when two tokens have the same count, using these two lines:
self.token_freqs = sorted(counter.items(), key=lambda x: x[0])
self.token_freqs.sort(key=lambda x: x[1], reverse=True)
I understand why we need to sort tokens by counts, but I wonder if there is any reason to sort them in lexicographical order. Is there any specific reason?
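For reference, here is a minimal sketch of what the two-step sort does on a toy counter. The sample tokens are made up for illustration, and the sketch assumes the keys are `x[0]` (the token) and `x[1]` (the count). Python's sort is stable, so the second sort keeps tied tokens in the lexicographic order set by the first one. That at least makes the resulting order deterministic, though whether that is the authors' actual motivation is only a guess.

```python
from collections import Counter

# Toy corpus; these tokens are made up for illustration.
tokens = ["b", "a", "c", "a", "b", "d"]
counter = Counter(tokens)

# Step 1: sort (token, count) pairs by token, i.e. lexicographic order.
token_freqs = sorted(counter.items(), key=lambda x: x[0])
# Step 2: stable sort by count, descending; tied counts keep the order from step 1.
token_freqs.sort(key=lambda x: x[1], reverse=True)

print(token_freqs)  # [('a', 2), ('b', 2), ('c', 1), ('d', 1)]

# The same ordering in a single pass, using a composite key.
single_pass = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
assert single_pass == token_freqs
```

Without the first sort, tied tokens would simply stay in the order in which the counter first saw them.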
The sentence "The modification we did here is that corpus is a single list, not a list of token lists, since we do not the sequence information in the following models." does not make sense, especially the “since we do not the” part.