HW8.3 Clarification

Can the course staff please clarify what this question means or point to some related resources/examples? Thank you!


We use part of the original string to predict what comes next, e.g. after "But B" comes "r", and after "ut Br" comes "u". We are preparing training data here so the model learns what comes after certain characters. The above is a 5-gram setup, which takes a 5-character sliding window for each X.
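For concreteness, here is a rough sketch of that data prep (the toy string, window size, and variable names below are my own assumptions, not the assignment's data):

```python
# Sketch of the 5-gram sliding-window data prep.
text = "But Brutus says he was ambitious"  # toy string for illustration
n = 5  # window size

X, y = [], []
for i in range(len(text) - n):
    X.append(text[i:i + n])  # 5 consecutive characters
    y.append(text[i + n])    # the character that follows them

print(X[0], "->", y[0])  # "But B" -> "r"
print(X[1], "->", y[1])  # "ut Br" -> "u"
```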

Regarding sequential encoding, do we always have a sequence of 5?
From my understanding, for "* Use a bag of characters encoding that sums over all occurrences.", we simply turn the 5 characters into a one-hot encoded vector. So for sequential encoding, can we just retain it as a matrix of shape (vocab size, 5) as the input, where the ith column is the one-hot vector of the ith character?

Edit: It seems to work
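In case it helps others, here is roughly what that sequential encoding looks like (the toy vocabulary and function name are my own assumptions):

```python
import numpy as np

# Sequential encoding: a (vocab_size, 5) matrix, one one-hot column per position.
vocab = sorted(set("But Brutus says he was ambitious"))  # toy vocabulary
char_to_idx = {c: i for i, c in enumerate(vocab)}

def sequential_encode(window, char_to_idx, vocab_size):
    """Column i is the one-hot vector of the i-th character in the window."""
    mat = np.zeros((vocab_size, len(window)))
    for i, c in enumerate(window):
        mat[char_to_idx[c], i] = 1.0
    return mat

x_seq = sequential_encode("But B", char_to_idx, len(vocab))
print(x_seq.shape)  # (vocab_size, 5)
```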

That’s an interesting approach; I’m not sure how well it works. My understanding is to sum the 5-character encodings into a vector rather than a matrix.
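Something like this is what I had in mind for the summed version (a sketch reusing the assumed vocabulary and index from the snippet above):

```python
def bag_of_characters_encode(window, char_to_idx, vocab_size):
    """Sum the one-hot vectors over all positions; the character order is lost."""
    vec = np.zeros(vocab_size)
    for c in window:
        vec[char_to_idx[c]] += 1.0
    return vec

x_bag = bag_of_characters_encode("But B", char_to_idx, len(vocab))
print(x_bag.shape)  # (vocab_size,)
```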

Yeah, 5-gram is fine.

I thought the question asked us to use two models:

  1. In one case use a sequential encoding to obtain an embedding proportional to the length of the sequence. (each example is a matrix)
  2. Use a bag of characters encoding that sums over all occurrences. (each example is a vector)

And the result is consistent with our intuition that one should work significantly better than the other.

That sounds correct to me!

If that’s the case, we would lose sequential information when we turn the matrix into a vector. E.g. “aab” would have the same bag-of-characters encoding as “baa” and “aba”.

Yep, the second case will lead to this kind of information loss.
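A quick way to see that loss, reusing the assumed `bag_of_characters_encode` sketch from above:

```python
# "aab", "baa", and "aba" collapse to the same summed vector,
# while their sequential (matrix) encodings stay distinct.
toy_idx = {"a": 0, "b": 1}
for s in ["aab", "baa", "aba"]:
    print(s, bag_of_characters_encode(s, toy_idx, 2))  # all print [2. 1.]
```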

Which one should we use as the feature matrix when training the MLP?

The question mentioned both cases.


Yes, train using both.
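For completeness, one way to train on both (a sketch reusing the assumed helpers above; sklearn's `MLPClassifier` and the hyperparameters are just convenient choices, not what the assignment specifies):

```python
from sklearn.neural_network import MLPClassifier

targets = [char_to_idx[c] for c in y]

# Flatten each (vocab_size, 5) matrix so a plain MLP can take it as input.
X_seq = np.stack([sequential_encode(w, char_to_idx, len(vocab)).ravel() for w in X])
X_bag = np.stack([bag_of_characters_encode(w, char_to_idx, len(vocab)) for w in X])

for name, features in [("sequential", X_seq), ("bag of characters", X_bag)]:
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(features, targets)
    print(name, clf.score(features, targets))
```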