HW5 Question 3

kyle · February 26, 2019, 12:51pm

1. The training set has 12,000 samples whereas the test set has 4,000 samples. Intuitively, either the training set should be weighted by 1/3 or the test set should be weighted by 3. How do I implement this in Gluon? Is using only 4,000 samples from the training set considered a valid method for “re-weighting data”?
2. The textbook says we need to get the function f. How do I obtain this function once I’m done training the classifier? Is it some attribute of the net instance?
3. It says “Use the scores to compute weights on the training set”. What do you mean by the “scores”? My understanding is that once I have f, I can compute \exp(f(x_i)), which is multiplied by loss(net(X), y), so why do these “scores” even matter?
4. According to the textbook it is better to use \min(\exp(f(x_i)),c). What is c?

ryantheisen · February 26, 2019, 6:57pm

I’m not totally sure I understand your first question, but for 2, f is simply the output of your network (so call net(x)), and I believe “scores” just refers to the outputs f(x_i). For the last question, c is just some constant; you use it because you don’t want the weights \exp(f(x_i)), and therefore the weighted loss, to become unbounded, which can happen when f(x_i) takes extremely large values. That becomes more likely as training progresses and the network gets better at separating the two classes.
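
To make the clipping concrete, here is a minimal sketch in plain Python rather than Gluon. The function name `clipped_weights` and the example scores are made up for illustration; in the homework the scores would be the raw outputs net(x) on the training samples.

```python
import math

def clipped_weights(scores, c):
    """Return min(exp(f(x_i)), c) for each raw classifier score f(x_i)."""
    log_c = math.log(c)
    # exp(min(s, log c)) equals min(exp(s), c), but never overflows for huge s.
    return [math.exp(min(s, log_c)) for s in scores]

# Hypothetical scores f(x_i); not taken from the homework data.
scores = [-2.0, 0.0, 1.0, 5.0]
weights = clipped_weights(scores, c=10.0)
```

With a very large c nothing is clipped and the weights are just \exp(f(x_i)); with a c smaller than every \exp(f(x_i)), all weights equal c, which is the same as not re-weighting at all.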

jamesli · February 26, 2019, 7:40pm

I think the first question is asking about the hint in part 2, where it says we need to weight the data before training the binary classifier.

kyle · February 26, 2019, 8:05pm

The first question was about how to weight the data so that a sample from the test set matters more than a sample from the training set, since the training set is three times the size of the test set. My gut tells me that when I compute loss(net(X), y) I need to multiply this by 3 if y is 1 (i.e. it’s from the test set). Is this the right approach?
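
A rough sketch of that idea in plain Python, not Gluon: a logistic loss where each test-set sample (label 1, following the convention in this post) is weighted by n_train / n_test and each training-set sample (label 0 here) by 1. The function name, the 0/1 label encoding, and the example numbers are assumptions for illustration only.

```python
import math

def weighted_logistic_loss(scores, labels, n_train, n_test):
    """Weighted mean logistic loss for a train-vs-test classifier.

    scores: raw classifier outputs f(x_i)
    labels: 1 if the sample comes from the test set, 0 if from the training set
    """
    w_test = n_train / n_test  # 12000 / 4000 = 3 for the sizes in this thread
    total, weight_sum = 0.0, 0.0
    for s, y in zip(scores, labels):
        z = s if y == 1 else -s
        # Numerically stable form of log(1 + exp(-z)).
        loss = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
        w = w_test if y == 1 else 1.0
        total += w * loss
        weight_sum += w
    return total / weight_sum

# One test sample and three training samples, mirroring the 1:3 size ratio.
example_loss = weighted_logistic_loss([2.0, 0.0, 0.0, 0.0], [1, 0, 0, 0],
                                      n_train=12000, n_test=4000)
```

With these weights the single test sample contributes as much to the loss as the three training samples combined. In Gluon, I believe the built-in loss classes accept an optional sample_weight argument when called, which would be the natural place to pass such per-sample weights, but check the API documentation for your version.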

Also, for question 4, how do I compute c?

ryantheisen · February 27, 2019, 8:57pm

The re-weighting occurs when you train the classifier between the training/test set. From slide 47 in the lecture, we defined the distribution:

r(x,y) = \frac{1}{2}[p(x)\delta(y,1) + q(x)\delta(y,-1)]= \frac{1}{2}p(x)\delta(y,1) + \frac{1}{2}q(x)\delta(y,-1)

where the \frac{1}{2} comes from the assumption that the training and test sets are the same size. If they aren’t the same size, how would you want to re-weight this data distribution?
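
As a hint at one possible way to write this down (the symbols n_p and n_q are my own notation, not from the slides): if n_p samples are drawn from p and n_q samples from q, then pooling the two sets without any weighting gives the first line below, and multiplying each sample's loss by the weights in the second line restores the balanced \frac{1}{2}, \frac{1}{2} mixture.

```latex
r(x,y) = \frac{n_p}{n_p+n_q}\,p(x)\,\delta(y,1) + \frac{n_q}{n_p+n_q}\,q(x)\,\delta(y,-1)

w_p = \frac{n_p+n_q}{2\,n_p}, \qquad w_q = \frac{n_p+n_q}{2\,n_q}
```

For set sizes of 12,000 and 4,000 these weights are 2/3 and 2, so the ratio between them is 3. Only the ratio matters for the classifier, since a common factor just rescales the loss.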

You can choose c somewhat arbitrarily… but think about what happens when you choose c very large/very small.
