@gold_piggy: reminder to finalize the weighting proportions.

In our example, the truth distribution is 50% shirts and 50% shoes, while the train distribution is 10% shirts and 90% shoes. As a result, a model trained on the training set will predict shoes more often than shirts. That's why we need to adjust the probability distribution using the following formulas.

[Background.] Say we have

x_i^{'} \sim q(x), n_q samples in test set (truth distribution)

x_i \sim p(x), n_p samples in training set

To train a classifier f_c to distinguish the train and truth distributions, we define a "classifier training set" C = \{(a_i, b_i), ... \}, which combines the training and test sets and has n_c = (n_p + n_q) samples, where

b_i = -1 , if a_i \sim p(x), i.e. drawn from the training set;

b_i = 1 , if a_i \sim q(x), i.e. drawn from the test set.
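
The construction of C can be sketched in a few lines of plain Python. The helper name `build_classifier_set` and the toy data are hypothetical, chosen only to mirror the shirts/shoes example; real inputs would be feature vectors rather than class names.

```python
def build_classifier_set(train_x, test_x):
    """Label training samples -1 and test samples +1, then combine them."""
    c = [(a, -1) for a in train_x]   # a_i ~ p(x), drawn from the training set
    c += [(a, 1) for a in test_x]    # a_i ~ q(x), drawn from the test set
    return c

# Hypothetical toy data standing in for real samples.
train_x = ["shoe"] * 9 + ["shirt"] * 1   # 10% shirts, 90% shoes
test_x = ["shoe"] * 5 + ["shirt"] * 5    # 50% shirts, 50% shoes

C = build_classifier_set(train_x, test_x)
n_p, n_q, n_c = len(train_x), len(test_x), len(C)
```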

As b_i is a binary label, writing x for a_i,

Pr(b_i=1 | a_i) = \frac{q(x) w_q}{p(x) w_p + q(x) w_q} \text{ where } w_p = \frac{n_p}{n_p + n_q}, w_q = \frac{n_q}{n_p + n_q}

and hence,

\frac{Pr(b_i=1|a_i)}{Pr(b_i=-1|a_i)} = \frac{q(x) n_q}{p(x) n_p}
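
As a quick numeric check of this step, the sketch below evaluates the posterior for a single point and confirms that its odds equal q(x) n_q / (p(x) n_p). The densities use the "shirt" values from the example; the sample sizes n_p = 9000 and n_q = 1000 are assumptions made up for illustration.

```python
def posterior_test(p_x, q_x, n_p, n_q):
    """Pr(b = 1 | a) for a point with train density p_x and test density q_x."""
    w_p = n_p / (n_p + n_q)
    w_q = n_q / (n_p + n_q)
    return q_x * w_q / (p_x * w_p + q_x * w_q)

# "shirt" in the running example: p(x) = 0.1, q(x) = 0.5.
# The sample sizes are hypothetical.
p_x, q_x, n_p, n_q = 0.1, 0.5, 9000, 1000

pr_pos = posterior_test(p_x, q_x, n_p, n_q)
odds = pr_pos / (1.0 - pr_pos)           # Pr(b=1|a) / Pr(b=-1|a)
expected = (q_x * n_q) / (p_x * n_p)     # right-hand side of the formula
```

The weights w_p and w_q share the denominator n_p + n_q, so it cancels in the ratio and only n_q / n_p remains.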

Also, by HW3, we know

\frac{Pr (b_i = 1 | a_i)}{Pr (b_i = -1 | a_i)} = exp(f_{c}(x)),
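
This identity holds when f_c is a logistic model, i.e. Pr(b_i = 1 | a_i) is the sigmoid of f_c(x). That modelling choice is an assumption here, since the HW3 setup is not reproduced in this thread. A minimal check:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds_from_logit(f):
    """Odds Pr(b=1|a) / Pr(b=-1|a) when Pr(b=1|a) = sigmoid(f)."""
    pr_pos = sigmoid(f)
    return pr_pos / (1.0 - pr_pos)

# The odds match exp(f) for any real-valued logit f.
checks = [(f, odds_from_logit(f), math.exp(f)) for f in (-2.0, 0.0, 0.5, 3.0)]
```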

Hence, by the two equations above, **for unbalanced train and test sets**:

\frac{q(x)}{p(x)} = \frac{n_p}{n_q } exp(f_{c}(x))

i.e. the classifier's odds are rescaled by n_p / n_q to recover the density ratio, so q(x) receives more weight if n_p > n_q.
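
To see the correction factor at work, the sketch below plugs in an idealized classifier whose logit is exactly the log-odds from the derivation, then recovers q(x)/p(x) from it. The ideal classifier, the function name `density_ratio`, and the sample sizes are all assumptions; a trained f_c would only approximate these values.

```python
import math

def density_ratio(f_c_x, n_p, n_q):
    """q(x) / p(x) recovered from the classifier logit f_c(x)."""
    return (n_p / n_q) * math.exp(f_c_x)

p = {"shirt": 0.1, "shoe": 0.9}   # train distribution
q = {"shirt": 0.5, "shoe": 0.5}   # truth distribution
n_p, n_q = 9000, 1000             # hypothetical sample sizes

# Idealized logit: log of the odds q(x) n_q / (p(x) n_p).
f_c = {k: math.log(q[k] * n_q / (p[k] * n_p)) for k in p}
alpha = {k: density_ratio(f_c[k], n_p, n_q) for k in p}
# alpha["shirt"] is about 5.0 and alpha["shoe"] about 0.556, i.e. q/p.
```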

Now back to our covariate shift problem, we can approximate the density ratio with the classifier:

\int q(x) f(x) dx = \int p(x) \alpha(x) f(x) dx, where \alpha (x) = \frac{q(x)}{p(x)} = \frac{n_p}{n_q } exp(f_{c}(x))

i.e. for each sample x, we can approximate its probability under the truth distribution by p(x) \alpha(x), where p(x) comes from the train distribution and \alpha(x) is calculated by the classifier f_c above,
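
The reweighting identity can be verified on the two-class example by replacing the integrals with sums. The function f below is arbitrary and purely illustrative, and \alpha(x) is taken as the exact ratio q(x)/p(x) rather than a classifier estimate.

```python
p = {"shirt": 0.1, "shoe": 0.9}          # train distribution
q = {"shirt": 0.5, "shoe": 0.5}          # truth distribution
alpha = {k: q[k] / p[k] for k in p}      # exact density ratio (assumed known)

f = {"shirt": 1.0, "shoe": 3.0}          # arbitrary illustrative function

expect_under_q = sum(q[k] * f[k] for k in q)
expect_reweighted = sum(p[k] * alpha[k] * f[k] for k in p)
```

Both sums come out to 2.0 (up to floating-point rounding), so an average over training samples weighted by \alpha matches the average under the truth distribution.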

or in the discrete cases,

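\sum_i loss(y_i, \hat{y_i}) -> \sum_i loss(y_i, \hat{y_i}) * exp(f_c(x_i))

(the constant factor \frac{n_p}{n_q} from \alpha(x) is omitted here; it rescales the whole loss without changing its minimizer)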
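
A minimal pure-Python sketch of the weighted sum, using a sigmoid binary cross-entropy as the per-sample loss. The function names are hypothetical, and f_c(x_i) is treated as the raw real-valued output of the train-vs-test classifier, which is an assumption that follows from exp(f_c(x)) being an odds ratio rather than a hard label.

```python
import math

def sigmoid_bce(logit, label):
    """Binary cross-entropy on a raw logit; label is 0 or 1.

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def weighted_loss(logits, labels, f_c_values):
    """Sum of per-sample losses, each scaled by exp(f_c(x_i))."""
    weights = [math.exp(v) for v in f_c_values]
    return sum(w * sigmoid_bce(z, y) for z, y, w in zip(logits, labels, weights))

# Hypothetical predictions, labels, and classifier logits.
logits = [0.0, 2.0, -1.0]
labels = [1, 1, 0]
unweighted = weighted_loss(logits, labels, [0.0, 0.0, 0.0])  # all weights 1
reweighted = weighted_loss(logits, labels, [math.log(3.0), 0.0, 0.0])
```

In the second call only the first sample is up-weighted (by a factor of 3), so the total grows by twice that sample's loss, 2 log 2.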

Note the weighting factors can be passed as the **sample_weight** parameter of `SigmoidBinaryCrossEntropyLoss`.

To clarify… does $$f_c(x_i)$$ take the value of 1 or -1, or the original output of the data classifier, which might be any float value?