Environment

mli · November 27, 2018, 10:31pm

http://d2l.ai/chapter_multilayer-perceptrons/environment.html

macussair · September 19, 2019, 1:10pm

“For this scheme to work, we need that each data point in the target (test time) distribution had nonzero probability of occurring at training time. If we find a point where q(x)>0 but p(x)=0, then the corresponding importance weight should be infinity.”
Isn’t it the other way around?
q(x)=0 will cause \beta \to \infty.

Siyang · November 20, 2019, 8:45pm

I think there is a typo in formula 4.9.2, where the denominator should be q as well.

\int p(\mathbf{x}) f(\mathbf{x}) dx = \int p(\mathbf{x}) f(\mathbf{x}) \frac{q(\mathbf{x})}{q(\mathbf{x})} dx
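For anyone who wants to check the identity numerically, here is a minimal sketch on a made-up discrete example. The outcomes, the probabilities `p` and `q`, and the function `f` are all assumptions chosen for illustration, not values from the book. Regrouping the right-hand side as an expectation under q with weights p(x)/q(x) gives the same number as the expectation under p, and a Monte Carlo estimate that samples only from q lands close to it.

```python
import random

# Hypothetical discrete example: three outcomes with target probabilities p
# and sampling probabilities q (every q[x] > 0, so the ratio p/q is defined).
xs = [0, 1, 2]
p = {0: 0.2, 1: 0.5, 2: 0.3}
q = {0: 0.6, 1: 0.3, 2: 0.1}

def f(x):
    # Arbitrary test function, chosen only for illustration.
    return x ** 2 + 1.0

# Expectation of f under p, computed exactly.
exact = sum(p[x] * f(x) for x in xs)

# The same quantity written as an expectation under q with weights p/q.
reweighted = sum(q[x] * f(x) * (p[x] / q[x]) for x in xs)

# Monte Carlo estimate that only ever samples from q.
rng = random.Random(0)
samples = rng.choices(xs, weights=[q[x] for x in xs], k=100_000)
estimate = sum(f(x) * p[x] / q[x] for x in samples) / len(samples)

print(exact, reweighted, estimate)
```

The two sums agree up to floating-point rounding, while the sampled estimate carries the usual Monte Carlo noise.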

gold_piggy · November 26, 2019, 7:52pm

Hi @Siyang, great catch! Thanks!

jaschoepfer · November 28, 2019, 2:43pm

At the end of section 4.9.1.5 “Covariate Shift Correction” it is stated that the correction factor is infinity for p(x)=0 and q(x)>0. This conflicts with the definition of beta(x)=p(x)/q(x) (following equation 4.9.2). Should q(x) and p(x) be switched?
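To make the direction of the problem concrete, here is a small sketch that takes beta(x)=p(x)/q(x) literally. The helper name `importance_weight` and the probability values are hypothetical, used only for illustration.

```python
import math

def importance_weight(p_x, q_x):
    """Return beta(x) = p(x) / q(x) for a single point.

    Hypothetical helper for illustration; p_x and q_x are the two
    density (or probability) values at that point.
    """
    if q_x == 0.0:
        # Dividing by zero: the weight is unbounded when p_x > 0,
        # and undefined when both values are zero.
        return math.inf if p_x > 0.0 else math.nan
    return p_x / q_x

# p(x) = 0 with q(x) > 0: the weight is zero, not infinite.
w_p_zero = importance_weight(0.0, 0.5)

# q(x) = 0 with p(x) > 0: this is the case where the weight blows up.
w_q_zero = importance_weight(0.5, 0.0)

print(w_p_zero, w_q_zero)
```

With this definition the weight only diverges when q(x)=0 and p(x)>0.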

chased · January 5, 2020, 3:59am

Can someone explain "When the distribution of labels shifts over time p(y)≠q(y) but the class-conditional distributions stay the same p(x)=q(x), our importance weights will correspond to the label likelihood ratios q(y)/p(y)."

what is the connection here?

chased · January 5, 2020, 6:00am

Just found this video, which cleared up the confusion: https://www.youtube.com/watch?v=nAqQF-jU_YM

gold_piggy · January 12, 2020, 11:37pm

Sorry for the late reply.

In the context of “covariate shift correction”, since we cannot draw data from the (ideal) source distribution p(𝐱), we have to simulate from the target distribution q(x). Hence, we always have data from q(x), i.e., q(x) > 0.

The point we want to make here is that, to train the “covariate shift corrector”, we have to include some data from the source distribution p(𝐱), i.e., p(x) > 0.

If p(x)=0, then P(z=1 \mid \mathbf{x}) = \frac{p(\mathbf{x})}{p(\mathbf{x})+q(\mathbf{x})} = 0, so we cannot train the logistic regression model (the corresponding importance weight should be infinity).
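As a rough illustration of how the classifier output relates to the weight, here is a sketch that assumes equal numbers of samples from the two distributions and uses made-up density values at a single point. Under that assumption P(z=1 | x) = p(x)/(p(x)+q(x)), so the ratio p(x)/q(x) can be read off as P/(1-P). The function names are hypothetical.

```python
def prob_z_equals_1(p_x, q_x):
    # P(z=1 | x) = p(x) / (p(x) + q(x)), assuming equally many samples
    # are drawn from each of the two distributions.
    return p_x / (p_x + q_x)

def ratio_from_classifier(prob):
    # Invert the formula above: p(x) / q(x) = P / (1 - P).
    # Only valid for prob < 1, i.e. when q(x) > 0.
    return prob / (1.0 - prob)

# Made-up density values at one point, for illustration only.
p_x, q_x = 0.3, 0.1
prob = prob_z_equals_1(p_x, q_x)
beta = ratio_from_classifier(prob)

print(prob, beta, p_x / q_x)
```

In practice the probability would come from a trained logistic regression model rather than from known densities; the known values here only serve to check the algebra.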
