http://d2l.ai/chapter_multilayer-perceptrons/environment.html

“For this scheme to work, we need that each data point in the target (test time) distribution had nonzero probability of occurring at training time. If we find a point where q(x)>0 but p(x)=0, then the corresponding importance weight should be infinity.”

Isn’t it the other way around?

q(x) = 0 is what will cause \beta \to \infty, since \beta(\mathbf{x}) = p(\mathbf{x})/q(\mathbf{x}).

I think there is a typo in formula 4.9.2, where the denominator should be q as well:

\int p(\mathbf{x}) f(\mathbf{x}) \,dx = \int p(\mathbf{x}) f(\mathbf{x}) \frac{q(\mathbf{x})}{q(\mathbf{x})} \,dx = \int q(\mathbf{x}) f(\mathbf{x}) \frac{p(\mathbf{x})}{q(\mathbf{x})} \,dx
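To see the identity numerically, here is a small self-contained check (the Gaussians chosen for p and q and the test function f are made up for illustration, not from the book): we estimate E_p[f(x)] using only samples from q, weighted by \beta(x) = p(x)/q(x).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: target p = N(0, 1), source q = N(1, 1.5^2).
def p_pdf(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def q_pdf(x):
    s = 1.5
    return np.exp(-0.5 * ((x - 1.0) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def f(x):
    return x ** 2          # any test function; E_p[x^2] = 1 for N(0, 1)

x = rng.normal(1.0, 1.5, size=200_000)  # we can only sample from the source q
beta = p_pdf(x) / q_pdf(x)              # importance weights p(x)/q(x)
estimate = np.mean(beta * f(x))         # approximates E_p[f(x)] = 1
```

If we instead divided by p(x) (the typo discussed above), the weights would no longer cancel the sampling density and the estimate would be biased.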

At the end of section 4.9.1.5 “Covariate Shift Correction” it is stated that the correction factor is infinity for p(x)=0 and q(x)>0. This conflicts with the definition \beta(x) = p(x)/q(x) (following equation 4.9.2), under which the weight diverges when q(x)=0, not when p(x)=0. Should q(x) and p(x) be switched?

Can someone explain “When the distribution of labels shifts over time p(y) \neq q(y) but the class-conditional distributions stay the same p(\mathbf{x} \mid y) = q(\mathbf{x} \mid y), our importance weights will correspond to the label likelihood ratios q(y)/p(y)”?

what is the connection here?
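For anyone else wondering: the connection is just the factorization of the joint density. In the direction the quoted sentence uses (reweighting p toward q), and since the class-conditionals are shared,

\frac{q(\mathbf{x}, y)}{p(\mathbf{x}, y)} = \frac{q(y)\, q(\mathbf{x} \mid y)}{p(y)\, p(\mathbf{x} \mid y)} = \frac{q(y)}{p(y)}

so the importance weight of a pair (\mathbf{x}, y) depends only on the label y.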

Sorry for the late reply.

In the context of “covariate shift correction”, since we cannot draw data from the (ideal) target distribution p(x), we have to sample from the source distribution q(x). Hence, we always have data from q(x), i.e., q(x) > 0.

The point we want to make here is that to train the “covariate shift corrector” (the classifier that tells the two distributions apart), we also need some data from the target distribution p(x), i.e., p(x) > 0.

If p(x)=0 at some point, then P(z=1 \mid \mathbf{x}) = \frac{p(\mathbf{x})}{p(\mathbf{x})+q(\mathbf{x})} = 0 there and we cannot train the logistic regression model at that point. Note that the corresponding importance weight \beta(\mathbf{x}) = p(\mathbf{x})/q(\mathbf{x}) would be zero there, not infinity; it is the opposite case, q(x) = 0 with p(x) > 0, that sends the weight to infinity.
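To tie this together, here is a minimal NumPy sketch of that logistic-regression trick (the 1-D Gaussians, step size, and iteration count are made up for illustration): draws labeled z = 1 come from the target p, draws labeled z = -1 from the source q, and the fitted logit h(x) recovers log p(x)/q(x), so \beta(x) = e^{h(x)}.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up example: target p = N(0, 1), source q = N(1, 1), so the true
# logit is log p(x)/q(x) = -x + 0.5 and beta(x) = exp(-x + 0.5).
x_p = rng.normal(0.0, 1.0, size=5000)   # unlabeled draws from the target p
x_q = rng.normal(1.0, 1.0, size=5000)   # training draws from the source q

X = np.concatenate([x_p, x_q])
z = np.concatenate([np.ones(5000), -np.ones(5000)])  # z = 1: target, z = -1: source

# Fit P(z = 1 | x) = sigmoid(h(x)) with h(x) = w*x + b by gradient descent
# on the logistic loss log(1 + exp(-z * h(x))).
w, b = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    g = -z / (1.0 + np.exp(z * (w * X + b)))  # d(loss)/d(h)
    w -= lr * np.mean(g * X)
    b -= lr * np.mean(g)

# Estimated importance weights beta(x) = exp(h(x)) on the source data.
beta = np.exp(w * x_q + b)
```

The weights stay finite here exactly because both p and q put mass everywhere; at a point with q(x) = 0 but p(x) > 0, e^{h(x)} would blow up, which is the infinity case the thread is about.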