Naive Bayes Classification

mli · November 27, 2018, 6:48pm

http://d2l.ai/chapter_crashcourse/naive-bayes.html

vermicelli · January 13, 2019, 4:04pm

In the definition of the bayespost(data) function, the x in logpost += (logpx * x + logpxneg * (1-x)).sum(0) should be data.

prasanth5reddy · March 5, 2019, 1:40am

There are certain terms like softmax which I feel are unknown to a beginner. Is this a concept which will be covered later or is there any resource where we can learn about this?

dsz6223042 · March 23, 2019, 7:15am

I got confused about the meaning of the notations P(x), P(y), P(x|y), etc.

mli · June 6, 2019, 2:48am

To all: this chapter has been rewritten to be more beginner-friendly. (You may need to force a refresh in case a cached version is shown.)

harrysun23w · June 7, 2019, 12:04pm

I don’t really get it in Section 2.5.4 for Bayes prediction:

def bayes_pred(x):
    x = x.expand_dims(axis=0)  # (28, 28) -> (1, 28, 28)
    p_xy = P_xy * x + (1-P_xy)*(1-x)
    p_xy = p_xy.reshape((10,-1)).prod(axis=1) # p(x|y)
    return p_xy * P_y

What is line 3 doing, and how do we get the value of p(x|y)?

Pang_Luo · June 8, 2019, 7:19am

@harrysun23w
The original shape of p_xy is (10, 28, 28), so line 3 reshapes p_xy into (10, 784), and then does multiplication for all the 784 probabilities for each class.
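
The shape changes can be traced in a small sketch. This is only an illustration under assumptions: it uses NumPy instead of the book's MXNet nd arrays, and P_xy, P_y and x are random stand-ins rather than parameters estimated from MNIST.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the book's parameters (random, not trained).
P_xy = rng.uniform(0.1, 0.9, size=(10, 28, 28))  # p(x_i = 1 | y) per class and pixel
P_y = np.full(10, 0.1)                           # uniform class prior
x = (rng.uniform(size=(28, 28)) > 0.5).astype(np.float64)  # one binary image

def bayes_pred(x):
    x = np.expand_dims(x, axis=0)               # (28, 28) -> (1, 28, 28)
    # Broadcast over classes: keep P_xy where the pixel is 1, 1 - P_xy where it is 0.
    p_xy = P_xy * x + (1 - P_xy) * (1 - x)      # (10, 28, 28)
    # Flatten each class to 784 per-pixel probabilities and multiply them together.
    p_xy = p_xy.reshape((10, -1)).prod(axis=1)  # (10,), p(x | y) for each class
    return p_xy * P_y                           # unnormalized p(y | x)

scores = bayes_pred(x)
print(scores.shape)
```

Multiplying 784 numbers below 1 gives extremely small values, which is why a log-based version is usually preferred.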

Pang_Luo · June 8, 2019, 9:39am

If we can estimate \prod_i p(x_i=1 | y) for every i and y, and save its value in P_{xy}[i,y], here P_{xy} is a d\times n matrix with n being the number of classes and y\in\{1,\ldots,n\}.

It seems that \prod_i p(x_i=1 | y) should be p(x_i=1 | y) instead.

Pang_Luo · June 8, 2019, 9:50am

we could compute \hat{y} = \operatorname*{argmax}_y \prod_{i=1}^d P_{xy}[x_i, y]P_y[y], (2.5.5)

This equation seems incorrect. It should probably be:

\hat{y} = \operatorname*{argmax}_y \prod_{i=1}^d (x_iP_{xy}[i, y] + (1 - x_i)(1 - P_{xy}[i, y]))P_y[y]

mru4913 · July 5, 2019, 6:10am

I do not get this equation here.

p_xy = P_xy * x + (1-P_xy)*(1-x)

which is not explained in the context.

Yayun · July 11, 2019, 7:57pm

Since x_i can only be 1 or 0, and P_{xy}[i, y] represents p(x_i = 1 | y), we should have

\hat{y} = \operatorname*{argmax}_y \> \prod_{i=1}^d (P_{xy}[i, y]x_i + (1-P_{xy}[i,y])(1-x_i))P_y[y],

For the log case,

\hat{y} = \operatorname*{argmax}_y \> \sum_{i=1}^d \left(x_i \log P_{xy}[i, y] + (1-x_i) \log (1 - P_{xy}[i, y])\right) + \log P_y[y].
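
As a rough check of the product form against the log form, here is a minimal NumPy sketch. The parameters are random and hypothetical, and P_xy is stored as (classes, pixels), as in the book's code, rather than as the d \times n matrix used in the formulas.

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, d = 10, 784

# Hypothetical parameters, random rather than estimated from data.
P_xy = rng.uniform(0.05, 0.95, size=(n_classes, d))  # P_xy[y, i] = p(x_i = 1 | y)
P_y = np.full(n_classes, 1.0 / n_classes)            # uniform prior
x = (rng.uniform(size=d) > 0.5).astype(np.float64)   # one flattened binary image

# Product form: multiply the d per-pixel probabilities, then the prior.
prod_scores = (P_xy * x + (1 - P_xy) * (1 - x)).prod(axis=1) * P_y

# Log form: sum the per-pixel log-probabilities, then add the log-prior.
log_scores = (np.log(P_xy) * x + np.log(1 - P_xy) * (1 - x)).sum(axis=1) + np.log(P_y)

y_hat = int(np.argmax(log_scores))
print(prod_scores.min(), log_scores.min(), y_hat)
```

With 784 factors the product can shrink toward zero in floating point, while the log scores stay in a comfortable numeric range, so taking the argmax over log_scores is the safer choice.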

mru4913 · July 12, 2019, 12:50am

Hi Yayun, thank you for your reply. Basically it is just a mathematical transformation, right?

Yayun · July 12, 2019, 2:25pm

Hi mru4913, you are welcome. I think it is. Just keep in mind that our goal is to find the value of p(x_i | y). I was confused at first, but p(x_i = 1 | y) reminded me that x_i could also be 0, and then I realized that we also need to calculate p(x_i = 0 | y).

vajrangi · August 1, 2019, 7:08pm

What is the answer to the 3rd exercise question?

mcdhee · August 8, 2019, 1:53pm

What is delta in this case?

mcdhee · August 8, 2019, 2:41pm

n_x[y] = nd.array(X.asnumpy()[Y==y].sum(axis=0)).
In this line, why does one have to convert X to NumPy and then index it? Why not index it directly, like X[Y==y]?
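
For reference, here is a NumPy-only sketch of what that line computes, on a tiny made-up dataset. NumPy arrays accept a boolean mask such as Y == y as an index; the detour through asnumpy() suggests the nd array in that version did not, but that is an assumption, not something confirmed in this thread.

```python
import numpy as np

# Tiny made-up dataset: 6 binary "images" with 4 pixels each, labels in {0, 1, 2}.
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1],
              [1, 0, 0, 0],
              [0, 0, 0, 1]], dtype=np.float64)
Y = np.array([0, 0, 1, 1, 2, 2])

n_x = np.zeros((3, 4))
for y in range(3):
    # Y == y is a boolean mask; X[mask] keeps only the rows labeled y.
    # Summing over axis 0 counts, per pixel, how often it is on in class y.
    n_x[y] = X[Y == y].sum(axis=0)

print(n_x)
```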

sadreamer · October 21, 2019, 10:57pm

I think it is a bug. To be consistent with the code snippet later which is used to demo the trick of avoiding underflow and overflow, the code here should be

p_xy = P_xy ** x * (1-P_xy)**(1-x)
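
A quick NumPy check of how the power form relates to the linear form, using made-up values. The two powers are multiplied, because the log of a product is the sum of the logs, which is what the log-based snippet adds up.

```python
import numpy as np

def linear_form(P, x):
    # Picks P where x == 1 and 1 - P where x == 0.
    return P * x + (1 - P) * (1 - x)

def power_form(P, x):
    # Bernoulli likelihood P^x * (1 - P)^(1 - x); the two powers are multiplied.
    return P ** x * (1 - P) ** (1 - x)

P = np.array([0.2, 0.7, 0.9])  # hypothetical p(x_i = 1 | y) values
x = np.array([1.0, 0.0, 1.0])  # binary pixels

same = np.allclose(linear_form(P, x), power_form(P, x))
log_matches = np.allclose(np.log(power_form(P, x)),
                          x * np.log(P) + (1 - x) * np.log(1 - P))
print(same, log_matches)
```

For binary x the two forms give the same values, so the choice mainly matters for consistency with the log version.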

PaoloZhang · November 6, 2019, 9:53am

Yes, I think so.

\prod_i p(x_i=1 | y)

should be

p(x_i=1 | y)

instead.
