Softmax Regression

mli · November 27, 2018, 10:28pm

https://d2l.ai/chapter_linear-networks/softmax-regression.html

rlg · March 4, 2019, 8:21pm

In the Log-Likelihood section and others, what does ‘n’ represent?

louzhao8712 · June 12, 2019, 4:48am

n represents the number of observations.

mcdhee · August 12, 2019, 10:07am

Can somebody explain how -\log p(y \mid x) = -\sum_j y_j \log \hat{y}_j?

macussair · September 8, 2019, 2:14pm

Here is my humble understanding:

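Note that \hat{y}_j = p(y = j \mid x), and y is a one-hot vector (all zeros except for a single one at the true class).
Thus every term of the sum vanishes except the one for the true class:
- \sum_j y_j \log \hat{y}_j = - \log \hat{y}_y = - \log p(y \mid x)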
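The identity can also be checked numerically. This is only a minimal sketch using NumPy; the logits `o`, the class index `label`, and the helper `softmax` are made-up values and names for illustration, not taken from the book.

```python
import numpy as np

def softmax(o):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(o - o.max())
    return e / e.sum()

# Hypothetical logits for a 3-class problem (illustrative values only).
o = np.array([2.0, 0.5, -1.0])
y_hat = softmax(o)            # y_hat[j] plays the role of p(y = j | x)

label = 0                     # assumed true class index
y = np.zeros_like(y_hat)
y[label] = 1.0                # one-hot encoding of the label

# Full cross-entropy sum over all classes.
cross_entropy = -np.sum(y * np.log(y_hat))

# Negative log-probability of the true class only.
neg_log_likelihood = -np.log(y_hat[label])

print(cross_entropy, neg_log_likelihood)  # the two numbers agree
```

Because y is zero everywhere except at the true class, the sum picks out a single term, which is why both quantities match.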

Sorry for my poor English.

mlrocks · January 7, 2020, 7:45am

Can someone please explain this part of Question 4?

Assume that we have three classes which occur with equal probability, i.e., the probability vector is (1/3, 1/3, 1/3).
What is the problem if we try to design a binary code for it? Can we match the entropy lower bound on the number of bits?

What does it mean when we say entropy lower bound?

yoyoyoohh · February 23, 2020, 8:16am

Hi, I was wondering whether there is a standard answer to Questions 2 and 3. Questions 2 and 3 are as follows:

ZenPylon · March 15, 2020, 7:19pm

Just under equation 3.4.4, the equation “\mathbf{o}^{(i)} = \mathbf{W}\mathbf{x}^{(i)} + \mathbf{b} where \hat{\mathbf{y}}^{(i)}” is listed.

Should b also have a superscript (i.e. if I understand correctly, there is a separate bias for each output neuron)? This appears to be the case in equation 3.4.2.

gold_piggy · March 31, 2020, 7:55pm

Hi @mlrocks, please check https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/information-theory.html#properties-of-entropy to see the lower bound’s meaning.
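As a rough illustration of the gap the question asks about, here is a small sketch in plain Python. The code lengths `[1, 2, 2]` (for example the codewords "0", "10", "11") are an assumed prefix code chosen for illustration; they are not given in the book.

```python
import math

# Three equally likely classes.
p = [1 / 3, 1 / 3, 1 / 3]

# Entropy in bits: the lower bound on the expected number of bits per symbol.
entropy_bits = -sum(pi * math.log2(pi) for pi in p)

# An assumed binary prefix code, e.g. "0", "10", "11".
# Codeword lengths must be whole numbers of bits.
code_lengths = [1, 2, 2]
expected_length = sum(pi * l for pi, l in zip(p, code_lengths))

print(entropy_bits)     # log2(3), about 1.585 bits
print(expected_length)  # 5/3, about 1.667 bits
```

The entropy is log2(3), which is not an integer, while each codeword needs a whole number of bits, so a symbol-by-symbol binary code for this distribution cannot reach the bound exactly. Encoding longer blocks of symbols together can get closer to it.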

Mr1159pm · April 11, 2020, 7:27pm

In equation 3.4.5: if XW results in an n by q matrix and b is a vector of size q, how does it get added to the matrix? In the preliminaries section, the book says it takes “column vectors to be the default orientation of vectors”. Should I just assume that in this case the vector is a row vector and it gets added to each row of the XW matrix?
I apologize if it’s a trivial question, but I want to make sure that I get this right.

sanjaradylov · April 17, 2020, 9:48am

Yes, your assumption is correct. We have \mathbf{X} \in \mathbb{R}^{n \times d}, \mathbf{W} \in \mathbb{R}^{d \times q}, and \mathbf{b} \in \mathbb{R}^{1 \times q}. When we compute \mathbf{X}\mathbf{W} + \mathbf{b} in NumPy, the row vector \mathbf{b} is broadcast, i.e. treated as if it were copied n times to form \mathbf{B} := [\mathbf{b}^\top, \ldots, \mathbf{b}^\top]^\top \in \mathbb{R}^{n \times q}.
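A small sketch of this broadcasting behaviour, assuming NumPy and made-up sizes n = 4, d = 3, q = 2 (the values of `X`, `W`, and `b` are purely illustrative):

```python
import numpy as np

n, d, q = 4, 3, 2                      # illustrative sizes only

X = np.arange(n * d, dtype=float).reshape(n, d)   # shape (n, d)
W = np.ones((d, q))                                # shape (d, q)
b = np.array([10.0, 20.0])                         # shape (q,)

# Broadcasting: b is added to every row of X @ W.
O = X @ W + b                                      # shape (n, q)

# The same result with b stacked explicitly into an (n, q) matrix B.
B = np.tile(b, (n, 1))
O_explicit = X @ W + B

print(O.shape)                      # (4, 2)
print(np.allclose(O, O_explicit))   # True
```

NumPy does not need to materialize the n copies of b; the explicit `B` is only there to show that the broadcast sum gives the same result.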
