Numerical Stability and Initialization

mli · November 27, 2018, 10:31pm

http://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html

Pang_Luo · June 16, 2019, 1:14am

$$\mathbf{E}[h_i^2] = \sum_{j=1}^{n_\mathrm{in}} \mathbf{E}[W^2_{ij} x^2_j]$$

I’ve found the second equation of 4.8.4 hard to understand. Let’s say h_i = w1 * x1 + w2 * x2; then E[h_i^2] = E[w1^2 * x1^2 + w2^2 * x2^2 + 2 * w1 * x1 * w2 * x2]. Does the equation above simply drop the cross term 2 * w1 * x1 * w2 * x2?
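The cross term is not dropped arbitrarily: it has zero expectation. Assuming, as in that section's setup, that the weights $W_{ij}$ are drawn independently with zero mean and variance $\sigma^2$ and are independent of the inputs (which have zero mean and variance $\gamma^2$), we get $\mathbf{E}[h_i^2] = \sum_{j}\sum_{k} \mathbf{E}[W_{ij} W_{ik} x_j x_k]$, and for $j \neq k$ each term factors as $\mathbf{E}[W_{ij}]\,\mathbf{E}[W_{ik}]\,\mathbf{E}[x_j x_k] = 0$. Only the $j = k$ terms survive, giving $n_\mathrm{in} \sigma^2 \gamma^2$. A quick Monte Carlo check of the two-input example from the question, with $\sigma = 0.5$ and $\gamma = 2$ chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000
sigma, gamma = 0.5, 2.0  # illustrative std devs of weights and inputs (assumed)

# h = w1*x1 + w2*x2 with weights and inputs independent and zero mean
w = rng.normal(0.0, sigma, size=(n_samples, 2))
x = rng.normal(0.0, gamma, size=(n_samples, 2))
wx = w * x  # columns hold w1*x1 and w2*x2

full = np.mean(wx.sum(axis=1) ** 2)            # Monte Carlo estimate of E[h^2]
squares_only = np.mean((wx ** 2).sum(axis=1))  # E[w1^2 x1^2 + w2^2 x2^2]
cross = np.mean(2 * wx[:, 0] * wx[:, 1])       # E[2 w1 x1 w2 x2]
theory = 2 * sigma**2 * gamma**2               # n_in * sigma^2 * gamma^2

print(f"E[h^2] ~ {full:.3f}, squares only ~ {squares_only:.3f}, "
      f"cross term ~ {cross:.4f}, theory = {theory}")
```

The cross term averages out to roughly zero, so the estimate of E[h^2] agrees with the sum of the squared terms and with $n_\mathrm{in} \sigma^2 \gamma^2$.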

andrea_api · August 10, 2019, 3:03pm

All clear, but one thing:
why should Xavier initialization avoid problems such as exploding/vanishing gradients? It seems to be just a method that, at initialization, gives the model parameters a variance matched to the input. I don’t understand why, or how, that is related to the problems described…

spanev · August 11, 2019, 11:20pm

It comes down to how your parameters map your inputs/features onto the activation function.
The goal of this initialization is to keep the values fed into the activation function (the pre-activations) at zero mean and unit variance from layer to layer, so that gradients neither vanish nor explode.
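To make this concrete, here is a small sketch that pushes unit-variance data forward and a unit-variance gradient backward through a deep stack of purely linear, square layers (an assumption chosen to match the variance derivation, which ignores the nonlinearity). It compares a fixed weight standard deviation of 1, a fixed standard deviation of 0.01, and the Gaussian form of Xavier initialization, $\sigma^2 = 2/(n_\mathrm{in} + n_\mathrm{out})$. The helper name `layer_variances` and the depth/width values are illustrative, not taken from the book.

```python
import numpy as np

def layer_variances(weight_std_fn, depth=20, width=256, batch=512, seed=0):
    """Run a forward and a backward pass through a deep linear network and
    return (output variance, input-gradient variance)."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, weight_std_fn(width, width), size=(width, width))
               for _ in range(depth)]
    h = rng.normal(0.0, 1.0, size=(batch, width))  # unit-variance inputs
    for W in weights:
        h = h @ W  # linear layer, no activation (as in the derivation)
    g = rng.normal(0.0, 1.0, size=(batch, width))  # unit-variance upstream gradient
    for W in reversed(weights):
        g = g @ W.T  # backprop through h_out = h_in @ W multiplies by W^T
    return h.var(), g.var()

# Each scheme maps (n_in, n_out) to a weight standard deviation.
schemes = {
    "N(0, 1)": lambda n_in, n_out: 1.0,
    "N(0, 0.01^2)": lambda n_in, n_out: 0.01,
    "Xavier": lambda n_in, n_out: np.sqrt(2.0 / (n_in + n_out)),
}

for name, std_fn in schemes.items():
    fwd, bwd = layer_variances(std_fn)
    print(f"{name:>14}: forward var {fwd:.3g}, backward var {bwd:.3g}")
```

With a fixed standard deviation, each layer multiplies the variance by $n_\mathrm{in}\sigma^2$ on the way forward (and $n_\mathrm{out}\sigma^2$ on the way back), so after 20 layers it blows up or collapses by many orders of magnitude. Xavier picks $\sigma^2$ so that factor is about 1 in both directions, which is exactly what keeps gradients from vanishing or exploding at the start of training.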

You can find a more detailed explanation in this answer: https://www.quora.com/What-is-an-intuitive-explanation-of-the-Xavier-Initialization-for-Deep-Neural-Networks

Hope this answers your question.

andrea_api · August 12, 2019, 10:51am

Very interesting and clear! Thank you.
