Weight Decay


In both train(lambd) and train_gluon(wd), animator.add(epoch + 1, …) should be changed to animator.add(epoch, …), because epoch starts from 1 in the for loop.
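If the loop really does start from 1, the two indexing conventions can be compared with a minimal sketch (this is illustrative code, not the book's actual training loop; num_epochs and the list names are made up):

```python
# Hedged sketch: how the x-coordinate passed to animator.add depends on
# where the epoch loop starts. Not the book's actual code.
num_epochs = 3

# Convention A: epoch runs 0, 1, 2 -> plot at epoch + 1
xs_a = [epoch + 1 for epoch in range(num_epochs)]

# Convention B: epoch runs 1, 2, 3 -> plot at epoch directly
xs_b = [epoch for epoch in range(1, num_epochs + 1)]

# Both conventions produce the same x-axis values 1..num_epochs,
# so the fix only matters if the loop's start index and the offset disagree.
print(xs_a, xs_b)
```

Either convention plots epochs at 1 through num_epochs; the bug would only appear if a loop starting at 1 is combined with the +1 offset, which shifts every point right by one.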

In 4.5.1, the stochastic gradient descent update looks a bit strange: shouldn’t the decay rate of w be controlled by \lambda alone? Why is the batch size involved here?
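The update in question can be sketched in code to see where each term comes from. In this sketch (names like eta, lam, and sgd_weight_decay_step are my own, not from the book), the data-loss gradient is averaged over the minibatch, while the penalty gradient lam * w is not, so the decay factor (1 - eta * lam) depends on \lambda alone:

```python
import numpy as np

def sgd_weight_decay_step(w, b, X_batch, y_batch, eta, lam):
    """One minibatch-SGD step with weight decay (illustrative sketch).

    Loss per minibatch: (1/|B|) * sum_i 0.5*(w.x_i + b - y_i)^2
                        + (lam/2) * ||w||^2
    """
    batch_size = X_batch.shape[0]
    residual = X_batch @ w + b - y_batch            # shape (|B|,)
    grad_w = X_batch.T @ residual / batch_size      # data gradient, averaged over |B|
    # The penalty gradient lam * w is NOT divided by the batch size,
    # so the shrink factor on w is (1 - eta * lam), independent of |B|.
    w_new = (1 - eta * lam) * w - eta * grad_w
    b_new = b - eta * residual.mean()               # bias is typically not decayed
    return w_new, b_new
```

With zero data gradient, the update reduces to w ← (1 − ηλ)w regardless of batch size, which is one way to see that |B| only scales the averaged data term, not the decay itself.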

I’d like to point out that there is a typo, circled in red in the bottom-right area of the screenshot below: