GANs - historical averaging function

Dear all,

I was going through the Improved Techniques for Training GANs paper (Salimans et al., 2016). Can anyone please suggest how to implement historical averaging (Section 3.3) efficiently? It is an additional loss term of the form

additional_loss = ||\theta - (1/t) \sum_{i=1}^{t} \theta[i]||^2

where \theta are the generator weights/biases. The first thing that comes to mind is that I need to create an identical copy of the network that will store all parameters (with grad_req='null') and be used as a “history recorder”. Then define a smoothing function (like exponential smoothing) that will take the parameters of the two networks as input and return the smoothed-out version.
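A rough sketch of what I mean, with all names as placeholders (`netG` stands in for the actual generator, and the smoothing constant is arbitrary):

```python
from mxnet import gluon

# Stand-in for the actual generator; in_units is given so the
# parameters are initialised immediately rather than deferred.
netG = gluon.nn.Dense(10, in_units=5)
netG.initialize()

# Gradient-free copies of the parameters act as the "history recorder".
history = {name: p.data().copy()
           for name, p in netG.collect_params().items()}

def smooth_history(net, history, alpha=0.99):
    # exponential smoothing: hist <- alpha * hist + (1 - alpha) * current
    for name, p in net.collect_params().items():
        history[name][:] = alpha * history[name] + (1 - alpha) * p.data()
```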

Thank you in advance.

Hi @feevos, I don’t think you can avoid holding another copy of the weights here! Although it depends on your model, the model weights usually don’t take up the largest share of your memory; the feature maps do. So this hopefully isn’t a concern.

As an alternative to creating another model, you might want to work with the ParameterDict returned by net.collect_params(). Initially take a copy, and then on each iteration update each of these parameters using (param_avg*iteration + param_cur)/(iteration + 1).
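A minimal sketch of that, assuming a Gluon generator (`netG` below is just a stand-in block so the snippet runs on its own):

```python
from mxnet import nd, gluon

# Stand-in for the actual generator; in_units is given so the
# parameters are initialised immediately rather than deferred.
netG = gluon.nn.Dense(10, in_units=5)
netG.initialize()

# Initial copy of the parameters: theta[1], the start of the history.
param_avg = {name: p.data().copy()
             for name, p in netG.collect_params().items()}

def historical_avg_penalty(net, param_avg):
    """||theta - running average||^2, with the average held constant."""
    penalty = nd.zeros(1)
    for name, p in net.collect_params().items():
        penalty = penalty + nd.sum(nd.square(p.data() - param_avg[name]))
    return penalty

def update_history(net, param_avg, iteration):
    """Running mean: avg <- (avg * t + current) / (t + 1).
    Call this outside autograd.record() so it stays out of the graph."""
    for name, p in net.collect_params().items():
        param_avg[name][:] = (param_avg[name] * iteration
                              + p.data()) / (iteration + 1)
```

On each iteration you would add `lam * historical_avg_penalty(netG, param_avg)` to the usual generator loss inside `autograd.record()` (with `lam` a hypothetical weighting coefficient for the term), call `backward()` and `trainer.step()`, and then call `update_history(netG, param_avg, iteration)` afterwards.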


Thank you very much @thomelane, I’ll go with your suggestion.