KFAC, KFRA and other, fancier "backward" passes

Hi,
So KFAC and KFRA are fairly sophisticated optimization methods for neural networks. They use a Kronecker-factored approximation for each block of the Gauss-Newton matrix. However, computing these factors essentially requires a very specialized extra backward pass. I was wondering if anyone could give advice on the most appropriate way of actually doing this at the block level. Additionally, these methods require matrix inverses, so I was also wondering whether the corresponding functions (e.g. potrf and trsm) have been linked to work on the GPU via cuBLAS/cuSOLVER.
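
For reference, the property that makes the block-level approach attractive: if a block is approximated as A ⊗ G, then (A ⊗ G)^{-1} = A^{-1} ⊗ G^{-1}, so applying the inverse to a weight gradient only ever touches the two small factors. A rough sketch with made-up factor matrices (not taken from any KFAC codebase):

```python
import torch

# Illustrative sizes and made-up SPD Kronecker factors for one layer:
# A built from the layer's inputs, G from back-propagated gradients.
n_in, n_out = 128, 64
A = torch.randn(n_in, n_in);   A = A @ A.t() + torch.eye(n_in)
G = torch.randn(n_out, n_out); G = G @ G.t() + torch.eye(n_out)

grad_W = torch.randn(n_out, n_in)  # gradient of the layer's weight

# (A ⊗ G)^{-1} vec(grad_W) = vec(G^{-1} grad_W A^{-1}), so the full
# (n_in * n_out)^2 block never has to be formed or inverted.
step = torch.linalg.solve(G, grad_W) @ torch.linalg.inverse(A)
print(step.shape)  # torch.Size([64, 128])
```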

LAPACK functions should work on the GPU now.
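
For example, a quick sketch assuming a CUDA build of PyTorch — torch.linalg.cholesky and torch.linalg.solve_triangular are the current names for the potrf/trsm-style routines (older releases exposed them as torch.potrf and torch.trtrs), and they run on the GPU when given CUDA tensors:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Build a small SPD matrix directly on the GPU.
n = 256
M = torch.randn(n, n, device=device)
A = M @ M.t() + n * torch.eye(n, device=device)

# Cholesky factorization (the potrf analogue).
L = torch.linalg.cholesky(A)

# Triangular solves (the trsm analogue): X = A^{-1} B via two solves.
B = torch.randn(n, 8, device=device)
Y = torch.linalg.solve_triangular(L, B, upper=False)
X = torch.linalg.solve_triangular(L.t(), Y, upper=True)

print((A @ X - B).abs().max())  # residual should be near zero
```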

To modify the backward pass, consider using autograd.Function.
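
A minimal sketch of that idea for a linear layer — the names (`LinearWithStats`, the `stats` dict) are illustrative, not from any library. The backward computes the usual gradients and additionally accumulates the two Kronecker factors A = E[a aᵀ] and G = E[g gᵀ] that KFAC needs for this block:

```python
import torch

class LinearWithStats(torch.autograd.Function):
    """Plain linear op whose backward also records the
    per-block Kronecker factors (hypothetical helper)."""

    @staticmethod
    def forward(ctx, input, weight, stats):
        ctx.save_for_backward(input, weight)
        ctx.stats = stats  # plain dict, passed through untouched
        return input.mm(weight.t())

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        # Usual backward pass.
        grad_input = grad_output.mm(weight)
        grad_weight = grad_output.t().mm(input)
        # Specialized extra work: batch-averaged Kronecker factors.
        n = input.shape[0]
        ctx.stats['A'] = input.t().mm(input) / n              # E[a a^T]
        ctx.stats['G'] = grad_output.t().mm(grad_output) / n  # E[g g^T]
        return grad_input, grad_weight, None  # no grad for the dict

stats = {}
x = torch.randn(32, 128, requires_grad=True)
W = torch.randn(64, 128, requires_grad=True)
LinearWithStats.apply(x, W, stats).sum().backward()
print(stats['A'].shape, stats['G'].shape)  # (128, 128), (64, 64)
```

Another common route is registering forward/backward hooks on existing nn.Module layers to capture the same quantities, which avoids rewriting each op; the autograd.Function approach gives you the most control over what the extra backward work does.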