Xavier Initialization PyTorch vs MxNet

jean-marc · February 29, 2020, 2:17pm

Hello Everyone,

This is a comparison of the MxNet Xavier Initialization and the PyTorch one.

I first posted this on the PyTorch forum:
https://discuss.pytorch.org/t/xavier-initialization-pytorch-vs-mxnet/71451
(excuse the formatting of the link, new users can only post 2 links in a question…)
but since it involves both frameworks, it needs expertise from both sides.

I am porting an MxNet implementation of a paper to PyTorch.

mx.init.Xavier(rnd_type="uniform", factor_type="avg", magnitude=0.0003)

and

torch.nn.init.xavier_uniform_(array, gain=0.0003)

Should be pretty much the same, right?

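But the docs and source code define magnitude and gain differently: MxNet's magnitude sits inside the square root, while PyTorch's gain multiplies the square root from outside.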

Even when scaling gain and magnitude correctly, I am still getting different ranges of numbers.

In both cases I start from an empty array and initialize it.

The attached image shows the docs of both PyTorch and MxNet.
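Written out in plain Python, this is how I read the two uniform ranges from the docs and source code. It is only a sketch of the formulas, not the frameworks' own code: the helper names (fans, mxnet_xavier_bound, torch_xavier_bound, gain_for_magnitude) and the example shape are made up for illustration, and it assumes factor_type="avg" and rnd_type="uniform" on the MxNet side.

```python
import math

def fans(shape):
    """fan_in and fan_out for a weight shaped (out, in, *kernel)."""
    receptive = math.prod(shape[2:])  # 1 for plain 2-D weights
    return shape[1] * receptive, shape[0] * receptive

def mxnet_xavier_bound(shape, magnitude=3.0):
    # Assumed MxNet rule for factor_type="avg":
    # U(-s, s) with s = sqrt(magnitude / ((fan_in + fan_out) / 2))
    fan_in, fan_out = fans(shape)
    return math.sqrt(magnitude / ((fan_in + fan_out) / 2.0))

def torch_xavier_bound(shape, gain=1.0):
    # Assumed PyTorch rule:
    # U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out))
    fan_in, fan_out = fans(shape)
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

def gain_for_magnitude(magnitude):
    # Setting the two bounds equal gives magnitude = 3 * gain**2.
    return math.sqrt(magnitude / 3.0)

shape = (64, 32, 3, 3)   # hypothetical conv weight
magnitude = 0.0003

target = mxnet_xavier_bound(shape, magnitude)
naive = torch_xavier_bound(shape, gain=magnitude)
matched = torch_xavier_bound(shape, gain=gain_for_magnitude(magnitude))

print("MxNet bound:           ", target)
print("PyTorch, gain=magnitude:", naive)
print("PyTorch, matched gain:  ", matched)
```

If these formulas are right, passing the same number as both magnitude and gain cannot give the same range. For magnitude=0.0003 the matching gain would be sqrt(0.0003 / 3) = 0.01, i.e. torch.nn.init.xavier_uniform_(array, gain=0.01).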

Am I missing something?
How can I make sure that both PyTorch and MxNet functions are initializing a specific input array in the same way?
