Xavier Initialization: PyTorch vs MXNet

Hello Everyone,

This is a comparison of the Xavier initialization in MXNet with the one in PyTorch.

This was first posted on the PyTorch forum:
https://discuss.pytorch.org/t/xavier-initialization-pytorch-vs-mxnet/71451
(excuse the formatting of the link, new users can only post 2 links in a question…)
Since it involves both frameworks, it needs expertise from both sides.

I am porting the MXNet implementation of a paper to PyTorch. I would expect

mx.init.Xavier(rnd_type="uniform", factor_type="avg", magnitude=0.0003)

and

torch.nn.init.xavier_uniform_(array, gain=0.0003)

to be pretty much the same, right?

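But the docs and source code define magnitude and gain differently. As far as I can tell, MXNet samples uniformly from [-c, c] with c = sqrt(magnitude / factor), where factor = (fan_in + fan_out) / 2 for factor_type="avg", while PyTorch samples uniformly from [-a, a] with a = gain * sqrt(6 / (fan_in + fan_out)). So magnitude sits inside the square root and gain outside it.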

Even when scaling gain and magnitude correctly, I am still getting different ranges of numbers.
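To make the scaling concrete, here is a small sketch of the two bounds as I read them from the docs, using only the standard library (no MXNet or PyTorch calls). The helper names (fans, mxnet_xavier_bound, torch_xavier_bound, gain_from_magnitude) and the example shape are made up for illustration, and the fan computation assumes both libraries treat dimension 0 as the output side and dimension 1 as the input side.

```python
import math


def fans(shape):
    """Return (fan_in, fan_out) for a weight of the given shape.

    Assumption: dim 0 is the output side, dim 1 the input side, and any
    remaining dims (e.g. a conv kernel) multiply both fans.
    """
    receptive_field = math.prod(shape[2:])  # 1 for a 2-D weight
    return shape[1] * receptive_field, shape[0] * receptive_field


def mxnet_xavier_bound(shape, magnitude=3.0):
    """Uniform bound for rnd_type="uniform", factor_type="avg" (my reading of the docs)."""
    fan_in, fan_out = fans(shape)
    factor = (fan_in + fan_out) / 2.0
    return math.sqrt(magnitude / factor)


def torch_xavier_bound(shape, gain=1.0):
    """Uniform bound for xavier_uniform_ (my reading of the docs)."""
    fan_in, fan_out = fans(shape)
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def gain_from_magnitude(magnitude):
    """Gain that gives the same bound as a given magnitude under the formulas above."""
    return math.sqrt(magnitude / 3.0)


shape = (64, 128)   # hypothetical weight shape
magnitude = 0.0003
gain = gain_from_magnitude(magnitude)

print("equivalent gain:        ", gain)
print("MXNet bound:            ", mxnet_xavier_bound(shape, magnitude))
print("PyTorch bound, new gain:", torch_xavier_bound(shape, gain))
print("PyTorch bound, 0.0003:  ", torch_xavier_bound(shape, 0.0003))
```

If these formulas are right, magnitude=0.0003 corresponds to a gain of about 0.01, so passing gain=0.0003 directly would give a much narrower range than MXNet. This only compares the formulas; I have not checked it against the actual outputs of both libraries.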

In both cases I start from an empty array and initialize it.

The attached image shows the docs of both PyTorch and MXNet.

Am I missing something?
How can I make sure that the PyTorch and MXNet functions initialize a given input array in the same way?