Hello Everyone,
This is a comparison of the MxNet Xavier Initialization and the PyTorch one.
This was first published on the PyTorch forum
https://discuss.pytorch.org/t/xavier-initialization-pytorch-vs-mxnet/71451
(excuse the formatting of the link, new users can only post 2 links in a question…)
but since it involves both frameworks, it needs expertise from both sides.
I am porting an MxNet paper implementation to PyTorch. The original uses
`mx.init.Xavier(rnd_type="uniform", factor_type="avg", magnitude=0.0003)`
and I replaced it with
`torch.nn.init.xavier_uniform_(array, gain=0.0003)`.
Should be pretty much the same, right?
But the docs and source code show that magnitude and gain are defined differently.
Even when scaling gain and magnitude to account for the different definitions, I am still getting different ranges of numbers,
with both frameworks starting from an uninitialized array of the same shape and filling it in place.
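To make the difference concrete, here is a small sketch that computes the two sampling bounds as I read them from each framework's docs (pure Python, no framework calls; the `fan_in`/`fan_out` values are arbitrary):

```python
import math

# My reading of the documented formulas:
# MxNet Xavier with factor_type="avg", rnd_type="uniform":
#   scale = sqrt(magnitude / ((fan_in + fan_out) / 2)),  weights ~ U(-scale, scale)
# PyTorch xavier_uniform_:
#   bound = gain * sqrt(6 / (fan_in + fan_out)),         weights ~ U(-bound, bound)

def mxnet_bound(magnitude, fan_in, fan_out):
    # magnitude sits INSIDE the square root
    return math.sqrt(magnitude / ((fan_in + fan_out) / 2.0))

def torch_bound(gain, fan_in, fan_out):
    # gain is a plain linear multiplier OUTSIDE the square root
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

fan_in, fan_out = 64, 128

# Passing the same number to both does NOT give the same range:
print(mxnet_bound(0.0003, fan_in, fan_out))  # ≈ 1.77e-3
print(torch_bound(0.0003, fan_in, fan_out))  # ≈ 5.30e-5

# Setting the two bounds equal and solving gives gain = sqrt(magnitude / 3):
magnitude = 0.0003
gain = math.sqrt(magnitude / 3.0)
print(math.isclose(mxnet_bound(magnitude, fan_in, fan_out),
                   torch_bound(gain, fan_in, fan_out)))  # True
```

If I read the formulas correctly, matching the ranges would need gain = sqrt(magnitude / 3) rather than gain = magnitude (consistent with MxNet's default magnitude=3 corresponding to PyTorch's default gain=1) — but even with that conversion I would like confirmation that nothing else differs.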
The attached image shows the relevant docs for both PyTorch and MxNet.
Am I missing something?
How can I make sure that the PyTorch and MxNet functions initialize a given input array in the same way?