Hi, I need to implement a fully connected linear layer in symbolic mode, where the weights are restricted to lie on a simplex, i.e., each row of the weight matrix is a probability distribution (the entries of each row are all non-negative and they sum to one). One way I can think of imposing this restriction is to apply softmax to the weights as follows:
y = mx.sym.dot(x, mx.sym.transpose(mx.sym.softmax(W)))
where x is the input symbol, y is the output symbol, and W is the weight matrix.
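For reference, here is a small NumPy sketch (separate from the MXNet code above, and only meant to illustrate the math) showing that a row-wise softmax does enforce the simplex constraint I want:

```python
import numpy as np

# Illustrative sketch in plain NumPy, not MXNet: a row-wise softmax
# maps any real-valued matrix onto the simplex, so each row of the
# transformed weights is a valid probability distribution.
def row_softmax(w):
    # Subtract the row max before exponentiating for numerical stability.
    z = np.exp(w - w.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

W = np.array([[1.0, -2.0, 0.5],
              [3.0,  0.0, 1.0]])
P = row_softmax(W)

# Every entry is non-negative and each row sums to one.
assert (P >= 0).all()
assert np.allclose(P.sum(axis=1), 1.0)
```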
However, it’s not clear to me how to tell mxnet that W is not an input symbol, but a parameter matrix that needs to be learned during training. Can anyone give me pointers on how to declare W in the above equation as a learnable parameter?