Very simple question on the input parameters of FullyConnected

mxnet.symbol.FullyConnected(data=None, weight=None, bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, name=None, attr=None, out=None, **kwargs)
If flatten is set to be false, then the shapes are:
data: (x1, x2, …, xn, input_dim)
weight: (num_hidden, input_dim)
bias: (num_hidden,)
out: (x1, x2, …, xn, num_hidden)

So what do xn and input_dim mean here? Suppose an input image of (28, 28, 3) with batch size 8.

Basically, if flatten is False, the FC layer is applied to the last dimension of the input array:
FC: input_dim --> num_hidden

For example, if data is (x1, x2, …, xn-1, xn), the output will be (x1, x2, …, xn-1, num_hidden) and the weight shape will be (num_hidden, xn).
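
A quick way to see this is to let MXNet infer the shapes for you. A minimal sketch, assuming num_hidden=64 and the (8, 28, 28, 3) input from the question (the auto-generated argument names shown in the comments may differ depending on how the graph is built):

```python
import mxnet as mx

data = mx.sym.Variable('data')
# flatten=False: the dense mapping acts only on the last dimension of data
fc = mx.sym.FullyConnected(data=data, num_hidden=64, flatten=False)

# Let MXNet infer the weight/bias/output shapes for a (8, 28, 28, 3) input
arg_shapes, out_shapes, _ = fc.infer_shape(data=(8, 28, 28, 3))
print(dict(zip(fc.list_arguments(), arg_shapes)))
# e.g. {'data': (8, 28, 28, 3), 'fullyconnected0_weight': (64, 3),
#       'fullyconnected0_bias': (64,)}
print(out_shapes)  # [(8, 28, 28, 64)]
```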

Thanks, but I still don’t get it.
If (x1, …, xn) is the data shape, where is batch_size? I see that when flatten is True there is a batch_size dimension, but I don’t understand why it’s missing when flatten is False.

And what is input_dim here? If (x1, …, xn) is (28, 28, 1), would input_dim be 3? And would the output shape be (x1, …, xn-1, num_hidden)? Isn’t that different from the official documentation, which writes (x1, …, xn, num_hidden)? Or is that a typo?

I don’t really understand the case when flatten is False. When flatten is True it’s quite simple, and you can find similar documentation for other frameworks (like TF).

Batch size could be any of the dimensions, but normally batch size is the first dimension of your NDArray; in my example, batch size would be x1. Think of your FC layer as a mapping from (dim0, input_dim) to (dim0, output_dim). When flatten is True, you can think of FC as doing a reshape from (x1, x2, …, xn-1, xn) to (x1, x2 * x3 * … * xn-1 * xn) before the dense calculation. In this case, dim0 is x1 and input_dim is x2 * x3 * … * xn-1 * xn, and the output is (x1, output_dim). If batch size is the first dimension (i.e. x1), this is equivalent to getting a (batch_size, output_dim) output.
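
For example, here is a minimal sketch of the flatten=True case with NDArrays, assuming the (8, 28, 28, 3) input from the original question and an arbitrary num_hidden=64:

```python
import mxnet as mx

x = mx.nd.random.uniform(shape=(8, 28, 28, 3))   # x1=8, x2=28, x3=28, xn=3
num_hidden = 64

# flatten=True: the weight maps the flattened trailing dims (28*28*3) to num_hidden
w = mx.nd.random.uniform(shape=(num_hidden, 28 * 28 * 3))
b = mx.nd.zeros((num_hidden,))

out = mx.nd.FullyConnected(data=x, weight=w, bias=b,
                           num_hidden=num_hidden, flatten=True)
print(out.shape)  # (8, 64) -> (x1, output_dim)

# Same thing done by hand: reshape to (x1, x2*...*xn), then dense
manual = mx.nd.dot(x.reshape((8, -1)), w.T) + b
print(manual.shape)  # (8, 64)
```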

When flatten is False, you can think of FC as doing a reshape from (x1, x2, …, xn-1, xn) to (x1 * x2 * … * xn-1, xn) before the dense calculation. In this case, dim0 is x1 * x2 * … * xn-1 and input_dim is xn. There is also a reshape after the dense calculation that recovers the original leading dimensions, giving (x1, x2, …, xn-1, output_dim). If batch size is the first dimension (i.e. x1), this is equivalent to getting a (batch_size, x2, …, xn-1, output_dim) output.
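
And the flatten=False case, as a sketch under the same assumed shapes:

```python
import mxnet as mx

x = mx.nd.random.uniform(shape=(8, 28, 28, 3))   # x1=8, x2=28, x3=28, xn=3
num_hidden = 64

# flatten=False: the weight maps only the last dimension (xn=3) to num_hidden
w = mx.nd.random.uniform(shape=(num_hidden, 3))
b = mx.nd.zeros((num_hidden,))

out = mx.nd.FullyConnected(data=x, weight=w, bias=b,
                           num_hidden=num_hidden, flatten=False)
print(out.shape)  # (8, 28, 28, 64) -> (x1, x2, x3, output_dim)

# Same thing done by hand: collapse the leading dims, dense, then reshape back
manual = mx.nd.dot(x.reshape((-1, 3)), w.T) + b
manual = manual.reshape((8, 28, 28, num_hidden))
print(manual.shape)  # (8, 28, 28, 64)
```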

OK, I see. So input_dim is just xn; I was thinking of it as the number of dimensions of the whole input. Isn’t it odd to use two different notations for the same parameter of a single function?

So in short, what you said is:
"
When flatten is True: input shape (x1, x2, …, xn-1, xn) -> output shape (x1, num_hidden)
When flatten is False: input shape (x1, x2, …, xn-1, xn) -> output shape (x1, x2, …, xn-1, num_hidden)
And x1 is usually batch_size.
"
Am I right?

You are 100% right 🙂

Thanks a lot for your patience. This answers my question.