Very simple question on the input parameters of FullyConnected

mxnet.symbol.FullyConnected(data=None, weight=None, bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, name=None, attr=None, out=None, **kwargs)
If flatten is set to be false, then the shapes are:
data: (x1, x2, …, xn, input_dim)
weight: (num_hidden, input_dim)
bias: (num_hidden,)
out: (x1, x2, …, xn, num_hidden)

So what do xn and input_dim mean here? Suppose an input image of (28, 28, 3) with batch size 8.

Basically, if flatten is False, the FC layer is applied to the last dimension of the input array:
FC: input_dim --> num_hidden

For example, if data is (x1, x2, …, xn-1, xn), the output will be (x1, x2, …, xn-1, num_hidden) and the weight shape will be (num_hidden, xn).
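
A quick way to see this is to let MXNet infer the shapes for you. A minimal sketch, assuming num_hidden=64 and the (8, 28, 28, 3) input from the question (the auto-generated argument names shown in the comments may differ depending on how the graph is built):

```python
import mxnet as mx

data = mx.sym.Variable('data')
# flatten=False: the dense mapping acts only on the last dimension of data
fc = mx.sym.FullyConnected(data=data, num_hidden=64, flatten=False)

# Let MXNet infer the weight/bias/output shapes for a (8, 28, 28, 3) input
arg_shapes, out_shapes, _ = fc.infer_shape(data=(8, 28, 28, 3))
print(dict(zip(fc.list_arguments(), arg_shapes)))
# e.g. {'data': (8, 28, 28, 3), 'fullyconnected0_weight': (64, 3),
#       'fullyconnected0_bias': (64,)}
print(out_shapes)  # [(8, 28, 28, 64)]
```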

Thanks, but I still don’t get it.
If (x1, …, xn) is the data shape, where is batch_size? I see that when flatten is True there is a batch_size dimension, but I don’t understand why it’s missing when flatten is False.

And what is input_dim here? If (x1, …, xn) is (28, 28, 1), would input_dim be 3? And would the output shape be (x1, …, xn-1, num_hidden)? Isn’t that different from the official documentation, which writes (x1, …, xn, num_hidden)? Or is that a typo?

I don’t really understand the case when flatten is False. When flatten is True it’s quite simple, and you can find similar documentation for other frameworks (like TF).

Batch size could be any of the dimensions, but normally batch size is the first dimension of your NDArray; in my example, batch size would be x1. Think of your FC layer as a mapping from (dim0, input_dim) to (dim0, output_dim). When flatten is True, you can think of FC as doing a reshape from (x1, x2, …, xn-1, xn) to (x1, x2 * x3 * … * xn-1 * xn) before the dense calculation. In this case, dim0 is x1 and input_dim is x2 * x3 * … * xn-1 * xn, and the output is (x1, output_dim). If batch size is the first dimension (i.e. x1), this is equivalent to getting a (batch_size, output_dim) output.
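
For example, here is a minimal sketch of the flatten=True case with NDArrays, assuming the (8, 28, 28, 3) input from the original question and an arbitrary num_hidden=64:

```python
import mxnet as mx

x = mx.nd.random.uniform(shape=(8, 28, 28, 3))   # x1=8, x2=28, x3=28, xn=3
num_hidden = 64

# flatten=True: the weight maps the flattened trailing dims (28*28*3) to num_hidden
w = mx.nd.random.uniform(shape=(num_hidden, 28 * 28 * 3))
b = mx.nd.zeros((num_hidden,))

out = mx.nd.FullyConnected(data=x, weight=w, bias=b,
                           num_hidden=num_hidden, flatten=True)
print(out.shape)  # (8, 64) -> (x1, output_dim)

# Same thing done by hand: reshape to (x1, x2*...*xn), then dense
manual = mx.nd.dot(x.reshape((8, -1)), w.T) + b
print(manual.shape)  # (8, 64)
```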

When flatten is False, you can think of FC as doing a reshape from (x1, x2, …, xn-1, xn) to (x1 * x2 * … * xn-1, xn) before the dense calculation. In this case, dim0 is x1 * x2 * … * xn-1 and input_dim is xn. There is also a reshape after the dense calculation that recovers the original leading dimensions, giving (x1, x2, …, xn-1, output_dim). If batch size is the first dimension (i.e. x1), this is equivalent to getting a (batch_size, x2, …, xn-1, output_dim) output.
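
And the flatten=False case, as a sketch under the same assumed shapes:

```python
import mxnet as mx

x = mx.nd.random.uniform(shape=(8, 28, 28, 3))   # x1=8, x2=28, x3=28, xn=3
num_hidden = 64

# flatten=False: the weight maps only the last dimension (xn=3) to num_hidden
w = mx.nd.random.uniform(shape=(num_hidden, 3))
b = mx.nd.zeros((num_hidden,))

out = mx.nd.FullyConnected(data=x, weight=w, bias=b,
                           num_hidden=num_hidden, flatten=False)
print(out.shape)  # (8, 28, 28, 64) -> (x1, x2, x3, output_dim)

# Same thing done by hand: collapse the leading dims, dense, then reshape back
manual = mx.nd.dot(x.reshape((-1, 3)), w.T) + b
manual = manual.reshape((8, 28, 28, num_hidden))
print(manual.shape)  # (8, 28, 28, 64)
```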

OK, I see. So input_dim is just xn; I was thinking of it as the number of dimensions of the whole input. Isn’t it odd to use two different notations for the same parameter of a single function?

So in short, what you said is:
"
When flatten is True: input shape (x1, x2, …, xn-1, xn) -> output shape (x1, num_hidden)
When flatten is False: input shape (x1, x2, …, xn-1, xn) -> output shape (x1, x2, …, xn-1, num_hidden)
And x1 is usually batch_size.
"
Am I right?

You are 100% right 🙂

Thanks a lot for your patience. This answers my question.