Linear regression with dot product

For educational purposes I want a linear regression example that uses mx.sym.dot(X, w) instead of mx.sym.FullyConnected(X, num_hidden=1); see the code example below. Is there a way to do this?
I know I can do a similar thing with nd and autograd instead of sym, but then I also have to implement SGD by hand, which is not what I am looking for :slight_smile:… (A rough sketch of that nd/autograd route is at the end of this post, for reference.)

import numpy as np
import mxnet as mx

m = 1000
batch_size = 100
nVars = 4
data = np.random.normal(0,1, (m, nVars))
labels = -10 * data[:,0] + data[:,1]*np.pi + 5 * np.sqrt(abs(data[:,2])) - data[:,3] + np.random.normal(0,1, m)*2

train_iter = mx.io.NDArrayIter(data={'data':data}, label={'labels':labels}, batch_size=batch_size)

X = mx.sym.Variable('data', shape=(batch_size, nVars))
y = mx.sym.Variable('labels', shape=(batch_size,))
w = mx.sym.var(name='theta', shape=(nVars,), init=mx.initializer.Normal())

# this works as expected
fc = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=1)
yhat = mx.sym.LinearRegressionOutput(fc, label=y, name='yhat')
model = mx.mod.Module(symbol=yhat, data_names=['data'], label_names=['labels'])
train_iter.reset()
model.fit(train_iter, num_epoch=10)
pred = model.predict(train_iter).asnumpy().flatten()


# with this solution I cannot figure out how to make the optimizer improve w.
fc_dot = mx.sym.dot(X, w)
yhat_dot = mx.sym.LinearRegressionOutput(fc_dot, label=y, name='yhat_dot')
model_dot = mx.mod.Module(symbol=yhat_dot, data_names=['data'], label_names=['labels'])
train_iter.reset()
model_dot.fit(train_iter, num_epoch=10)
pred_dot = model_dot.predict(train_iter).asnumpy().flatten()

np.mean(pred_dot - labels)
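
For reference, the nd + autograd route I mentioned would look roughly like this (just a sketch reusing data, labels and nVars from above; the learning rate is picked arbitrarily, and the manual update at the end is exactly the part I would rather have the Module/optimizer machinery handle for me):

from mxnet import nd, autograd

lr = 0.01  # arbitrary learning rate for this sketch
X_nd = nd.array(data)
y_nd = nd.array(labels).reshape((-1, 1))
w_nd = nd.random.normal(shape=(nVars, 1))
b_nd = nd.zeros((1,))
w_nd.attach_grad()
b_nd.attach_grad()

for epoch in range(10):
    with autograd.record():
        yhat = nd.broadcast_add(nd.dot(X_nd, w_nd), b_nd)
        loss = nd.mean((yhat - y_nd) ** 2)
    loss.backward()
    # manual gradient-descent update (full-batch here, for brevity) --
    # this is the step I would rather not write by hand
    w_nd[:] = w_nd - lr * w_nd.grad
    b_nd[:] = b_nd - lr * b_nd.grad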

For these two to be the same you should disable the bias in the FullyConnected layer:
fc = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=1, no_bias=True)

Or alternatively, add a bias term initialized to zero to your deconstructed example.
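
In code, that would look roughly like this (sketch, reusing the variables from your example; the bias variable name is just for illustration):

b = mx.sym.var(name='bias', shape=(1,), init=mx.initializer.Zero())
fc_dot = mx.sym.broadcast_add(mx.sym.dot(X, w), b)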

Thanks for the quick reply! I am aware that FullyConnected() without a bias is the same as the dot product.

I am currently writing up a small document where I implement linear regression from scratch and with various ML tools. From scratch it is a simple dot product, and in TensorFlow I could do the same thing with a dot product. So simply for consistency within this document, I was trying to implement it with a dot product in MXNet too.

(Instead of explaining to the reader: "Hey, did you happen to know that linear regression is basically a 1-layer fully connected neural net without an activation function? Hence we use FullyConnected()." That explanation would come at a later point…)

It works if you define the weight as

w = mx.sym.var(name='theta', shape=(nVars, 1), init=mx.initializer.Normal())
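
With that shape, dot(X, w) produces a (batch_size, 1) output, matching what FullyConnected(num_hidden=1) emits. A quick way to verify the inferred shapes (sketch, passing the shapes explicitly):

out = mx.sym.dot(X, w)
arg_shapes, out_shapes, aux_shapes = out.infer_shape(data=(batch_size, nVars), theta=(nVars, 1))
print(out_shapes)   # -> [(100, 1)] with batch_size=100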

@madjam I revisited my problem and only now came to understand what you meant by your comments… :sweat_smile:

Indeed, I did not realize that FullyConnected() adds a bias term by default, and that it hence was NOT the same as my deconstructed example… Furthermore, because the fit without the bias was so bad, my brain wrongly assumed there was no optimization going on at all.
For the sake of completeness, here is the code that finally does what I was looking for.

Thanks for the help everybody!

import numpy as np
import mxnet as mx


nEpochs = 100
m = 1000
batch_size = 100
nVars = 4
data = np.random.normal(0,1, (m, nVars))
labels = -10 * data[:,0] + data[:,1]*np.pi + 5 * np.sqrt(abs(data[:,2])) - data[:,3] + np.random.normal(0,1, m)*2

train_iter = mx.io.NDArrayIter(data={'data':data}, label={'labels':labels}, batch_size=batch_size)

X = mx.sym.Variable('data', shape=(batch_size, nVars))
y = mx.sym.Variable('labels', shape=(batch_size,))
w = mx.sym.var(name='theta', shape=(nVars, 1), init=mx.initializer.Normal())
b = mx.sym.var(name='bias', shape=(1,), init=mx.initializer.Zero())

fc_dot = mx.sym.broadcast_add(mx.sym.dot(X, w), b)
yhat_dot = mx.sym.LinearRegressionOutput(fc_dot, label=y, name='yhat_dot')
model_dot = mx.mod.Module(symbol=yhat_dot, data_names=['data'], label_names=['labels'])
model_dot.fit(train_iter, num_epoch=nEpochs)
pred_dot = model_dot.predict(train_iter).asnumpy().flatten()

print(np.mean(pred_dot - labels))
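
For anyone curious, the fitted parameters can be pulled out of the module afterwards, e.g. (sketch, using the variable names defined above):

arg_params, aux_params = model_dot.get_params()
print(arg_params['theta'].asnumpy().flatten())   # learned weights for the four inputs
print(arg_params['bias'].asnumpy())              # learned bias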

I could have been clearer in my response.
Glad that you found a solution.