Linear regression with dot product

For educational purposes I want a linear regression example that uses mx.sym.dot(X, w) instead of mx.sym.FullyConnected(X, num_hidden=1); see the code example below. Is there a way to do this?
I know I can do a similar thing with nd and autograd instead of sym, but then I also have to implement SGD by hand, which is not what I am looking for :slight_smile:… (A rough sketch of that nd/autograd route is at the end of this post, for reference.)

import numpy as np
import mxnet as mx

m = 1000
batch_size = 100
nVars = 4
data = np.random.normal(0,1, (m, nVars))
labels = -10 * data[:,0] + data[:,1]*np.pi + 5 * np.sqrt(abs(data[:,2])) - data[:,3] + np.random.normal(0,1, m)*2

train_iter = mx.io.NDArrayIter(data={'data':data}, label={'labels':labels}, batch_size=batch_size)

X = mx.sym.Variable('data', shape=(batch_size, nVars))
y = mx.sym.Variable('labels', shape=(batch_size,))
w = mx.sym.var(name='theta', shape=(nVars,), init=mx.initializer.Normal())

# this works as expected
fc = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=1)
yhat = mx.sym.LinearRegressionOutput(fc, label=y, name='yhat')
model = mx.mod.Module(symbol=yhat, data_names=['data'], label_names=['labels'])
train_iter.reset()
model.fit(train_iter, num_epoch=10)
pred = model.predict(train_iter).asnumpy().flatten()


# with this solution I cannot figure out how to make the optimizer improve w.
fc_dot = mx.sym.dot(X, w)
yhat_dot = mx.sym.LinearRegressionOutput(fc_dot, label=y, name='yhat_dot')
model_dot = mx.mod.Module(symbol=yhat_dot, data_names=['data'], label_names=['labels'])
train_iter.reset()
model_dot.fit(train_iter, num_epoch=10)
pred_dot = model_dot.predict(train_iter).asnumpy().flatten()

np.mean(pred_dot - labels)
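
For reference, the nd + autograd route I mentioned would look roughly like this (just a sketch reusing data, labels and nVars from above; the learning rate is picked arbitrarily, and the manual update at the end is exactly the part I would rather have the Module/optimizer machinery handle for me):

from mxnet import nd, autograd

lr = 0.01  # arbitrary learning rate for this sketch
X_nd = nd.array(data)
y_nd = nd.array(labels).reshape((-1, 1))
w_nd = nd.random.normal(shape=(nVars, 1))
b_nd = nd.zeros((1,))
w_nd.attach_grad()
b_nd.attach_grad()

for epoch in range(10):
    with autograd.record():
        yhat = nd.broadcast_add(nd.dot(X_nd, w_nd), b_nd)
        loss = nd.mean((yhat - y_nd) ** 2)
    loss.backward()
    # manual gradient-descent update (full-batch here, for brevity) --
    # this is the step I would rather not write by hand
    w_nd[:] = w_nd - lr * w_nd.grad
    b_nd[:] = b_nd - lr * b_nd.grad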

For these two to be the same you should disable the bias in the FullyConnected layer:
fc = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=1, no_bias=True)

Or alternatively, add a bias term initialized to zero to your deconstructed example.
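
In code, that would look roughly like this (sketch, reusing the variables from your example; the bias variable name is just for illustration):

b = mx.sym.var(name='bias', shape=(1,), init=mx.initializer.Zero())
fc_dot = mx.sym.broadcast_add(mx.sym.dot(X, w), b)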

Thanks for the quick reply! I am aware that FullyConnected() without a bias is the same as the dot product.

I am currently writing up a small document where I implement linear regression from scratch and with various ML tools. From scratch it is a simple dot product, and in TensorFlow I could do the same thing with a dot product. So simply for consistency within this document, I was trying to implement it with a dot product in MXNet too.

(Instead of explaining to the reader: "Hey, did you happen to know that linear regression is basically a 1-layer fully connected neural net without an activation function? Hence we use FullyConnected()." That explanation would come at a later point…)

It works if you define the weight as

w = mx.sym.var(name='theta', shape=(nVars, 1), init=mx.initializer.Normal())
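
With that shape, dot(X, w) produces a (batch_size, 1) output, matching what FullyConnected(num_hidden=1) emits. A quick way to verify the inferred shapes (sketch, passing the shapes explicitly):

out = mx.sym.dot(X, w)
arg_shapes, out_shapes, aux_shapes = out.infer_shape(data=(batch_size, nVars), theta=(nVars, 1))
print(out_shapes)   # -> [(100, 1)] with batch_size=100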

@madjam I revisited my problem and only now came to understand what you meant by your comments… :sweat_smile:

Indeed, I did not realize that FullyConnected() adds a bias term by default, and that it hence was NOT the same as my deconstructed example… Furthermore, because the fit without the bias was so bad, my brain wrongly assumed there was no optimization going on at all.
For the sake of completeness, here is the code that finally does what I was looking for.

Thanks for the help everybody!

import numpy as np
import mxnet as mx


nEpochs = 100
m = 1000
batch_size = 100
nVars = 4
data = np.random.normal(0,1, (m, nVars))
labels = -10 * data[:,0] + data[:,1]*np.pi + 5 * np.sqrt(abs(data[:,2])) - data[:,3] + np.random.normal(0,1, m)*2

train_iter = mx.io.NDArrayIter(data={'data':data}, label={'labels':labels}, batch_size=batch_size)

X = mx.sym.Variable('data', shape=(batch_size, nVars))
y = mx.sym.Variable('labels', shape=(batch_size,))
w = mx.sym.var(name='theta', shape=(nVars, 1), init=mx.initializer.Normal())
b = mx.sym.var(name='bias', shape=(1,), init=mx.initializer.Zero())

fc_dot = mx.sym.broadcast_add(mx.sym.dot(X, w), b)
yhat_dot = mx.sym.LinearRegressionOutput(fc_dot, label=y, name='yhat_dot')
model_dot = mx.mod.Module(symbol=yhat_dot, data_names=['data'], label_names=['labels'])
model_dot.fit(train_iter, num_epoch=nEpochs)
pred_dot = model_dot.predict(train_iter).asnumpy().flatten()

print(np.mean(pred_dot - labels))
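
For anyone curious, the fitted parameters can be pulled out of the module afterwards, e.g. (sketch, using the variable names defined above):

arg_params, aux_params = model_dot.get_params()
print(arg_params['theta'].asnumpy().flatten())   # learned weights for the four inputs
print(arg_params['bias'].asnumpy())              # learned bias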

I could have been clearer in my response.
Glad that you found a solution.