Softmax Regression from Scratch

mli · November 27, 2018, 10:29pm

http://d2l.ai/chapter_linear-networks/softmax-regression-scratch.html

Daeshik_Choi · August 11, 2019, 4:03pm

I am not sure why nrows and ncols are needed as input parameters of Animator, but I would like to point out a problem with them. Animator works well when nrows is 1 or ncols is 1, but it does not work when both are greater than 1: self.axes is then a 2-D array of axes, so self.axes[0].cla() raises an error.
To prevent this, find the line
if nrows * ncols == 1: self.axes = [self.axes,]
and add the following line right after it:
if nrows > 1 and ncols > 1: self.axes = self.axes.flatten()
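
A minimal sketch of the same idea outside Animator, assuming matplotlib is installed. The helper name make_axes is hypothetical and is not part of d2l. It asks plt.subplots for a 2-D grid in every case (squeeze=False) and flattens it, so axes[0] is always a single Axes object regardless of nrows and ncols.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def make_axes(nrows, ncols):
    """Hypothetical helper: return the figure and a flat list of Axes."""
    # squeeze=False makes subplots return a 2-D array even for a 1x1 grid
    fig, axes = plt.subplots(nrows, ncols, squeeze=False)
    return fig, list(axes.flatten())


fig, axes = make_axes(2, 2)
axes[0].cla()  # works: axes[0] is one Axes, not a row of them
plt.close(fig)
```

With this approach the special case for nrows * ncols == 1 is not needed either, since a 1x1 grid also comes back as a one-element list.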

iotboy · February 20, 2020, 2:57am

I can’t understand how this code illustrates the pick function.
NumPy version

y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y_hat[[0, 1], [0, 2]]

NDArray version

y_hat = nd.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = nd.array([0, 2], dtype='int32')
nd.pick(y_hat, y)

When I want to implement

y = nd.array([1, 2], dtype='int32')
nd.pick(y_hat, y)

how should I do it in the NumPy version?

sanjaradylov · February 20, 2020, 11:23am

In the NumPy example, the elements are selected with integer array indexing rather than slices: [0, 1] gives the row indices (axis 0) and [0, 2] gives the column indices (axis 1). The two lists are paired element by element, so we obtain an array containing y_hat[0, 0] and y_hat[1, 2].

In the NDArray example, nd.pick takes from each row the element whose column index is given in y (in nd.pick, axis is -1 by default, which in our case is equivalent to axis=1). This gives the same result as y_hat[range(len(y_hat)), y], so you don’t have to use nd.pick.

Here is another example. We pick the probabilities from y_hat that correspond to the classes from y:

>>> y_hat = np.array([[0.1, 0.9], [0.55, 0.45], [0.3, 0.7], [0.85, 0.15]])
>>> y = np.array([1, 1, 1, 0], dtype=int)
>>> y_hat[range(len(y_hat)), y]
array([0.9 , 0.45, 0.7 , 0.85])

P.S. As far as I know, there is no pick in numpy.
P.P.S. In the passage, there are multiple mentions of nd.pick from NDArray version. I think it should be revised.
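
For the specific case asked about above (y = [1, 2]), a NumPy sketch could look like this. The variable names picked and picked_alt are illustrative. np.take_along_axis is shown only as the closest built-in NumPy counterpart, not as something the book uses.

```python
import numpy as np

y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = np.array([1, 2])

# Pair each row index with the class index for that row:
# this selects y_hat[0, 1] and y_hat[1, 2].
picked = y_hat[np.arange(len(y_hat)), y]
print(picked)  # [0.3 0.5]

# Same selection with take_along_axis; y must be a column (shape (2, 1))
# so that it lines up with axis 1 of y_hat.
picked_alt = np.take_along_axis(y_hat, y[:, None], axis=1).squeeze(axis=1)
```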

gold_piggy · February 21, 2020, 9:38pm

Hey @sanjaradylov, thanks for pointing out! We will fix it!

hsneto · March 2, 2020, 1:23am

Hello there,

I got an error while running the code d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer). Further examination showed me that the error was caused by the accuracy function trying to compare an int64 with a float32.

Below is my change to fix this issue
(added .astype('float32') to both sides of the comparison):

def accuracy(y_hat, y):
    if y_hat.shape[1] > 1:
        return float((y_hat.argmax(axis=1).astype('float32') == y.astype('float32')).sum())
    else:
        return float((y_hat.astype('int32') == y.astype('int32')).sum())
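
The same pattern can be sketched in plain NumPy, under the assumption that labels are integers and predictions are either scores of shape (n, k) or labels of shape (n,). NumPy itself tolerates comparing int64 with float32, so this does not reproduce the error. It only illustrates casting both sides to one dtype before comparing, here the label dtype instead of float32.

```python
import numpy as np


def accuracy(y_hat, y):
    """Return the number of correct predictions as a float."""
    if y_hat.ndim > 1 and y_hat.shape[1] > 1:
        # Scores per class: reduce to the predicted class index
        y_hat = y_hat.argmax(axis=1)
    # Cast predictions to the label dtype so both sides match
    return float((y_hat.astype(y.dtype) == y).sum())


y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]], dtype=np.float32)
y = np.array([0, 2], dtype=np.int64)
n_correct = accuracy(y_hat, y)  # 1.0: only the second row is predicted correctly
```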

LewsTherin511 · March 4, 2020, 3:32pm

I was trying to figure out how to use Trainer objects in this case.

I mean, when you have a pre-determined model like nn.Sequential, you can simply use collect_params(), like for example:

model = nn.Sequential()
[....]
trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': some_value})

But what about this case? I’m following the tutorial with a few slight variations, so I have a mx.ndarray W containing the weights and biases of the model. How (if possible) do I create a Trainer, and pass W as parameter?

I created a dict of params by using:

params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))

and I tried to use:

trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
trainer.step(batch_size)

but it’s still not working, it returns:

ValueError: First argument must be a list or dict of Parameters, got list of <class 'mxnet.numpy.ndarray'>

AmalNammouchi · March 21, 2020, 6:02am

I don’t understand why you plotted the training loss together with the accuracy. To evaluate the model, don’t we usually plot the training loss and the test loss together? Why wasn’t the test loss calculated?

gold_piggy · March 31, 2020, 7:18pm

Hey @AmalNammouchi, the loss is the quantity we minimize during training; by minimizing it, we obtain the “best” model on the training data. On the other hand, we don’t want our model to “memorize” the training set, so we evaluate accuracy (or another measure) on both the training set and the test set. In this way, we know how the model performs both on data it has seen and on data it hasn’t seen.
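
A small NumPy sketch of why the two numbers are tracked separately; the arrays confident and hesitant are made-up predictions, not results from the book. Both sets of predictions classify every example correctly, so accuracy is identical, while the cross-entropy loss still distinguishes them.

```python
import numpy as np


def cross_entropy(y_hat, y):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.log(y_hat[np.arange(len(y_hat)), y]).mean())


def accuracy(y_hat, y):
    """Fraction of examples whose most probable class is the true one."""
    return float((y_hat.argmax(axis=1) == y).mean())


y = np.array([0, 1])
confident = np.array([[0.9, 0.1], [0.1, 0.9]])  # illustrative values
hesitant = np.array([[0.6, 0.4], [0.4, 0.6]])   # illustrative values

for name, y_hat in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(y_hat, y), round(cross_entropy(y_hat, y), 3))
```

The same two functions can be evaluated on a held-out set as well, which would give the test loss asked about above.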

gold_piggy · March 31, 2020, 7:23pm

Hey @LewsTherin511, can you print params_dict or check its type?

gold_piggy · March 31, 2020, 7:24pm

Great job! Thanks for sharing!

LewsTherin511 · April 7, 2020, 9:58am

Hi @gold_piggy, thanks for your answer!

params_dict is a <class 'dict'>, and its content is:

{0: array(0.02212206), 1: array(0.00774004), 2: array(0.0104344), 3: array(0.01183925), 4: array(0.01891711), 5: array(-0.01234741), [...], 7838: array(0.01554978), 7839: array(0.00644765), 7840: array(0.01050874)}

For reference, the relevant part of the code I’m using is:

from mxnet import autograd, np, npx, gluon
npx.set_np()

def main():
    n_feats = 784
    K_classes = 10
    W = np.random.normal(0, 0.01, (K_classes, n_feats+1))

    # learning parameters
    alpha = 0.001
    num_epochs = 5

    # create data
    batch_size = 256
    train_iter, test_iter = load_data_fashion_mnist(batch_size)

    # training
    W = training(batch_size, n_feats, num_epochs, alpha, W, train_iter)

    ## inference
    print("Running inference")
    inference(W, test_iter)


def load_data_fashion_mnist(batch_size, resize=None):
    [...]


def model(X, W):
    batch_size = X.shape[0]
    n_feats = X.shape[2]*X.shape[3]
    X = X.reshape(batch_size, n_feats)
    X = np.concatenate( (np.ones((batch_size,1)), X), axis=1)
    linear_output = np.dot(W, X.transpose())
    return softmax(linear_output)


def softmax(X):
    [...]

def cross_entropy(Y_hat, Y):
    [...]

def accuracy(Y_hat, Y):
    [...]

def training(batch_size, n_feats, num_epochs, alpha, W, train_iter):
    params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))

    for epoch in range(num_epochs):
        print(f"Epoch n. {epoch}")
        for X_batch, Y_batch in train_iter:
            W.attach_grad()
            with autograd.record():
                Y_hat = model(X_batch, W)
                loss = cross_entropy(Y_hat, Y_batch)
            loss.backward()

            ## using SGD -> this works
            W = SGD(W, alpha)
            ## using trainer -> TRYING TO MAKE THIS WORK
            # trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
            # trainer.step(batch_size)

        print(f"\tLoss: {loss}")
        acc = accuracy(Y_hat, Y_batch)
        print(f"\tAcc: {acc}")
    return W


def SGD(W, alpha):
    [...]

def get_fashion_MNIST_labels(labels):
    [...]

def inference(W, test_iter):
    [...]

gold_piggy · April 20, 2020, 4:36pm

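Hi @LewsTherin511, collect_params() returns the model’s named Parameter objects, so the trainer knows which ones to update by gradient descent. In your dict, the values are plain ndarrays rather than Parameter objects, which is what the error message is telling you: the Trainer only accepts a list or dict of Parameters.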

Please check here for more details of collect_params(): https://d2l.ai/chapter_deep-learning-computation/parameters.html?#collecting-parameters-from-nested-blocks
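
For comparison with the hand-written SGD(W, alpha) path in the post above (its body is elided there), here is a minimal NumPy sketch of the kind of update a plain 'sgd' trainer.step(batch_size) applies in its simplest form, with no momentum or weight decay: the gradient of the summed loss is divided by the batch size and scaled by the learning rate. The names softmax, mean_loss and sgd_step are illustrative; they are neither the poster's code nor MXNet API. Because plain NumPy has no autograd, the gradient is written out by hand. The bias is folded into W as in the post, so W has shape (K, d + 1).

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def add_bias_column(X):
    # Prepend a column of ones so the first column of W acts as the bias
    return np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)


def mean_loss(W, X, y):
    probs = softmax(add_bias_column(X) @ W.T)
    return float(-np.log(probs[np.arange(len(y)), y]).mean())


def sgd_step(W, X, y, lr):
    """One minibatch step. W: (K, d + 1), X: (n, d), y: (n,) integer labels."""
    n = X.shape[0]
    Xb = add_bias_column(X)
    d_logits = softmax(Xb @ W.T)          # (n, K)
    d_logits[np.arange(n), y] -= 1.0      # softmax minus one-hot: gradient w.r.t. logits
    grad = d_logits.T @ Xb                # gradient of the summed loss, shape (K, d + 1)
    return W - lr * grad / n              # divide by batch size, then step


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))               # made-up minibatch
y = rng.integers(0, 3, size=8)            # made-up labels for 3 classes
W = np.zeros((3, 5))

loss_before = mean_loss(W, X, y)
W = sgd_step(W, X, y, lr=0.1)
loss_after = mean_loss(W, X, y)
```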
