Softmax Regression from Scratch

I am not sure why nrows and ncols are needed as input parameters of Animator, but I would like to point out something about them. Animator works well when nrows is 1 or ncols is 1, but it does not work when both nrows and ncols are greater than 1: self.axes is then a 2-D array, so self.axes[0] is a row of axes rather than a single Axes, and self.axes[0].cla() raises an error.
To prevent this, we may add the following line right after the line if nrows * ncols == 1: self.axes = [self.axes,]:
if nrows > 1 and ncols > 1: self.axes = self.axes.flatten()
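
A minimal sketch of why flattening helps, using only NumPy so that it runs without matplotlib. plt.subplots returns a single Axes, a 1-D array, or a 2-D array depending on nrows and ncols; the helper name as_flat_axes and the stand-in class FakeAxes are hypothetical and are not part of d2l or matplotlib.

```python
import numpy as np


class FakeAxes:
    """Hypothetical stand-in for a matplotlib Axes object."""

    def cla(self):
        return "cleared"


def as_flat_axes(axes):
    """Return a 1-D array of axes for any subplot grid shape (hypothetical helper)."""
    # np.atleast_1d wraps a single object; ravel flattens a 2-D grid.
    return np.atleast_1d(axes).ravel()


# One subplot: a bare object, as with nrows * ncols == 1.
single = FakeAxes()

# A 2 x 3 grid: a 2-D object array, as with nrows > 1 and ncols > 1.
grid = np.empty((2, 3), dtype=object)
for i in range(2):
    for j in range(3):
        grid[i, j] = FakeAxes()

flat_single = as_flat_axes(single)
flat_grid = as_flat_axes(grid)
```

With this shape handling, flat_grid[0] is a single axes object again, so a call like flat_grid[0].cla() no longer lands on a row of the grid.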


I can’t understand how this code illustrates the pick function.
NumPy version

y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y_hat[[0, 1], [0, 2]]

NDArray version

y_hat = nd.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = nd.array([0, 2], dtype='int32')
nd.pick(y_hat, y)

When I want to implement

y = nd.array([1, 2], dtype='int32')
nd.pick(y_hat, y)

how should I do it in the NumPy version?


In the NumPy example, we select elements with integer array indexing rather than slices: [0, 1] gives the row indices along axis 0 (the first and the second rows) and [0, 2] gives the column indices along axis 1 (the first and the third columns). The two lists are paired element-wise, so we obtain an array containing y_hat[0, 0] and y_hat[1, 2].

In the NDArray example, nd.pick takes one element from each row, at the column index given by y (in nd.pick, axis is -1 by default, which in our case is equivalent to axis=1). This gives the same result as y_hat[range(len(y_hat)), y], so you don’t have to use nd.pick.

Here is another example. We pick the probabilities from y_hat that correspond to the classes from y:

>>> y_hat = np.array([[0.1, 0.9], [0.55, 0.45], [0.3, 0.7], [0.85, 0.15]])
>>> y = np.array([1, 1, 1, 0], dtype=int)
>>> y_hat[range(len(y_hat)), y]
array([0.9 , 0.45, 0.7 , 0.85])

P.S. As far as I know, there is no pick in numpy.
P.P.S. The passage mentions nd.pick from the NDArray version several times. I think it should be revised.
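
For the y = [1, 2] case asked about above, a minimal NumPy sketch could look like the following. It shows two ways to take one element per row; np.take_along_axis is a real NumPy function, but treating it as the counterpart of nd.pick is my own reading, not something the book states.

```python
import numpy as np

y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = np.array([1, 2])

# Integer array indexing: pair row i with column y[i].
picked = y_hat[np.arange(len(y_hat)), y]

# Same selection with take_along_axis, which needs y as a column vector.
picked_alt = np.take_along_axis(y_hat, y[:, None], axis=1).squeeze(axis=1)
```

Both variants select y_hat[0, 1] and y_hat[1, 2].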


Hey @sanjaradylov, thanks for pointing that out! We will fix it!

Hello there,

I got an error while running the code d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer). Further examination showed that the error was caused by the accuracy function trying to compare an int64 with a float32.

Below is my change to fix this issue:
(added .astype('float32'))

def accuracy(y_hat, y):
    if y_hat.shape[1] > 1:
        # Cast both sides to the same dtype before comparing.
        return float((y_hat.argmax(axis=1).astype('float32') == y.astype('float32')).sum())
    else:
        return float((y_hat.astype('int32') == y.astype('int32')).sum())
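
The same idea in plain NumPy, as a sketch rather than the d2l implementation: bring predictions and labels to one dtype before comparing. Plain NumPy would promote the dtypes on its own, so the explicit cast here only mirrors the MXNet fix above; the sample arrays are made up.

```python
import numpy as np


def accuracy(y_hat, y):
    """Count the predictions that match the labels."""
    if y_hat.ndim > 1 and y_hat.shape[1] > 1:
        # Rows are class scores: take the most probable class per row.
        y_pred = y_hat.argmax(axis=1)
    else:
        # Already class indices.
        y_pred = y_hat
    # Cast both sides to the same dtype before comparing.
    return float((y_pred.astype('int32') == y.astype('int32')).sum())


y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]], dtype='float32')
y = np.array([0, 2], dtype='int64')
num_correct = accuracy(y_hat, y)
```

Here both rows predict class 2, so only the second label matches.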

I was trying to figure out how to use Trainer objects in this case.

I mean, when you have a pre-determined model like nn.Sequential, you can simply use collect_params(), like for example:

model = nn.Sequential()
trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': some_value})

But what about this case? I’m following the tutorial with a few slight variations, so I have a mx.ndarray W containing the weights and biases of the model. How (if possible) do I create a Trainer, and pass W as parameter?

I created a dict of params by using:

params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))

and I tried to use:

trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})

but it’s still not working, it returns:

ValueError: First argument must be a list or dict of Parameters, got list of <class 'mxnet.numpy.ndarray'>

I don’t get why exactly you plotted the train loss together with the accuracy. To evaluate the model, don’t we plot the train and test loss together? Why didn’t we calculate the test loss?

Hey @AmalNammouchi, the loss function measures the error during training. By minimizing the loss, we obtain the “best” model. On the other hand, we don’t want our model to “memorize” the data in the training set, so we evaluate accuracy or another measure on both the training and the test set. In this way, we know how the model performs on data it has seen and on data it hasn’t seen.
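
If you also want the test loss, a sketch in plain NumPy could compute both measures on held-out data. This is not the book's train_ch3; the function names and the probabilities below are made up for illustration.

```python
import numpy as np


def cross_entropy(y_hat, y):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.log(y_hat[np.arange(len(y_hat)), y]).mean())


def accuracy(y_hat, y):
    """Fraction of rows whose most probable class matches the label."""
    return float((y_hat.argmax(axis=1) == y).mean())


# Made-up predicted probabilities standing in for a held-out batch.
y_hat_test = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]])
y_test = np.array([0, 1, 2])

test_loss = cross_entropy(y_hat_test, y_test)
test_acc = accuracy(y_hat_test, y_test)
```

Running the same two functions on training batches would give curves that can be plotted side by side with the test ones.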

Hey @LewsTherin511, can you print the “params_dict” or check the type of it?

Great job! Thanks for sharing!

Hi @gold_piggy, thanks for your answer!

params_dict is a <class 'dict'>, and its content is:

{0: array(0.02212206), 1: array(0.00774004), 2: array(0.0104344), 3: array(0.01183925), 4: array(0.01891711), 5: array(-0.01234741), [...], 7838: array(0.01554978), 7839: array(0.00644765), 7840: array(0.01050874)}

For reference, the relevant part of the code I’m using is:

from mxnet import autograd, np, npx, gluon

def main():
    n_feats = 784
    K_classes = 10
    W = np.random.normal(0, 0.01, (K_classes, n_feats+1))

    # learning parameters
    alpha = 0.001
    num_epochs = 5

    # create data
    batch_size = 256
    train_iter, test_iter = load_data_fashion_mnist(batch_size)

    # training
    W = training(batch_size, n_feats, num_epochs, alpha, W, train_iter)

    ## inference
    print("Running inference")
    inference(W, test_iter)

def load_data_fashion_mnist(batch_size, resize=None):

def model(X, W):
    batch_size = X.shape[0]
    n_feats = X.shape[2]*X.shape[3]
    X = X.reshape(batch_size, n_feats)
    X = np.concatenate( (np.ones((batch_size,1)), X), axis=1)
    linear_output = np.dot(W, X.transpose())
    return softmax(linear_output)

def softmax(X):

def cross_entropy(Y_hat, Y):

def accuracy(Y_hat, Y):

def training(batch_size, n_feats, num_epochs, alpha, W, train_iter):
    params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))

    for epoch in range(num_epochs):
        print(f"Epoch n. {epoch}")
        for X_batch, Y_batch in train_iter:
            with autograd.record():
                Y_hat = model(X_batch, W)
                loss = cross_entropy(Y_hat, Y_batch)

            ## using SGD -> this works
            W = SGD(W, alpha)
            ## using trainer -> TRYING TO MAKE THIS WORK
            # trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
            # trainer.step(batch_size)

        print(f"\tLoss: {loss}")
        acc = accuracy(Y_hat, Y_batch)
        print(f"\tAcc: {acc}")
    return W

def SGD(W, alpha):

def get_fashion_MNIST_labels(labels):

def inference(W, test_iter):

Hi @LewsTherin511, collect_params() returns a dictionary of named Parameter objects, so the trainer knows which parameters to update with gradient descent. The values in your dict are plain ndarrays, not Parameter objects, which is why the Trainer rejects them with the ValueError above.

Please check here for more details of collect_params():
