 # Softmax Regression from Scratch

I am not sure why `nrows` and `ncols` are needed as input parameters of `Animator`, but I would like to point something out. `Animator` works when `nrows` is 1 or `ncols` is 1, but not when both are greater than 1: `self.axes` is then a 2-D array, so `self.axes[0]` is a row of axes rather than a single one, and `self.axes[0].cla()` raises an error.
To prevent this, we can add the following line right after the line `if nrows * ncols == 1: self.axes = [self.axes,]`:
`if nrows > 1 and ncols > 1: self.axes = self.axes.flatten()`
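
The behaviour described above can be reproduced outside of `Animator`. This is only a minimal sketch, assuming matplotlib is installed; the variable names are made up for illustration and the non-interactive `Agg` backend is chosen so no window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt

# With both nrows > 1 and ncols > 1, subplots returns a 2-D array of axes.
fig, axes = plt.subplots(2, 2)
print(axes.shape)                # (2, 2)

# axes[0] is a whole row (a NumPy array), which has no cla() method.
print(hasattr(axes[0], "cla"))   # False

# After flattening, each element is a single axes object again.
flat_axes = axes.flatten()
print(hasattr(flat_axes[0], "cla"))  # True
flat_axes[0].cla()

plt.close(fig)
```

Flattening keeps the row-major order of the grid, so `flat_axes[1]` is the top-right subplot.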

I can’t understand how this code illustrates the `pick` function.
NumPy version

```
y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y_hat[[0, 1], [0, 2]]
```

NDArray version

```
y_hat = nd.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = nd.array([0, 2], dtype='int32')
nd.pick(y_hat, y)
```

Suppose I want to implement `y = nd.array([1, 2], dtype='int32')` followed by `nd.pick(y_hat, y)`.

How should I do this in the NumPy version?

In this example, we select elements with integer array indexing rather than slices: `[0, 1]` gives the row indices (axis 0, the first and the second rows) and `[0, 2]` gives the column indices (axis 1, the first and the third columns). The two lists are paired elementwise, so we obtain an array containing `y_hat[0, 0]` and `y_hat[1, 2]`.

In the NDArray version, `nd.pick` takes from each row the element at the index given by `y`, so with `y = [0, 2]` it picks the first column from the first row and the third column from the second row (in `nd.pick`, `axis` is `-1` by default, which in our case is equivalent to `axis=1`). This gives the same result as `y_hat[range(len(y_hat)), y]`, so you don’t have to use `nd.pick`.
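
For the `y = [1, 2]` case asked about above, the same pattern applies. A minimal NumPy sketch follows; the names `rows`, `picked` and `same` are just illustrative, and `np.take_along_axis` is shown only as one possible alternative.

```python
import numpy as np

y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = np.array([1, 2])

# Pair every row index with the column index stored in y.
rows = np.arange(len(y_hat))
picked = y_hat[rows, y]
print(picked)  # [0.3 0.5]

# Alternative: take_along_axis expects the indices to have the same
# number of dimensions as y_hat, hence the extra axis and the squeeze.
same = np.take_along_axis(y_hat, y[:, None], axis=1).squeeze(axis=1)
print(same)    # [0.3 0.5]
```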

Here is another example. We pick the probabilities from `y_hat` that correspond to the classes from `y`:

```
>>> y_hat = np.array([[0.1, 0.9], [0.55, 0.45], [0.3, 0.7], [0.85, 0.15]])
>>> y = np.array([1, 1, 1, 0], dtype=int)
>>> y_hat[range(len(y_hat)), y]
array([0.9 , 0.45, 0.7 , 0.85])
```

P.S. As far as I know, there is no `pick` in numpy.
P.P.S. In the passage, there are multiple mentions of `nd.pick` from NDArray version. I think it should be revised.

Hey @sanjaradylov, thanks for pointing that out! We will fix it!

Hello there,

I got an error while running `d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)`. Further examination showed that the error was caused by the `accuracy` function trying to compare an int64 with a float32.

Below is my change to fix this issue
(added `.astype('float32')`):

```
def accuracy(y_hat, y):
    # Class scores per row: compare the predicted class index with the label.
    if y_hat.shape[1] > 1:
        return float((y_hat.argmax(axis=1).astype('float32') == y.astype('float32')).sum())
    else:
        return float((y_hat.astype('int32') == y.astype('int32')).sum())
```
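
A variant that avoids hard-coding `'float32'` is to cast the predictions to whatever dtype the labels have. The following is a minimal sketch in plain NumPy, not the MXNet code from the post; plain NumPy does not raise on a mixed int/float comparison, so it only illustrates the casting pattern.

```python
import numpy as np

def accuracy(y_hat, y):
    """Return the number of correct predictions as a float."""
    # If y_hat holds one score per class, reduce it to class indices.
    if y_hat.ndim > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    # Cast predictions to the label dtype so both sides match.
    matches = y_hat.astype(y.dtype) == y
    return float(matches.sum())

y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]], dtype=np.float32)
y = np.array([0, 2], dtype=np.int64)
print(accuracy(y_hat, y))  # 1.0 (only the second row is predicted correctly)
```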

I was trying to figure out how to use Trainer objects in this case.

When you have a predefined model such as `nn.Sequential`, you can simply use `collect_params()`, for example:

```
model = nn.Sequential()
[....]
trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': some_value})
```

But what about this case? I’m following the tutorial with a few slight variations, so I have an `mx.ndarray` `W` containing the weights and biases of the model. How (if possible) do I create a `Trainer` and pass `W` as the parameter?

I created a dict of params by using:

```
params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))
```

and I tried to use:

```
trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
trainer.step(batch_size)
```

but it still doesn’t work; it raises:

```
ValueError: First argument must be a list or dict of Parameters, got list of <class 'mxnet.numpy.ndarray'>
```

I don’t get why exactly you plotted the train loss together with the accuracy. To evaluate the model, don’t we usually plot the train and test loss together? Why didn’t we calculate the test loss?

Hey @AmalNammouchi, the loss function measures the model’s error on the training data, and training minimizes it to obtain the “best” model. On the other hand, we don’t want our model to simply “memorize” the training set, so we evaluate accuracy (or another measure) on both the training and test sets. This shows how the model performs on data it has seen and on data it hasn’t seen.
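
Nothing prevents computing the loss on the test set as well. As a minimal NumPy sketch: the `evaluate` helper and the probabilities below are made up for illustration and are not taken from the tutorial.

```python
import numpy as np

def cross_entropy(y_hat, y):
    # Negative log-probability assigned to the true class of each example.
    return -np.log(y_hat[np.arange(len(y_hat)), y])

def evaluate(y_hat, y):
    """Return (mean loss, accuracy) for predicted probabilities y_hat."""
    loss = float(cross_entropy(y_hat, y).mean())
    acc = float((y_hat.argmax(axis=1) == y).mean())
    return loss, acc

# Hypothetical predicted probabilities for a training and a test batch.
train_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
train_labels = np.array([0, 1])
test_probs = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
test_labels = np.array([0, 2])

train_loss, train_acc = evaluate(train_probs, train_labels)
test_loss, test_acc = evaluate(test_probs, test_labels)
print(f"train: loss {train_loss:.3f}, acc {train_acc:.2f}")
print(f"test:  loss {test_loss:.3f}, acc {test_acc:.2f}")
```

A test loss that is clearly higher than the training loss, as in this toy data, is the usual sign that the model fits the training set better than unseen data.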

Hey @LewsTherin511, can you print the “params_dict” or check the type of it?

Great job! Thanks for sharing!

`params_dict` is a `<class 'dict'>`, and its content is:

```
{0: array(0.02212206), 1: array(0.00774004), 2: array(0.0104344), 3: array(0.01183925), 4: array(0.01891711), 5: array(-0.01234741), [...], 7838: array(0.01554978), 7839: array(0.00644765), 7840: array(0.01050874)}
```

For reference, the relevant part of the code I’m using is:

```
from mxnet import autograd, np, npx, gluon
npx.set_np()

def main():
    n_feats = 784
    K_classes = 10
    W = np.random.normal(0, 0.01, (K_classes, n_feats+1))

    # learning parameters
    alpha = 0.001
    num_epochs = 5

    # create data
    batch_size = 256

    # training
    W = training(batch_size, n_feats, num_epochs, alpha, W, train_iter)

    ## inference
    print("Running inference")
    inference(W, test_iter)

[...]

def model(X, W):
    # X is assumed to have shape (batch, channel, height, width)
    batch_size = X.shape[0]
    n_feats = X.shape[2]*X.shape[3]
    X = X.reshape(batch_size, n_feats)
    # prepend a column of ones so the bias is part of W
    X = np.concatenate( (np.ones((batch_size,1)), X), axis=1)
    linear_output = np.dot(W, X.transpose())
    return softmax(linear_output)

def softmax(X):
    [...]

def cross_entropy(Y_hat, Y):
    [...]

def accuracy(Y_hat, Y):
    [...]

def training(batch_size, n_feats, num_epochs, alpha, W, train_iter):
    params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))

    for epoch in range(num_epochs):
        print(f"Epoch n. {epoch}")
        for X_batch, Y_batch in train_iter:
            Y_hat  = model(X_batch, W)
            loss = cross_entropy(Y_hat, Y_batch)
            loss.backward()

            ## using SGD -> this works
            W = SGD(W, alpha)
            ## using trainer -> TRYING TO MAKE THIS WORK
            # trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
            # trainer.step(batch_size)

        print(f"\tLoss: {loss}")
        acc = accuracy(Y_hat, Y_batch)
        print(f"\tAcc: {acc}")
    return W

def SGD(W, alpha):
    [...]

def get_fashion_MNIST_labels(labels):
    [...]

def inference(W, test_iter):
    [...]
```

Hi @LewsTherin511, `collect_params()` produces a collection of named `Parameter` objects, so the trainer knows which ones to update by gradient descent. In your dict, the values are plain ndarrays rather than `Parameter` objects, which is what `Trainer` is looking for…

Please check here for more details of `collect_params()`: https://d2l.ai/chapter_deep-learning-computation/parameters.html?#collecting-parameters-from-nested-blocks
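
To illustrate why a trainer wants parameter objects rather than raw arrays, here is a toy sketch in plain NumPy. `ToyParameter` and `ToyTrainer` are hypothetical stand-ins written for this reply and are not the Gluon API; they only show that an update step needs a value and its gradient bundled together.

```python
import numpy as np

class ToyParameter:
    """Hypothetical stand-in: bundles a value with its gradient buffer."""
    def __init__(self, name, data):
        self.name = name
        self.data = np.asarray(data, dtype=float)
        self.grad = np.zeros_like(self.data)

class ToyTrainer:
    """Hypothetical SGD trainer; rejects anything that is not a ToyParameter."""
    def __init__(self, params, learning_rate):
        for p in params:
            if not isinstance(p, ToyParameter):
                raise ValueError(f"expected ToyParameter, got {type(p).__name__}")
        self.params = list(params)
        self.lr = learning_rate

    def step(self, batch_size):
        # Plain SGD update, with the gradient averaged over the batch.
        for p in self.params:
            p.data -= self.lr * p.grad / batch_size

W = ToyParameter("W", np.ones((2, 3)))
W.grad[:] = 4.0  # pretend a backward pass filled in the gradient
trainer = ToyTrainer([W], learning_rate=0.5)
trainer.step(batch_size=2)
print(W.data)  # every entry is 1 - 0.5 * 4 / 2 = 0.0
```

Passing a bare array, as in `ToyTrainer([np.ones(3)], 0.5)`, fails with a `ValueError` much like the one reported above. The analogous step in Gluon would presumably be to wrap `W` in a `Parameter` object before handing it to `Trainer`; the exact constructor arguments depend on the MXNet version and are not shown here.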
