http://d2l.ai/chapter_linear-networks/softmax-regression-scratch.html

I am not sure why nrows and ncols are needed as input parameters of Animator, but I would like to point out something about it. Animator works well when nrows is 1 or ncols is 1, but it does not work when both nrows and ncols are greater than 1, since self.axes[0].cla() will raise an error.

To prevent it, we may add the following line right after the line `if nrows * ncols == 1: self.axes = [self.axes,]`:

`if nrows > 1 and ncols > 1: self.axes = self.axes.flatten()`
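For anyone who wants to try this outside of `Animator`, here is a minimal sketch of the same idea with plain matplotlib. The helper name `make_axes` is made up for illustration and is not part of `d2l`. It passes `squeeze=False` so that `plt.subplots` returns a 2-D array for every grid shape, then flattens it, which is one possible alternative to special-casing the grid size.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def make_axes(nrows=1, ncols=1, figsize=(3.5, 2.5)):
    """Hypothetical helper: return (fig, axes) with axes as a flat list."""
    # squeeze=False makes plt.subplots return a 2-D array for every grid shape
    fig, axes = plt.subplots(nrows, ncols, figsize=figsize, squeeze=False)
    return fig, list(axes.flatten())


fig, axes = make_axes(2, 2)
axes[0].cla()  # works: axes[0] is a single Axes, not a row of Axes
print(len(axes))
```

With a flat list, `axes[0]` is a single `Axes` object for a 1x1, 1xN, Nx1, or NxM grid alike.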

I can't understand how this code illustrates the `pick` function.

Numpy version

`y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])`

`y_hat[[0, 1], [0, 2]]`

NDArray version

`y_hat = nd.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])`

`y = nd.array([0, 2], dtype='int32')`

`nd.pick(y_hat, y)`

When I want to implement

`y = nd.array([1, 2], dtype='int32')`

`nd.pick(y_hat, y)`

**how should I do it in the NumPy version?**

In this example, we choose elements using integer array indexing rather than slices: `[0, 1]` gives the row indices along axis 0 (the first and the second rows) and `[0, 2]` gives the column indices along axis 1 (the first and the third columns). The two lists are paired element by element, so we obtain an array containing `y_hat[0, 0]` and `y_hat[1, 2]`.

Here, we pick the elements from the first and the third columns, one per row, along axis 1 (in `nd.pick`, `axis` is `-1` by default, which in our case is equivalent to `axis=1`). This gives the same result as `y_hat[range(len(y_hat)), y]`, so you don't have to use `nd.pick`.
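Applied to the `y = [1, 2]` case from the question, a minimal NumPy sketch could look like this. The variable names `picked` and `picked_alt` are just for illustration. `np.take_along_axis` is shown as a second option, not as an official replacement for `nd.pick`.

```python
import numpy as np

y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = np.array([1, 2])  # class index to pick from each row

# Pair row i with column y[i] via integer array indexing
picked = y_hat[np.arange(len(y_hat)), y]

# Same selection with take_along_axis, which needs one index column per row
picked_alt = np.take_along_axis(y_hat, y[:, None], axis=1).squeeze(axis=1)

print(picked)      # [0.3 0.5]
print(picked_alt)  # [0.3 0.5]
```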

Here is another example. We pick the probabilities from `y_hat` that correspond to the classes from `y`:

```
>>> y_hat = np.array([[0.1, 0.9], [0.55, 0.45], [0.3, 0.7], [0.85, 0.15]])
>>> y = np.array([1, 1, 1, 0], dtype=int)
>>> y_hat[range(len(y_hat)), y]
array([0.9 , 0.45, 0.7 , 0.85])
```

P.S. As far as I know, there is no `pick` in NumPy.

P.P.S. The passage mentions `nd.pick` from the NDArray version multiple times. I think it should be revised.

Hello there,

I got an error while running the code `d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)`. Further examination showed me that the error was caused by the `accuracy` function trying to compare an int64 with a float32.

Below is my change to fix this issue (added `.astype('float32')`):

```
def accuracy(y_hat, y):
    if y_hat.shape[1] > 1:
        # Cast both sides to float32 so the comparison uses matching dtypes
        return float((y_hat.argmax(axis=1).astype('float32') == y.astype('float32')).sum())
    else:
        return float((y_hat.astype('int32') == y.astype('int32')).sum())
```

I was trying to figure out how to use Trainer objects in this case.

I mean, when you have a pre-determined model like nn.Sequential, you can simply use collect_params(), like for example:

```
model = nn.Sequential()
[....]
trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': some_value})
```

But what about this case? I'm following the tutorial with a few slight variations, so I have an `mx.ndarray` `W` containing the weights and biases of the model. How (if possible) do I create a Trainer and pass `W` as its parameter?

I created a dict of params by using:

```
params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))
```

and I tried to use:

```
trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
trainer.step(batch_size)
```

but it's still not working; it returns:

```
ValueError: First argument must be a list or dict of Parameters, got list of <class 'mxnet.numpy.ndarray'>
```

I don't get why exactly you plotted the train loss together with the accuracy. To evaluate the model, don't we plot the train and test loss together? Why didn't we calculate the test loss?

Hey @AmalNammouchi, the loss function measures the error during training. By minimizing the loss, we obtain the "best" model. On the other hand, we don't want our model to "memorize" the data in the training set, so we evaluate accuracy or another measure on both the training and the test set. In this way, we know how the model performs on data it has seen and on data it hasn't seen.
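That said, nothing prevents computing the loss on the test set as well. Below is a minimal NumPy sketch, assuming the model outputs class probabilities. The probabilities and labels are toy values made up for illustration, and `cross_entropy` and `accuracy` here are simplified stand-ins, not the chapter's implementations.

```python
import numpy as np


def cross_entropy(y_hat, y):
    """Mean negative log-probability assigned to the true classes."""
    return float(-np.log(y_hat[np.arange(len(y_hat)), y]).mean())


def accuracy(y_hat, y):
    """Fraction of rows whose most probable class matches the label."""
    return float((y_hat.argmax(axis=1) == y).mean())


# Toy predicted probabilities and labels, made up for illustration only
train_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
train_labels = np.array([0, 1, 0])
test_probs = np.array([[0.6, 0.4], [0.55, 0.45], [0.3, 0.7]])
test_labels = np.array([0, 1, 1])

for name, probs, labels in [("train", train_probs, train_labels),
                            ("test", test_probs, test_labels)]:
    print(f"{name}: loss={cross_entropy(probs, labels):.3f}, "
          f"acc={accuracy(probs, labels):.3f}")
```

Both metrics are plain functions of predictions and labels, so the same code can be applied to either split.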

Great job! Thanks for sharing!

Hi @gold_piggy, thanks for your answer!

`params_dict` is a `<class 'dict'>`, and its content is:

```
{0: array(0.02212206), 1: array(0.00774004), 2: array(0.0104344), 3: array(0.01183925), 4: array(0.01891711), 5: array(-0.01234741), [...], , 7838: array(0.01554978), 7839: array(0.00644765), 7840: array(0.01050874)}
```

For reference, the relevant part of the code I'm using is:

```
from mxnet import autograd, np, npx, gluon
npx.set_np()

def main():
    n_feats = 784
    K_classes = 10
    W = np.random.normal(0, 0.01, (K_classes, n_feats+1))
    # learning parameters
    alpha = 0.001
    num_epochs = 5
    # create data
    batch_size = 256
    train_iter, test_iter = load_data_fashion_mnist(batch_size)
    # training
    W = training(batch_size, n_feats, num_epochs, alpha, W, train_iter)
    ## inference
    print("Running inference")
    inference(W, test_iter)

def load_data_fashion_mnist(batch_size, resize=None):
    [...]

def model(X, W):
    batch_size = X.shape[0]
    n_feats = X.shape[2]*X.shape[3]
    X = X.reshape(batch_size, n_feats)
    X = np.concatenate( (np.ones((batch_size,1)), X), axis=1)
    linear_output = np.dot(W, X.transpose())
    return softmax(linear_output)

def softmax(X):
    [...]

def cross_entropy(Y_hat, Y):
    [...]

def accuracy(Y_hat, Y):
    [...]

def training(batch_size, n_feats, num_epochs, alpha, W, train_iter):
    params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))
    for epoch in range(num_epochs):
        print(f"Epoch n. {epoch}")
        for X_batch, Y_batch in train_iter:
            W.attach_grad()
            with autograd.record():
                Y_hat = model(X_batch, W)
                loss = cross_entropy(Y_hat, Y_batch)
            loss.backward()
            ## using SGD -> this works
            W = SGD(W, alpha)
            ## using trainer -> TRYING TO MAKE THIS WORK
            # trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
            # trainer.step(batch_size)
        print(f"\tLoss: {loss}")
        acc = accuracy(Y_hat, Y_batch)
        print(f"\tAcc: {acc}")
    return W

def SGD(W, alpha):
    [...]

def get_fashion_MNIST_labels(labels):
    [...]

def inference(W, test_iter):
    [...]
```

Hi @LewsTherin511, `collect_params()` produces a dictionary of named `Parameter` objects, so the trainer knows which ones to update by gradient descent. In your dict, the values are plain arrays rather than `Parameter` objects, which is not what `Trainer` is looking for…

Please check here for more details of `collect_params()`: https://d2l.ai/chapter_deep-learning-computation/parameters.html?#collecting-parameters-from-nested-blocks