http://d2l.ai/chapter_linear-networks/softmax-regression-scratch.html
I am not sure why nrows and ncols are needed as input parameters of Animator, but I would like to point out something about it. Animator works well when nrows is 1 or ncols is 1, but it does not work when both nrows and ncols are greater than 1, since self.axes[0].cla() will raise an error.
To prevent it, we may add a line right after the existing line
if nrows * ncols == 1: self.axes = [self.axes,]
namely:
if nrows > 1 and ncols > 1: self.axes = self.axes.flatten()
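A minimal sketch of an alternative, assuming only matplotlib and not the actual d2l Animator code: passing squeeze=False to plt.subplots makes it return a 2-D array for every grid shape, so one flatten() covers all cases. The helper name make_axes is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def make_axes(nrows=1, ncols=1, figsize=(3.5, 2.5)):
    """Return (fig, axes) where axes is always a flat list of Axes objects."""
    # squeeze=False makes plt.subplots return a 2-D array for every grid shape
    fig, axes = plt.subplots(nrows, ncols, figsize=figsize, squeeze=False)
    return fig, list(axes.flatten())


fig, axes = make_axes(2, 2)
axes[0].cla()  # works: axes[0] is a single Axes, not a row of Axes
```

With this approach the special case for nrows * ncols == 1 would no longer be needed, since a 1x1 grid also comes back as a one-element list.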
I can’t understand how this code illustrates the pick function.
Numpy version
y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y_hat[[0, 1], [0, 2]]
NDArray version
y_hat = nd.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = nd.array([0, 2], dtype='int32')
nd.pick(y_hat, y)
When I want to implement
y = nd.array([1, 2], dtype='int32')
nd.pick(y_hat, y)
how should I do it in the Numpy version?
In the Numpy example, we choose elements using integer-array indexing rather than slices: [0, 1] indexes axis 0 (the first and the second rows) and [0, 2] indexes axis 1 (the first and the third columns). The two lists are paired element-wise, so we obtain an array containing y_hat[0, 0] and y_hat[1, 2].
In the NDArray example, nd.pick takes from each row the element whose column index is given by y, here the first and the third columns (in nd.pick, axis is -1 by default, which in our case is equivalent to axis=1). This gives the same result as y_hat[range(len(y_hat)), y], so you don’t have to use nd.pick.
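For the y = [1, 2] case asked about above, a short NumPy sketch might look like this. The variable names picked and picked_alt are only for illustration, and np.take_along_axis is shown as one possible alternative, not as the equivalent the book intends.

```python
import numpy as np

y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = np.array([1, 2])

# Integer-array indexing: pair row i with column y[i]
picked = y_hat[np.arange(len(y_hat)), y]

# Same selection with take_along_axis (available in NumPy 1.15+)
picked_alt = np.take_along_axis(y_hat, y[:, None], axis=1).squeeze(axis=1)

print(picked)  # the elements y_hat[0, 1] and y_hat[1, 2]
```

Both forms select one element per row, which is what nd.pick does with its default axis.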
Here is another example. We pick the probabilities from y_hat that correspond to the classes from y:
>>> y_hat = np.array([[0.1, 0.9], [0.55, 0.45], [0.3, 0.7], [0.85, 0.15]])
>>> y = np.array([1, 1, 1, 0], dtype=int)
>>> y_hat[range(len(y_hat)), y]
array([0.9 , 0.45, 0.7 , 0.85])
P.S. As far as I know, there is no pick in numpy.
P.P.S. In the passage, there are multiple mentions of nd.pick from the NDArray version. I think it should be revised.
Hello there,
I got an error while running the code d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer). Further examination showed me that the error was caused by the accuracy function trying to compare an int64 with a float32.
Below is my change to fix this issue (I added .astype('float32')):
def accuracy(y_hat, y):
    if y_hat.shape[1] > 1:
        return float((y_hat.argmax(axis=1).astype('float32') == y.astype('float32')).sum())
    else:
        return float((y_hat.astype('int32') == y.astype('int32')).sum())
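To see the logic in isolation, here is a rough NumPy-only sketch of the same idea (cast both sides to one dtype before comparing). Note that plain NumPy compares mixed dtypes without complaint, so this only illustrates the shape of the fix and does not reproduce the MXNet error. The ndim check is an addition of this sketch, not part of the book's function.

```python
import numpy as np


def accuracy(y_hat, y):
    """Count predictions in y_hat that match the labels in y."""
    if y_hat.ndim > 1 and y_hat.shape[1] > 1:
        # One row of class scores per example: take the most likely class
        y_hat = y_hat.argmax(axis=1)
    # Cast both sides to the same dtype before comparing
    return float((y_hat.astype(np.int64) == y.astype(np.int64)).sum())


y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]], dtype=np.float32)
y = np.array([0, 2], dtype=np.float32)  # labels stored as floats on purpose
print(accuracy(y_hat, y) / len(y))  # fraction of correct predictions
```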
I was trying to figure out how to use Trainer objects in this case.
I mean, when you have a pre-determined model like nn.Sequential, you can simply use collect_params(), like for example:
model = nn.Sequential()
[....]
trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': some_value})
But what about this case? I’m following the tutorial with a few slight variations, so I have a mx.ndarray W containing the weights and biases of the model. How (if possible) do I create a Trainer, and pass W as parameter?
I created a dict of params by using:
params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))
and I tried to use:
trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
trainer.step(batch_size)
but it’s still not working, it returns:
ValueError: First argument must be a list or dict of Parameters, got list of <class 'mxnet.numpy.ndarray'>
I don’t get why exactly you plotted the train loss together with the accuracy. To evaluate the model, don’t we plot the train and test loss together? Why didn’t we calculate the test loss?
Hey @AmalNammouchi, the loss function measures the error that training tries to minimize; by minimizing it, we obtain the “best” model on the training data. On the other hand, we don’t want our model to “memorize” the training set, so we evaluate accuracy or another measure on both the training and test sets. In this way, we know how the model performs on data it has seen and on data it hasn’t seen.
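If you do want the test loss as well, nothing stops you from computing it next to the accuracy. Here is a small NumPy sketch of that idea; the probabilities and labels are made up for illustration, and the two helper functions are simplified stand-ins, not the d2l implementations.

```python
import numpy as np


def cross_entropy(y_hat, y):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.log(y_hat[np.arange(len(y_hat)), y]).mean())


def accuracy(y_hat, y):
    """Fraction of examples whose most likely class equals the label."""
    return float((y_hat.argmax(axis=1) == y).mean())


# Hypothetical predicted probabilities, made up for illustration
train_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
train_labels = np.array([0, 1, 0])
test_probs = np.array([[0.6, 0.4], [0.55, 0.45], [0.3, 0.7]])
test_labels = np.array([0, 1, 1])

for name, probs, labels in [("train", train_probs, train_labels),
                            ("test", test_probs, test_labels)]:
    print(f"{name}: loss={cross_entropy(probs, labels):.3f}, "
          f"acc={accuracy(probs, labels):.3f}")
```

A gap between the train and test numbers is what hints at memorization, whichever of the two measures you plot.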
Great job! Thanks for sharing!
Hi @gold_piggy, thanks for your answer!
params_dict is a <class 'dict'>, and its content is:
{0: array(0.02212206), 1: array(0.00774004), 2: array(0.0104344), 3: array(0.01183925), 4: array(0.01891711), 5: array(-0.01234741), [...], 7838: array(0.01554978), 7839: array(0.00644765), 7840: array(0.01050874)}
For reference, the relevant part of the code I’m using is:
from mxnet import autograd, np, npx, gluon
npx.set_np()

def main():
    n_feats = 784
    K_classes = 10
    W = np.random.normal(0, 0.01, (K_classes, n_feats+1))
    # learning parameters
    alpha = 0.001
    num_epochs = 5
    # create data
    batch_size = 256
    train_iter, test_iter = load_data_fashion_mnist(batch_size)
    # training
    W = training(batch_size, n_feats, num_epochs, alpha, W, train_iter)
    ## inference
    print("Running inference")
    inference(W, test_iter)

def load_data_fashion_mnist(batch_size, resize=None):
    [...]

def model(X, W):
    batch_size = X.shape[0]
    n_feats = X.shape[2]*X.shape[3]
    X = X.reshape(batch_size, n_feats)
    X = np.concatenate( (np.ones((batch_size,1)), X), axis=1)
    linear_output = np.dot(W, X.transpose())
    return softmax(linear_output)

def softmax(X):
    [...]

def cross_entropy(Y_hat, Y):
    [...]

def accuracy(Y_hat, Y):
    [...]

def training(batch_size, n_feats, num_epochs, alpha, W, train_iter):
    params_dict = dict( (j+i*n_feats, W[i][j]) for i in range(len(W)) for j in range(len(W[0])))
    for epoch in range(num_epochs):
        print(f"Epoch n. {epoch}")
        for X_batch, Y_batch in train_iter:
            W.attach_grad()
            with autograd.record():
                Y_hat = model(X_batch, W)
                loss = cross_entropy(Y_hat, Y_batch)
            loss.backward()
            ## using SGD -> this works
            W = SGD(W, alpha)
            ## using trainer -> TRYING TO MAKE THIS WORK
            # trainer = gluon.Trainer(params_dict, 'sgd', {'learning_rate': 0.03})
            # trainer.step(batch_size)
        print(f"\tLoss: {loss}")
        acc = accuracy(Y_hat, Y_batch)
        print(f"\tAcc: {acc}")
    return W

def SGD(W, alpha):
    [...]

def get_fashion_MNIST_labels(labels):
    [...]

def inference(W, test_iter):
    [...]
Hi @LewsTherin511, collect_params() produces a collection of named Parameter objects, so the trainer knows which ones to update by gradient descent. The values in your dict are plain arrays rather than Parameter objects, which is not what the Trainer is looking for…
Please check here for more details on collect_params(): https://d2l.ai/chapter_deep-learning-computation/parameters.html?#collecting-parameters-from-nested-blocks