I'd like to point out a few things:
 In section 2.2.11, the 4th property of norms is an “if and only if” statement (page 10 of Zico Kolter’s Linear Algebra Review and Reference). As a consequence, the statement “It is possible to define a norm that gives zero norm to nonzero matrices” isn’t true.
 I would add a small discussion about the difference between a vector and a 1 x n (or n x 1) matrix.
 In section 2.2.8, you mention “When two vectors each have length one …”. I think it would be better to use “norm” instead of “length”, because otherwise readers might confuse it with the “length” defined in section 2.2.3.
 In section 2.2.13.2, when talking about symmetric matrices, you mention “the entries below and above the diagonal are the same”, which is rather vague. It would be nice to see mentioned that symmetric matrices are square matrices.
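For reference, the property in the first point can be written out as follows. This is a sketch in notation chosen here for illustration ($f$ for any norm, $\|\cdot\|_F$ for the Frobenius norm as one concrete matrix norm), not necessarily the notation used in the book:

```latex
f(\mathbf{x}) = 0 \iff \mathbf{x} = \mathbf{0},
\qquad \text{e.g.} \qquad
\|\mathbf{A}\|_F = \sqrt{\sum_{i}\sum_{j} a_{ij}^2} = 0
\iff a_{ij} = 0 \ \text{for all } i, j.
```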
I hope these suggestions are helpful.
Thanks! I’ve revised these based on your suggestions.
First of all, thank you for publishing this. I think it is great material, and I really appreciate the amount of work you are devoting to it.
IMHO, the part about the orientation of the data points in a matrix is confusing. I may be missing something, but you say that in a matrix that represents a tabular dataset, it is more conventional to treat each data point as a row vector in the matrix. You also say that along the outermost axis of an ndarray, we can access or enumerate minibatches of data points, which for me is the opposite (I consider the outermost axis to be the last axis), so you would be selecting batches of columns, not rows, wouldn’t you?
For example, image data (as in the softmax regression and CNN chapters) usually has shape (N, C, H, W), where N is the number of images, C the number of channels, H the height, and W the width. If you want to select a minibatch (say 11:16), you can specify X[11:16, :, :, :]. It is conventional here that the outermost axis is the one for minibatches (N).
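A minimal sketch of that slicing, using ordinary NumPy rather than mxnet.numpy, with made-up sizes (20 images, 3 channels, 8 x 8 pixels) that are only assumptions for illustration:

```python
import numpy as np

# Hypothetical batch of image data with shape (N, C, H, W).
X = np.zeros((20, 3, 8, 8))

# Slice along the outermost (first) axis to pick images 11 through 15.
minibatch = X[11:16, :, :, :]

# X[11:16] is shorthand for the same selection.
same_minibatch = X[11:16]

print(minibatch.shape)
print(same_minibatch.shape)
```

Only the first axis shrinks; the channel, height, and width axes are left untouched.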
OK, so my misconception comes from treating the “outermost” axis as the last one (columns in a matrix, or depth in a rank-3 tensor, for example), when it is actually the first one. Is this correct?
Thank you very much!
Hi @gpolo, the outermost axis of a matrix indexes its rows. You can see this by looking at how the brackets are nested. For a 2D matrix, it will look like
[[…] , […] , […] , …]
You can easily find the top-level commas that separate the items within the outermost brackets; the number of items there is the length of the outermost axis.
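The bracket-counting idea can be checked with plain nested Python lists. This is only a sketch; the 3 x 4 values below are made up for illustration:

```python
# A hypothetical 3 x 4 matrix written as nested lists.
X = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# The outermost brackets hold 3 items, so the outermost (first) axis has length 3.
outer_length = len(X)

# Each of those items is itself a list of 4 numbers: the innermost (last) axis.
inner_length = len(X[0])

# Indexing the outermost axis returns one whole row.
first_row = X[0]

print(outer_length, inner_length)
print(first_row)
```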
Hi @gold_piggy,
I am having problems with the “sort” methods of the np package of mxnet.
For example, the ordinary numpy package has these methods:

numpy.sort(a, axis=-1, kind=None, order=None): a package-level function

ndarray.sort(axis=-1, kind=None, order=None): an instance method

numpy.argsort(a, axis=-1, kind=None, order=None): a package-level function
I can’t find any of these in mxnet.numpy; calling them raises NotImplementedError. Is this correct?
Does anybody have the same problem? Without them being defined in mxnet.numpy, you cannot use them in an autograd.record() context (and the original numpy sort and argsort functions cannot be used there either).
Thank you very much
I have the same problem.
Maybe you can try to get around it by using a = np.array(sorted(a))
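A rough sketch of the same workaround for both sort and argsort, assuming the values have first been copied into a plain one-dimensional Python list (the list below is made up). Since this runs in ordinary Python, I would not expect autograd to track it, though that is an assumption on my part:

```python
# Hypothetical values standing in for a one-dimensional array.
a = [3.0, 1.0, 2.0]

# Sorted copy of the values, playing the role of numpy.sort on a 1-D array.
sorted_values = sorted(a)

# Indices that would sort the values, playing the role of numpy.argsort.
sort_indices = sorted(range(len(a)), key=lambda i: a[i])

print(sorted_values)
print(sort_indices)
```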
Hi @gpolo and @vermicelli, sorry for the late reply.
Please check the documentation here: http://numpy.mxnet.io/api/deepnumpy/generated/mxnet.np.ndarray.sort.html?highlight=sort#mxnet.np.ndarray.sort