Multilayer Perceptron

In (4.1.5), W_1 * X should be X * W_1, W_2 * H_1 should be H_1 * W_2, and W_3 * H_2 should be H_2 * W_3.



Agreed. If they stick to the convention of one input sample per row, I think this is the only way.
In fact, in the previous chapter, the "Vectorization for Minibatches" part of the softmax regression section gives the linear computation for the softmax as XW + b, although the explanation there contains the same mistake and again shows WX.
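A quick shape check makes the point concrete. This is only a sketch in NumPy with made-up sizes (n, d, h, q and the variable names are my own, not the book's), assuming the one-sample-per-row convention and a ReLU hidden layer:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, q = 4, 3, 5, 2          # batch size, inputs, hidden units, outputs (illustrative)

X = rng.normal(size=(n, d))      # one input sample per row
W1 = rng.normal(size=(d, h))     # first-layer weights
b1 = np.zeros(h)
W2 = rng.normal(size=(h, q))     # second-layer weights
b2 = np.zeros(q)

H = np.maximum(X @ W1 + b1, 0)   # hidden layer: (n, d) @ (d, h) -> (n, h)
O = H @ W2 + b2                  # output layer: (n, h) @ (h, q) -> (n, q)
print(H.shape, O.shape)

# The reversed order does not even type-check with these shapes:
try:
    W1 @ X                       # (d, h) @ (n, d): inner dimensions h and n differ
except ValueError as err:
    print("W1 @ X fails:", err)
```

With X stored as (n, d), the weights have to multiply from the right. W X only works if the samples are stored as columns and the weight matrices are transposed accordingly.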

Has anyone figured out exercise 4 or 5?

Can someone please explain the \mathcal{O}(n^{-\frac{1}{2}})? It comes from the "Validation Dataset" section:

The uncertainty in our estimates can be shown to be of the order of \mathcal{O}(n^{-\frac{1}{2}}).
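
One way to see where that rate comes from (a sketch, not the book's own derivation): if the test error is estimated as the fraction of mistakes on n independent examples, the estimate is a mean of n Bernoulli variables, so its standard deviation is \sqrt{p(1-p)/n}, which shrinks like n^{-\frac{1}{2}}. The small simulation below checks this empirically. The function name, the true error rate of 0.2, and the number of trials are all assumptions chosen for illustration.

```python
import random
import statistics

def spread_of_error_estimate(n, true_error=0.2, trials=1000, seed=0):
    """Standard deviation of the estimated error rate over many simulated test sets of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Each example is misclassified independently with probability true_error.
        mistakes = sum(rng.random() < true_error for _ in range(n))
        estimates.append(mistakes / n)
    return statistics.pstdev(estimates)

for n in (100, 400, 1600):
    observed = spread_of_error_estimate(n)
    predicted = (0.2 * 0.8 / n) ** 0.5   # sqrt(p * (1 - p) / n)
    print(f"n={n:5d}  observed={observed:.4f}  predicted={predicted:.4f}")
```

Each time n is multiplied by 4, the predicted spread halves (0.04, 0.02, 0.01), and the observed values should track it closely. That is the n^{-\frac{1}{2}} behaviour: to halve the uncertainty you need four times as much data.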