The text and code both suggest that the middle column of Fig. 7.2.1 should show a 2x2 MaxPool instead of a 3x3 MaxPool.
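For what it's worth, here is a quick back-of-the-envelope check of why the pool size matters. It is only a sketch: the helper names `out_size` and `vgg_block_size` are made up for illustration, and I am assuming a 224-pixel input, 3x3 convolutions with padding 1, and a pooling stride of 2 with no padding.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def vgg_block_size(n, num_convs, pool_kernel, pool_stride=2):
    """Track one spatial dimension through a VGG-style block (hypothetical helper)."""
    for _ in range(num_convs):
        # A 3x3 convolution with padding 1 keeps the resolution unchanged;
        # the ReLU that follows is elementwise and does not change the size.
        n = out_size(n, kernel=3, stride=1, padding=1)
    # The max pooling layer at the end of the block does the downsampling.
    return out_size(n, kernel=pool_kernel, stride=pool_stride)


# Assumed input resolution of 224 pixels per side.
size_2x2 = vgg_block_size(224, num_convs=2, pool_kernel=2)
size_3x3 = vgg_block_size(224, num_convs=2, pool_kernel=3)
print(size_2x2, size_3x3)
```

Under these assumptions the 2x2 pool halves the resolution exactly (224 to 112), while a 3x3 pool with the same stride gives 111, so the two variants are not interchangeable in the figure.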
@mli, I think there is a typo in 7.2.1 as it reads:
“ The basic building block of classic convolutional networks is a sequence of the following layers: (i) a convolutional layer (with padding to maintain the resolution), (ii) a nonlinearity such as a ReLU, One VGG block consists of a sequence of convolutional layers, followed by a max pooling layer for spatial downsampling.”
I believe it is missing something along these lines: “(iii) a max pooling layer for spatial downsampling.” before “One VGG block …”. Or, even better, the two sentences could be merged, since they say much the same thing and are redundant as written.