Convolutional Neural Networks (LeNet)


First of all, thank you for this great learning material!

In the chapter on the LeNet architecture you mention that your implementation matches the historical definition of LeNet-5 (Gradient-Based Learning Applied to Document Recognition) except for the last layer, but I found two other inconsistencies in subsection B. LeNet-5.

  • The LeNet paper does not describe the pooling layer as an average pooling layer, but rather as a layer that performs a summation over each 2x2 neighborhood of the input feature map, multiplies the sum by a trainable weight, adds a trainable bias, and finally passes the result through a sigmoidal function.

  • According to the LeNet paper, the activation function used in both the convolutional and fully connected layers is a scaled hyperbolic tangent, not the sigmoid used in the code. These two functions look similar but have different output ranges (the sigmoid maps to (0, 1), while the scaled tanh maps to roughly (-1.7159, 1.7159)).

If there is something I missed and your implementation of LeNet-5 is correct, please let me know.
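To make the first point concrete, here is a minimal NumPy sketch of the sub-sampling layer as the paper describes it (a single feature map only; `weight` and `bias` stand in for the trainable coefficient and bias, and the names are mine, not from the book's code):

```python
import numpy as np

def sigmoid(x):
    # Logistic squashing function
    return 1.0 / (1.0 + np.exp(-x))

def subsample(x, weight, bias):
    """LeNet-5 style sub-sampling on one feature map: sum each
    non-overlapping 2x2 neighborhood, scale by a trainable weight,
    add a trainable bias, then pass through a sigmoid."""
    h, w = x.shape
    pooled = x.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    return sigmoid(weight * pooled + bias)

out = subsample(np.ones((4, 4)), weight=0.25, bias=0.0)
print(out.shape)  # (2, 2)
```

Note that with `weight = 0.25` and `bias = 0` this reduces to average pooling followed by a sigmoid, which is presumably why many re-implementations simply substitute average pooling.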


Hey Martin,

Pooling was called sub-sampling in the original paper. According to page 6 of the paper:

"This can be achieved with a so-called subsampling layers which performs a local averaging and a subsampling, reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions"

Also, for tanh vs. sigmoid: tanh tends to converge faster than sigmoid (especially useful 20 years ago, when compute power was much more limited).

Hopefully it helps!

Just want to point out that the link to the Multilayer Perceptron on this page of the book is no longer available.

Thanks. Please refer to


Thanks for the learning material!

But I ran into a problem when using the code.

import d2l
from mxnet import autograd, gluon, init

# Save to the d2l package.
def train_ch5(net, train_iter, test_iter, num_epochs, lr, ctx=d2l.try_gpu()):
    net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
    loss = gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(),
                            'sgd', {'learning_rate': lr})
    animator = d2l.Animator(xlabel='epoch', xlim=[0, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    timer = d2l.Timer()
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(3)  # train_loss, train_acc, num_examples
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            # Here is the only difference compared to train_epoch_ch3
            X, y = X.as_in_context(ctx), y.as_in_context(ctx)
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat, y)
            l.backward()
            trainer.step(X.shape[0])
            metric.add(l.sum().asscalar(), d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_loss, train_acc = metric[0]/metric[2], metric[1]/metric[2]
            if (i+1) % 50 == 0:
                animator.add(epoch + i/len(train_iter),
                             (train_loss, train_acc, None))
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch+1, (None, None, test_acc))
    print('loss %.3f, train acc %.3f, test acc %.3f' % (
        train_loss, train_acc, test_acc))
    print('%.1f examples/sec on %s' % (metric[2]*num_epochs/timer.sum(), ctx))

when I try to run
train_ch5(net, train_iter, test_iter, num_epochs, lr)
there is always the traceback

Traceback (most recent call last):
  File "", line 63, in <module>
    train_ch5(net, train_iter, test_iter, num_epochs, lr)
  File "", line 50, in train_ch5
    metric.add(l.sum().asscalar(), d2l.accuracy(y_hat, y), X.shape[0])
TypeError: add() takes 2 positional arguments but 4 were given

But since the code already uses metric = d2l.Accumulator(3), how could it happen that add() only takes 2 arguments?

I just reran it and there was no error. This issue might be caused by a newer version of the MXNet operators. Did you install the numpy version of MXNet? If not, please refer to

In the implementation of the function evaluate_accuracy_gpu, can we replace
ctx = list(net.collect_params().values())[0].list_ctx()[0]
simply by
ctx = net[0].weight.list_ctx()[0] ?

@gold_piggy @mli
I think there is an error in the description about the output shape of 1st conv layer.
In the end of section 6.1.1,

The convolutional layer uses a kernel with a height and width of 5, which with only 2 pixels of padding in the first convolutional layer and none in the second convolutional layer leads to reductions in both height and width by 2 and 4 pixels, respectively.

the 1st conv layer actually has 2 pixels of padding on each side of the input, so I think there is no reduction in the 1st conv output (28 x 28).
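A quick sanity check with the standard output-size formula supports this (the helper `conv_out` is mine, not from the book's code):

```python
def conv_out(size, kernel, padding=0, stride=1):
    # Standard convolution output size:
    # floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out(28, 5, padding=2))  # 1st conv layer: 28, no reduction
print(conv_out(14, 5, padding=0))  # 2nd conv layer: 10, reduced by 4
```

So with padding 2 and a 5x5 kernel, the first layer keeps the 28x28 spatial size; only the second (unpadded) conv layer shrinks its input by 4 pixels.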

Why does so much of this rely on the d2l package? This serves to make the lessons far less general.

Hi everyone!

First of all I’m really glad this forum exists! This is my first post and I’m looking forward to learning :slight_smile:

In this architecture, we get 10 output channels in the final layer.
I’ve checked the Fashion-MNIST dataset, and this fits the total number of classes:

  • 0 T-shirt/top
  • 1 Trouser
  • 2 Pullover
  • 3 Dress
  • 4 Coat
  • 5 Sandal
  • 6 Shirt
  • 7 Sneaker
  • 8 Bag
  • 9 Ankle boot

As one of the tips in the exercises suggests, one should try increasing the number of output channels beyond that, which I did. As a result, I actually get much better results than before. But what is actually happening here?

When I have 10 classes, each output channel is linked to one class – correct?
But if I have 20 output channels, would that mean that each class can be linked to an arbitrary number of output channels (>1)?

Thank you, and I hope my question doesn’t sound too dumb.