Custom symbol loss with gluon

Hi,

I wrote this code (most of it is from the tutorial; I only modified a few lines),
and it is not working. :cry:

import numpy as np
import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.gluon import nn
np.random.seed(42)
mx.random.seed(42)
ctx = mx.gpu()

def data_xform(data):
    """Move channel axis to the beginning, cast to float32, and normalize to [0, 1]."""
    return nd.moveaxis(data, 2, 0).astype('float32') / 255


# prepare data
train_data = mx.gluon.data.vision.MNIST(train=True).transform_first(data_xform)
val_data = mx.gluon.data.vision.MNIST(train=False).transform_first(data_xform)
batch_size = 100
train_loader = mx.gluon.data.DataLoader(train_data, shuffle=True, batch_size=batch_size)
val_loader = mx.gluon.data.DataLoader(val_data, shuffle=False, batch_size=batch_size)


# create network
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
fc2 = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
fc3 = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=10)

net= gluon.SymbolBlock(outputs=[fc3], inputs=[data])
net.initialize(ctx=ctx)


# create trainer, metric
trainer = gluon.Trainer(
    params=net.collect_params(),
    optimizer='sgd',
    optimizer_params={'learning_rate': 0.1, 'momentum':0.9, 'wd':0.00001},
)
metric = mx.metric.Accuracy()


# learn
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        inputs = inputs.as_in_context(ctx)
        labels = labels.as_in_context(ctx)

        with autograd.record():
            outputs = net(inputs)
            # softmax
            exps = nd.exp(outputs - outputs.min(axis=1).reshape((-1,1)))
            exps = exps / exps.sum(axis=1).reshape((-1,1))
            # cross entropy
            loss = nd.MakeLoss(-nd.log(exps.pick(labels)))
            #
            #loss = gluon.loss.SoftmaxCrossEntropyLoss()(outputs, labels)
            #print(loss)

        loss.backward()
        metric.update(labels, outputs)

        trainer.step(batch_size=inputs.shape[0])

    name, acc = metric.get()
    print('After epoch {}: {} = {}'.format(epoch + 1, name, acc))
    metric.reset()

If I use gluon.loss.SoftmaxCrossEntropyLoss, it runs well…

When I print the loss in both cases, the output values look the same.

What are the differences, and what should I rewrite?

Thank you in advance.

Hi,

In your question, what exactly do you mean by "not working"? You said the output values look the same. That means the code isn’t crashing. Do you mean the loss isn’t converging?

Sorry for my poor explanation.

You are right, the output looks the same.

The main problem is this: when I use my own loss [1], the loss does not back-propagate (accuracy does not increase). But when I use gluon.loss [2], it works well. To look for errors in my own code, I printed all the loss values in both cases, and they look the same. So I think my calculation only works on the forward pass, and I want to know how to make it work on the backward pass.

The code above is a complete running example without any modification; you can test it if you don’t mind.

[1]

exps = nd.exp(outputs - outputs.min(axis=1).reshape((-1,1)))
exps = exps / exps.sum(axis=1).reshape((-1,1))
loss = nd.MakeLoss(-nd.log(exps.pick(labels)))

[2]

loss = gluon.loss.SoftmaxCrossEntropyLoss()(outputs, labels)

Hi,

Why were you subtracting outputs.min in your exponential call? In the first iteration you get the same results, but if you compare after multiple iterations in the first epoch, the loss is different from that of SoftmaxCrossEntropyLoss().

autograd records any computation you make in its scope, including the loss calculation, and knows how to compute the gradients, so I don’t think you need to use nd.MakeLoss here.
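To illustrate why a hand-written loss needs no special wrapper, here is a minimal sketch in plain Python (deliberately not MXNet, so it runs without any dependencies). It assumes a single sample and checks that the gradient of softmax cross-entropy with respect to the logits, softmax(logits) minus the one-hot label, matches a finite-difference estimate. The helper names are made up for this illustration; autograd would be expected to arrive at the same gradient from the recorded operations.

```python
import math

def cross_entropy(logits, label):
    """Hand-written softmax cross-entropy for one sample."""
    m = max(logits)  # shift by the max so every exponent is <= 0
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[label] / sum(exps))

def analytic_grad(logits, label):
    """d(loss)/d(logits) = softmax(logits) - one_hot(label)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total - (1.0 if i == label else 0.0)
            for i, e in enumerate(exps)]

def numeric_grad(logits, label, eps=1e-6):
    """Central finite differences, used as an independent check."""
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += eps
        down = list(logits)
        down[i] -= eps
        diff = cross_entropy(up, label) - cross_entropy(down, label)
        grads.append(diff / (2 * eps))
    return grads

# Hypothetical logits and label, chosen only for the demonstration.
logits, label = [0.5, -1.2, 2.0], 0
analytic = analytic_grad(logits, label)
numeric = numeric_grad(logits, label)

# Largest disagreement between the two gradient estimates.
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_gap)
```

The loss here is built only from exp, sum, division and log, which is the kind of computation autograd can differentiate once it has been recorded.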

I slightly modified the code you posted, and I’m getting the same accuracy as when I use the gluon loss function.

import numpy as np
import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.gluon import nn


np.random.seed(42)
mx.random.seed(42)
ctx = mx.cpu()

def data_xform(data):
    """Move channel axis to the beginning, cast to float32, and normalize to [0, 1]."""
    return nd.moveaxis(data, 2, 0).astype('float32') / 255


# prepare data
train_data = mx.gluon.data.vision.MNIST(train=True).transform_first(data_xform)
val_data = mx.gluon.data.vision.MNIST(train=False).transform_first(data_xform)
batch_size = 100
train_loader = mx.gluon.data.DataLoader(train_data, shuffle=True, batch_size=batch_size)
val_loader = mx.gluon.data.DataLoader(val_data, shuffle=False, batch_size=batch_size)


# create network
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
fc2 = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
fc3 = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=10)

net= gluon.SymbolBlock(outputs=[fc3], inputs=[data])
net.initialize(ctx=ctx)

# create trainer, metric
trainer = gluon.Trainer(
    params=net.collect_params(),
    optimizer='sgd',
    optimizer_params={'learning_rate': 0.1, 'momentum':0.9, 'wd':0.00001},
)
metric = mx.metric.Accuracy()


# learn
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        inputs = inputs.as_in_context(ctx)
        labels = labels.as_in_context(ctx)

        with autograd.record():
            outputs = net(inputs)
            # softmax
            exps = nd.exp(outputs)
            exps = exps / exps.sum(axis=1).reshape((-1,1))
            # cross entropy
            loss = -nd.log(exps.pick(labels))
            #
            # loss2 = gluon.loss.SoftmaxCrossEntropyLoss()(outputs, labels)
            # print(loss)
            # print(loss2)
            # assert False

        loss.backward()
        metric.update(labels, outputs)

        trainer.step(batch_size=inputs.shape[0])

    name, acc = metric.get()
    print('After epoch {}: {} = {}'.format(epoch + 1, name, acc))
    metric.reset()

It’s my fault: the min should actually be a max, for the sake of numerical stability. (refer here)

This problem is solved. Thank you very much.

(And I found one more error:
to make my code work, accuracy should be calculated with exps, not outputs:
metric.update(labels, exps)
And as @sad pointed out, nd.MakeLoss is not needed.)
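
For anyone who finds this later, here is a small sketch of the min-versus-max point, written in plain Python rather than MXNet so it runs anywhere. The function names and the example logits are made up for the demonstration. Softmax does not change when the same constant is subtracted from every logit, so both shifts give the same loss while the logits are small. Once the logits spread out, shifting by the min leaves a large positive exponent. In plain Python that raises OverflowError; with float32 arrays I would expect inf and then nan instead, but that part is an assumption, not something shown here.

```python
import math

def softmax_shifted(logits, shift):
    """Softmax computed after subtracting `shift` from every logit."""
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label, shift):
    """Negative log-probability of the true class."""
    return -math.log(softmax_shifted(logits, shift)[label])

# Small logits: the choice of shift makes no practical difference.
small = [1.0, 2.0, 3.0]
loss_min = cross_entropy(small, 2, min(small))
loss_max = cross_entropy(small, 2, max(small))

# Widely spread logits: shifting by the max keeps every exponent <= 0,
# so exp() stays in the range [0, 1] and cannot overflow.
large = [0.0, 500.0, 1000.0]
loss_large_max = cross_entropy(large, 2, max(large))

# Shifting by the min leaves an exponent of +1000, which overflows.
try:
    cross_entropy(large, 2, min(large))
    min_shift_overflowed = False
except OverflowError:
    min_shift_overflowed = True

print(loss_min, loss_max, loss_large_max, min_shift_overflowed)
```

This would also fit the earlier observation that both versions agree in the first iteration and only diverge after some training, once the network outputs have grown.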