SigmoidBCELoss is typically used for multilabel classification (where a single example can belong to multiple classes). When I train a model this way in Gluon, the network doesn't appear to learn, as seen in both the training loss and the training accuracy (the training accuracy is stuck at 53% from the first epoch). An identical model trained in Keras/TensorFlow shows a gradually increasing training accuracy (and decreasing training loss), eventually reaching 100% training accuracy in about 200 epochs. What am I doing wrong here?
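For reference, my mental model of SigmoidBCELoss with from_sigmoid=True is plain elementwise binary cross-entropy on already-sigmoid-activated outputs, averaged over the classes; here is a quick sanity check with made-up values:
# Sanity check of my understanding of SigmoidBCELoss with from_sigmoid=True:
# it should be elementwise binary cross-entropy averaged over the classes.
import mxnet as mx
from mxnet import nd

pred = nd.array([[0.9, 0.1, 0.2, 0.8]])   # made-up sigmoid outputs for one example
label = nd.array([[1, 0, 0, 1]])          # multilabel targets
bce = mx.gluon.loss.SigmoidBCELoss(from_sigmoid=True)
print(bce(pred, label))
# Manual computation for comparison:
manual = -(label * pred.log() + (1 - label) * (1 - pred).log()).mean(axis=1)
print(manual)
These agree (up to the small epsilon the loss adds inside the logs), so I believe the loss itself behaves the way I expect.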
Here are both the Gluon and Keras examples to reproduce the problem I’m experiencing:
# MXNet Gluon: doesn't appear to work
import mxnet as mx
from mxnet import gluon, nd
from mxnet.gluon import nn
import numpy as np
X = [[1,0,0,0,0,0,0],
     [1,2,0,0,0,0,0],
     [3,0,0,0,0,0,0],
     [3,4,0,0,0,0,0],
     [2,0,0,0,0,0,0],
     [3,0,0,0,0,0,0],
     [4,0,0,0,0,0,0],
     [2,3,0,0,0,0,0],
     [1,2,3,0,0,0,0],
     [1,2,3,4,0,0,0],
     [0,0,0,0,0,0,0],
     [1,1,2,3,0,0,0],
     [2,3,3,4,0,0,0],
     [4,4,1,1,2,0,0],
     [1,2,3,3,3,3,3],
     [2,4,2,4,2,0,0],
     [1,3,3,3,0,0,0],
     [4,4,0,0,0,0,0],
     [3,3,0,0,0,0,0],
     [1,1,4,0,0,0,0]]
Y = [[1,0,0,0],
     [1,1,0,0],
     [0,0,1,0],
     [0,0,1,1],
     [0,1,0,0],
     [0,0,1,0],
     [0,0,0,1],
     [0,1,1,0],
     [1,1,1,0],
     [1,1,1,1],
     [0,0,0,0],
     [1,1,1,0],
     [0,1,1,1],
     [1,1,0,1],
     [1,1,1,0],
     [0,1,0,0],
     [1,0,1,0],
     [0,0,0,1],
     [0,0,1,0],
     [1,0,0,1]]
loader = gluon.data.DataLoader(gluon.data.SimpleDataset(list(zip(X,Y))), batch_size=1)
MAXLEN = 7
MAXFEATURES = 4
ctx = mx.gpu(0)
NUM_CLASSES = 4
# model
net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(nn.Embedding(MAXFEATURES+1, 50))
    net.add(nn.GlobalAvgPool1D())
    net.add(nn.Dense(NUM_CLASSES, activation='sigmoid'))
net.hybridize()
net.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
# trainer
trainer = gluon.Trainer(
    params=net.collect_params(),
    optimizer='adam',
    optimizer_params={'learning_rate': 0.001},
)
metric = mx.metric.Accuracy()
loss_function = gluon.loss.SigmoidBCELoss(from_sigmoid=True)
for epoch in range(200):
    for (data, labels) in loader:
        data = data.as_in_context(ctx)
        labels = labels.astype('float32').as_in_context(ctx)
        with mx.autograd.record():
            outputs = net(data)
            loss = loss_function(outputs, labels)
        loss.backward()
        metric.update(labels, outputs)
        trainer.step(batch_size=data.shape[0])
    name, acc = metric.get()
    if epoch % 20 == 0: print("epoch %s: %s" % (epoch, acc))
    metric.reset()
The training accuracy of this model is stuck at 53% from the first epoch, and predictions on training examples after training are completely off (so it's not just an issue with the EvalMetric).
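To take the metric out of the equation, an exact-match multilabel accuracy can also be computed by hand along the lines below (a sketch; multilabel_accuracy is my own helper that thresholds the sigmoid outputs at 0.5):
# Exact-match multilabel accuracy, computed by hand:
# an example counts as correct only if all four classes match
# after thresholding the sigmoid outputs at 0.5.
def multilabel_accuracy(net, X, Y, ctx):
    preds = net(nd.array(X, ctx=ctx)) > 0.5
    labels = nd.array(Y, ctx=ctx)
    exact_match = (preds == labels).prod(axis=1)  # 1 only if every class matches
    return exact_match.mean().asscalar()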
The same model implemented in Keras/TensorFlow, as shown below, trains as expected.
# Keras: works and achieves 100% training accuracy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Embedding
from keras.layers import GlobalAveragePooling1D
import numpy as np
X = np.array(X)
Y = np.array(Y)
MAXLEN = 7
MAXFEATURES = 4
NUM_CLASSES = 4
model = Sequential()
model.add(Embedding(MAXFEATURES+1,
                    50,
                    input_length=MAXLEN))
model.add(GlobalAveragePooling1D())
model.add(Dense(NUM_CLASSES, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X, Y,
          batch_size=1,
          epochs=200,
          validation_data=(X, Y))
The Keras/TensorFlow predictions on training examples are near-perfect, but the Gluon predictions are way off: the first and fourth positions of the output array below should be near 1 and the others near zero, but the Gluon outputs are nowhere close.
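For concreteness, a single prediction can be pulled from the trained Gluon net along these lines (a sketch; the only training row whose label is [1,0,0,1] is the last one, so that is the input used here):
# Query the trained Gluon net on the last training row (label [1, 0, 0, 1]):
x = nd.array([[1, 1, 4, 0, 0, 0, 0]], ctx=ctx)
print(net(x))  # first and fourth outputs should be near 1, the rest near 0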
The Keras/TensorFlow model outputs the following for the same input, which is correct: the first and fourth positions are near 1 (above 0.5) and the remaining classes are near zero:
array([[0.8340912 , 0.01359105, 0.01566718, 0.94641566]], dtype=float32)
What is the proper way to translate the Keras/TensorFlow multilabel model to MXNet Gluon?