Custom MCC Loss Function

Hi all,
I’m trying to write an MCC (Matthews Correlation Coefficient) loss function. I’ve drafted the layout below, but it still needs to be rewritten to operate on arrays (this is not school homework).

import numpy as np
from mxnet.gluon.loss import Loss

class MCCLoss(Loss):
    def __init__(self, weight=None, batch_axis=0, **kwargs):
        super(MCCLoss, self).__init__(weight, batch_axis, **kwargs)

    @staticmethod
    def compute_confusion_matrix(F, y_true, y_pred):
        K = len(F.unique(y_true))  # number of classes (note: F (nd/sym) has no unique(), so K may need to be passed in or computed with numpy)
        result = F.zeros((K, K))

        for i in range(len(y_true)):
            result[y_true[i]][y_pred[i]] += 1

        return result

    @staticmethod
    def compute_confusion_matrix_values(y_true, y_pred):
        tp = 0
        fp = 0
        tn = 0
        fn = 0

        for i in range(len(y_pred)):
            if y_true[i] == y_pred[i] == 1:
                tp += 1
            if y_pred[i] == 1 and y_true[i] != y_pred[i]:
                fp += 1
            if y_true[i] == y_pred[i] == 0:
                tn += 1
            if y_pred[i] == 0 and y_true[i] != y_pred[i]:
                fn += 1

        return tp, fp, tn, fn

    @staticmethod
    def matthews_corrcoef(F, tp, fp, tn, fn):
        # https://stackoverflow.com/a/56875660/992687
        x = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
        epsilon = np.finfo(np.float64).eps
        return ((tp * tn) - (fp * fn)) / F.sqrt(x + epsilon)

    def hybrid_forward(self, F, y_pred, y_true, sample_weight=None):
        tp, fp, tn, fn = self.compute_confusion_matrix_values(y_true, y_pred)
        loss = 1 - self.matthews_corrcoef(F, tp, fp, tn, fn)
        return loss
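
For reference, a quick numeric check of the formula in matthews_corrcoef above, with made-up counts (not from any real run):

import numpy as np

tp, fp, tn, fn = 6., 1., 3., 2.  # made-up confusion-matrix counts
mcc = ((tp * tn) - (fp * fn)) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(mcc)  # ~0.478; +1 = perfect prediction, 0 = no better than chance, -1 = total disagreement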

I found some resources that are quite useful, especially an example implementation using Keras at link [1] below.

I’m also not sure whether I could use MakeLoss (reference [2]) to simplify the whole thing.
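
Something along these lines (symbol API) is roughly what I imagine the MakeLoss route from [2] would look like; an untested sketch on my side, using soft (continuous) counts so the expression stays differentiable:

import mxnet as mx

pred = mx.sym.Variable('pred')    # network output, assumed to be probabilities in [0, 1]
label = mx.sym.Variable('label')  # binary labels in {0, 1}

# soft (fuzzy) counts, so that gradients can flow
tp = mx.sym.sum(pred * label)
tn = mx.sym.sum((1 - pred) * (1 - label))
fp = mx.sym.sum(pred * (1 - label))
fn = mx.sym.sum((1 - pred) * label)

mcc = (tp * tn - fp * fn) / mx.sym.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn) + 1e-8)
loss = mx.sym.MakeLoss(1 - mcc)   # wrap the expression so MXNet treats it as the training objective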

References:

[1] Multiple Classification for MCC (Matthews Correlation Coefficient implementation) in Keras

[2] Custom Loss Function by using MakeLoss
    http://beta.mxnet.io/r/api/mx.symbol.MakeLoss.html
    https://blog.csdn.net/u013381011/article/details/79141680

[3] MXNet MCC metric

Would anyone be able to help me implement this? I could really use a hand.

Much appreciated

Hi @hamilton,

I’ve done some research on this (not yet published, and I don’t know if I will ever publish it; it depends on the final findings). It assumes the inputs (predictions and labels) are multichannel images of size (Batch, NClasses, Height, Width), and I’ve used it for the task of semantic segmentation. This is my implementation (below). What is different from what you presented above is the way I evaluate true positives/negatives and false positives/negatives: in a fuzzy context, i.e. where predictions are continuous, p ∈ [0, 1].

Please let us know the outcome of your experiments with this loss when you finish. In my experiments it achieved a score <= Tanimoto_with_dual (https://arxiv.org/abs/1904.00592, https://github.com/feevos/resuneta), but with slower convergence. I haven’t done an exhaustive search though (I’ve found that slower convergence can sometimes mean a better balance between classes). I am currently evaluating it again on a new, highly imbalanced problem.

Best of luck


from mxnet.gluon.loss import Loss

class MCCLoss(Loss):

    def __init__(self, smooth = 1.0e-5, batch_axis=0, weight=None, **kwargs):
        super().__init__(weight=weight, batch_axis=batch_axis, **kwargs)

        self.smooth = smooth # I am not actually using it anywhere at the moment 


    def inner_prod(self, F, prob, label):
        prod = F.broadcast_mul(prob,label)
        prod = F.sum(prod,axis=[1,2,3])

        return prod


    def tp(self,F,prob,label):
        return self.inner_prod(F,prob,label)

    def tn(self,F,prob,label):
        return self.inner_prod(F, 1.-prob,1.-label)

    def fp(self,F,prob,label):
        return self.inner_prod(F, prob,1.-label)

    def fn(self,F,prob,label):
        return self.inner_prod(F, 1.-prob, label)


    def hybrid_forward(self, F, prob, label):

        wtp = self.tp(F, prob, label)
        wtn = self.tn(F, prob, label)
        wfp = self.fp(F, prob, label)
        wfn = self.fn(F, prob, label)


        num = F.broadcast_mul(wtp,wtn) - F.broadcast_mul(wfp,wfn)

        denum = wtp + wfp
        denum = F.broadcast_mul(denum, wtp + wfn)
        denum = F.broadcast_mul(denum, wtn + wfp)
        denum = F.broadcast_mul(denum, wtn + wfn)
        denum = F.sqrt(denum)

        return num/denum
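
To make the expected input shapes concrete, here is a small smoke test I would run with it (my own sketch, assuming softmax outputs and matching 1-hot labels of shape (Batch, NClasses, Height, Width)):

from mxnet import nd

NBatch, NClasses, H, W = 2, 4, 8, 8
logits = nd.random.randn(NBatch, NClasses, H, W)
probs = nd.softmax(logits, axis=1)                                     # per-pixel class probabilities
labels = nd.argmax(probs, axis=1)                                      # pretend these are the ground-truth indices
labels_1h = nd.transpose(nd.one_hot(labels, NClasses), axes=(0, 3, 1, 2))  # 1-hot, back to (Batch, NClasses, H, W)

mcc = MCCLoss()
print(mcc(probs, labels_1h))      # should be positive, one value per batch element
print(mcc(labels_1h, labels_1h))  # perfect agreement --> 1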



Sure, let me try and get back to you. Many thanks

I tried to use it, but my output and labels are class indices of the form [1, 0, 1, 0, 0, 0, 2], while your version seems to expect probabilities like [0.124, 0.456, …].
How do I change things so that I can use it?

Thank you and best regards.


The probabilities are continuous, in the range [0, 1] (they can also take the values {0, 1}). The probabilities and the labels (in the function definition above) represent classes in 1-hot encoding. Therefore, in order to use the above definition for 2D problems you need to have both the probabilities and the labels in 1-hot encoding.

Assuming you have a 2D image with NClasses = 5 labels, you can get the corresponding 1-hot representation via the numpy/nd eye function:

import numpy as np

NClasses = 5
labels = np.array([[0, 1, 2], [3, 4, 0]])  # class indices must lie in [0, NClasses-1]
label_1hot = np.eye(NClasses)[labels].transpose([2, 0, 1])  # the transpose is useful for 2D problems

If you google it, you’ll find many explanations of 1-hot encoding.

Also, for the MCC given above to be used as a loss for training a network, you need to change the sign:

loss = (1. - AboveMCCLoss(preds, labels)) * 0.5  # loss = 1: everything wrong, loss = 0: all correct

Hope this helps.

sample prob: [2 1 2 2 2 0 2 1 1 0 2 1 2 1 2 0 2 1 1 2 1 1 2 1 2 0 2 2 2 2 2 1 1 0 1 0 1
2 2 0 2 1 2 2 0 2 2 1 2 1 2 1 2 1 1 2 0 1 0 0 0 1 1 2 2 2 1 0 2 1 0 1 2 2
1 2 2 2 0 2 2 1 2 0 1 2 1 2 2 2 0 2 1 1 1 1 1 0 1 2 2 1 2 2 1 0 0 0 2 1 0
1 2 0 2 1 0 0 0 0 2 0 1 2 2 1 2 1 2 2 1 2 0 0 0 2 0 1 2 1 1 2 2 0 2 1 1 2
2 1 0 1 0 1 0 1 0 1 0 2 0 1 2 1 1 0 0 1 1 2 2 1 2 2 2 1 0 2 2 1 1 0 1 1 1
2 0 2 0 1 2 1 2 1 1 1 2 0 0 1 1 2 2 1 0 0 1 2 2 2 2 2 1 2 2 1 1 2 2 2 1 2
1 2 1 1 1 2 1 2 2 2 1 2 1 1 1 2 2 1 0 2 2 0 2 1 1 2 0 1 1 2 1 0 2 1 1 1 2
1 0 0 0 1 0 1 1 1 0 1 1 1 2 2 2 0 1 0 2 1 1 0 0 1 1 0 0 2 1 1 1 0 0 2 0 0
2 2 1 1 1 2 2 0 2 1 2 0 1 1 1 1 2 1 1 1 2 2 1 0 2 0 1 2 1 1 2 2 1 0 1 0 1
1 2 1 1 2 2 2 2 1 2 1 2 2 1 0 1 1 1 0 1 1 1 1 2 2 0 2 1 2 1 2 1 0]

sample label: [0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

def hybrid_forward(self, F, prob, label):
    wtp = self.tp(F, prob, label)  <-- [10]
    wtn = self.tn(F, prob, label)  <-- [-63]
    wfp = self.fp(F, prob, label)  <-- [418]
    wfn = self.fn(F, prob, label)  <-- [1]

    num = F.broadcast_mul(wtp, wtn) - F.broadcast_mul(wfp, wfn)

    denum = wtp + wfp
    denum = F.broadcast_mul(denum, wtp + wfn)
    denum = F.broadcast_mul(denum, wtn + wfp)
    denum = F.broadcast_mul(denum, wtn + wfn)
    denum = F.sqrt(denum)  <-- [-9223372036854775808] 
    loss = (1 - num/denum) * 0.5

    return loss <--[0]

This is the first iteration, so it is normal that the loss is a small number. But it crashes after executing L.backward() with the error Segmentation fault: 11.

        with autograd.record():
            output_train = self.net(data)
            predictions = nd.argmax(output_train, axis=1).astype('int64') 
            L = self.loss(predictions, label)

        L.backward() <-- crashed~!
        trainer.step(self.batch_size)

Am I doing something wrong with the input?

Please advise

Hi Hamilton, I modified the definition for you for 1D problems, BUT in 1-hot representation. In order to get a loss function for training, you need to subtract the result of this class from 1 (edit: basically you just need to change the sign; even using -MCCLoss should work).

from mxnet.gluon.loss import Loss
from mxnet import nd 

class MCCLoss(Loss):

    def __init__(self, smooth = 1.0e-5, sum_axis=[1,2], batch_axis=0, weight=None, **kwargs):
        super().__init__(weight=weight, batch_axis=batch_axis, **kwargs)

        self.smooth = smooth # I am not actually using it anywhere at the moment 
        self.sum_axis=sum_axis

    def inner_prod(self, F, prob, label):
        prod = F.broadcast_mul(prob,label)
        prod = F.sum(prod,axis=self.sum_axis)

        return prod


    def tp(self,F,prob,label):
        return self.inner_prod(F,prob,label)

    def tn(self,F,prob,label):
        return self.inner_prod(F, 1.-prob,1.-label)

    def fp(self,F,prob,label):
        return self.inner_prod(F, prob,1.-label)

    def fn(self,F,prob,label):
        return self.inner_prod(F, 1.-prob, label)


    def hybrid_forward(self, F, prob, label):

        wtp = self.tp(F, prob, label)
        wtn = self.tn(F, prob, label)
        wfp = self.fp(F, prob, label)
        wfn = self.fn(F, prob, label)


        num = F.broadcast_mul(wtp,wtn) - F.broadcast_mul(wfp,wfn)

        denum = wtp + wfp
        denum = F.broadcast_mul(denum, wtp + wfn)
        denum = F.broadcast_mul(denum, wtn + wfp)
        denum = F.broadcast_mul(denum, wtn + wfn)
        denum = F.sqrt(denum)

        return num/denum

Assume you have a problem with 3 classes. We will create a batch of 2 datums, with 5 objects per datum, i.e. 2*5 = 10 objects in total, which we split into a batch of 2. Definitions and sanity checks:

In [60]: NBatch=2 
    ...: NObjects_per_batch=5 
    ...: NClasses = 3 
    ...: probs_1h = nd.uniform(shape=[NBatch,NObjects_per_batch,NClasses]) 
    ...: probs_1h = nd.softmax(probs_1h,axis=-1) # make the random values probabilities 
    ...: labels = nd.argmax(probs_1h,-1) 
    ...: labels_1h = nd.eye(NClasses)[labels]
    ...: myloss = MCCLoss()  # instantiate the loss defined above

In [61]: print(myloss(probs_1h,labels_1h)) # correlation > 0 
    ...: print(myloss(1.-probs_1h, labels_1h)) # anticorrelation <0 
    ...: print(myloss(labels_1h,labels_1h)) # perfect correlation --> 1 
    ...: print(myloss(1-labels_1h,labels_1h)) # should give perfect anticorrelation --> -1 
    ...: print(myloss(labels_1h,1- labels_1h)) # Symmetric same as above      

result:

[0.10435484 0.09912276]
<NDArray 2 @cpu(0)>

[-0.10435482 -0.09912276]
<NDArray 2 @cpu(0)>

[1. 1.]
<NDArray 2 @cpu(0)>

[-1. -1.]
<NDArray 2 @cpu(0)>

[-1. -1.]
<NDArray 2 @cpu(0)>

So if you look at the labels you will see something like:

In [66]: print (labels) 
    ...: print (labels_1h)                                                                                                                                                            

[[2. 2. 1. 2. 1.]
 [2. 1. 2. 2. 1.]]
<NDArray 2x5 @cpu(0)>

[[[0. 0. 1.]
  [0. 0. 1.]
  [0. 1. 0.]
  [0. 0. 1.]
  [0. 1. 0.]]

 [[0. 0. 1.]
  [0. 1. 0.]
  [0. 0. 1.]
  [0. 0. 1.]
  [0. 1. 0.]]]
<NDArray 2x5x3 @cpu(0)>

and the corresponding probabilities:

In [80]: probs_1h                                                                                                                                                                     
Out[80]: 

[[[0.25142097 0.35158828 0.3969908 ]
  [0.21707469 0.36763847 0.4152869 ]
  [0.19482112 0.41429797 0.39088094]
  [0.2548438  0.36045036 0.38470584]
  [0.29028696 0.40323463 0.30647835]]

 [[0.31078342 0.30440956 0.384807  ]
  [0.29724038 0.37873575 0.32402387]
  [0.33249545 0.33164975 0.3358548 ]
  [0.23995323 0.24789381 0.512153  ]
  [0.32526138 0.38552523 0.28921342]]]
<NDArray 2x5x3 @cpu(0)>

Note that the probabilities sum to 1 in the last axis:

In [81]: nd.sum(probs_1h,-1)                                                                                                                                                          
Out[81]: 

[[1.         1.         1.         1.         0.99999994]
 [1.         1.         1.         1.         1.        ]]
<NDArray 2x5 @cpu(0)>

If you want to train a network you need to define:

tloss = MCCLoss()
def training_loss(probs, labels):
    #return 1. - tloss(probs,labels)
    return - tloss(probs,labels) # **edit**

Some comments on what you posted:

Your probs take values above 1, therefore they cannot be probabilities; I assume you mixed up probs with labels. Also, in a problem where the output has more than one class, your network should predict a vector of values, with each entry representing the probability of the particular class.

In my definitions above, tp is the fuzzy definition of a true positive, similarly tn is the fuzzy definition of a true negative, and so on. Therefore it will ALWAYS hold that tp, tn, fp, fn >= 0; they cannot be negative, since they are (fuzzy generalizations of) counts.

So, in 1hot representation, the class 2 (in a set of 3 classes {0,1,2}) is represented by a vector [0,0,1]. The class 0 is [1,0,0], the class 1 is [0,1,0] and so on.

A sample probability for class 2 in 1hot is [0.05,0.05,0.9]
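
As a tiny worked example of those fuzzy counts (my own numbers, using the class-2 vectors just mentioned):

from mxnet import nd

prob = nd.array([0.05, 0.05, 0.9])   # predicted probabilities for one object
label = nd.array([0., 0., 1.])       # 1-hot label for class 2

tp = nd.sum(prob * label)                # 0.9  -> mostly a true positive
tn = nd.sum((1. - prob) * (1. - label))  # 1.9  -> the two "off" classes are mostly true negatives
fp = nd.sum(prob * (1. - label))         # 0.1
fn = nd.sum((1. - prob) * label)         # 0.1

All four are non-negative, as stated above.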

So if you have a network, the LAST layer of the network should be gluon.nn.Dense(NClasses), with a softmax activation:


from mxnet import gluon
from mxnet.gluon import HybridBlock

class SomeNet(HybridBlock):
    def __init__(self, NClasses, **kwargs):
        super().__init__(**kwargs)

        with self.name_scope():
            self.last_layer = gluon.nn.Dense(NClasses)

    def hybrid_forward(self, F, input):
        x = some_layers(input)   # placeholder for whatever feature layers your net has
        x = self.last_layer(x)
        x = F.softmax(x, -1)     # translate to probabilities
        return x

And if your data is in the format (NBatch,), e.g. label = [2, 0, 1], you need to translate it to 1-hot before you pass it into the loss function: label_1h = nd.eye(NClasses)[label]

...
with autograd.record():
    probs = net(input)
    loss = training_loss(probs, labels_1h)
    loss.backward()
...

Hope this helps

Regards,
Foivos


Thank you for your reply. I was trying to do something like this.
Reference:
[1] How to Develop 1D Convolutional Neural Network Models for Human Activity.

Question 1:

So my net consists of Conv1D layers, just like in the example; the Conv1D layout is NCW and the last layer is a Dense with 3 outputs. (I’m using this for testing purposes, in case I make a mistake in my net before clearing up other bugs.)

net in the Keras example:

model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(n_outputs, activation='softmax'))

Say my net is exactly the same as the Conv1D example above; how do I feed its output into your MCCLoss? Do I just change the following code

def __init__(self, smooth = 1.0e-5, sum_axis=[1,2], batch_axis=0 ...

to

def __init__(self, smooth = 1.0e-5, sum_axis=[1], batch_axis=0 ...

since the output dimension changes from 3 (the input passed to the Conv1D) to 2 (the output after the Conv1D layers and the Dense)? Am I understanding this correctly?

Question 2:

I tried this out and it works ok

NClasses = 3
x = nd.random.randn(2, 5, 5)
print('input_data.shape:{0}'.format(x.shape))
conv = nn.Conv1D(layout='NCW', channels=3, kernel_size=3, activation='relu')  # <-- I added this
conv.initialize()
y = conv(x)
probs_1h = nd.softmax(y, axis=-1)
labels = nd.argmax(probs_1h, -1)
labels_1h = nd.eye(NClasses)[labels]
print('conv(input).shape:{0}'.format(y.shape))

print(training_loss(probs_1h, labels_1h))  # correlation > 0
print(training_loss(1.-probs_1h, labels_1h))  # anticorrelation <0
print(training_loss(labels_1h, labels_1h))  # perfect correlation --> 1
print(training_loss(1-labels_1h, labels_1h))  # should give perfect anticorrelation --> -1
print(training_loss(labels_1h, 1 - labels_1h))

Q2-1.

how do I do something like

conv = nn.Conv1D()
conv = conv + nn.Dense() 

just for simple testing purposes?
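
(Guessing here: maybe something like gluon.nn.HybridSequential is the intended way? A rough, untested sketch of what I mean:)

from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Conv1D(layout='NCW', channels=64, kernel_size=3, activation='relu'))
    net.add(nn.Flatten())
    net.add(nn.Dense(3))
net.initialize()

out = net(nd.random.randn(2, 5, 5))   # same dummy input shape as above
print(out.shape)                      # (2, 3)

Is that the idiomatic way to chain the layers?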

Q2-2.

Am I testing it correctly with the above code?

Q3: Do you have more of your current work to show?

By the way, your GitHub work is cool. Do you have anything else, like LSTMs, RNNs, or reinforcement learning?

Thank you and best regards.

I tested the code and it works fine, but there is one thing that seems fishy. Even though the loss function works well at reducing the loss, when I feed the result into the MCC metric after each iteration, I get zero from the metric whenever I have zero true positives. That seems reasonable too, but can the loss function capture that, or is that not the job of the loss function?

In my experiments there is a very strong correlation between the MCC coefficient as defined here and the MCC metric. So the MCC loss (without the minus sign) should be on par with the MCC metric provided by MXNet, if I understood your comments correctly.
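
(For comparison, with the conventions from earlier in the thread you can recover the coefficient from the logged training loss; a tiny made-up example:)

loss_value = 0.17              # a made-up epoch-average of loss = (1 - mcc) * 0.5
mcc = 1.0 - 2.0 * loss_value   # -> 0.66, directly comparable to the metric
print(mcc)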

Apologies that I haven’t answered some of your previous questions; I’m super busy at the moment.

PS: Here are some results from my tests on the ISPRS competition data (Potsdam) with this loss for semantic segmentation. The first column is the MCC as computed by the MXNet module, the second is the one I described here. Each row corresponds to an epoch, starting from epoch 0.

mcc:0.6698733261232598 WMCCLoss:0.6659016993009683 
mcc:0.658770920171034 WMCCLoss:0.6577712220676017  
mcc:0.7055600441885792 WMCCLoss:0.7047146621978644 
mcc:0.7232530915373807 WMCCLoss:0.7214534603285067 
mcc:0.7393539589852347 WMCCLoss:0.7386469845518921 
mcc:0.740400740329363 WMCCLoss:0.7395767506324884  
mcc:0.7453129102142186 WMCCLoss:0.7444800608085863 
mcc:0.747151123747024 WMCCLoss:0.7465483830733732  
mcc:0.7636966561423574 WMCCLoss:0.7631925814079515 
mcc:0.7576392912136097 WMCCLoss:0.7571004489154527 
mcc:0.7631220932769607 WMCCLoss:0.762692519661152  
mcc:0.7710398817884819 WMCCLoss:0.7705972415931297 
mcc:0.7801638966660944 WMCCLoss:0.7797591722372806 
mcc:0.8052483471937576 WMCCLoss:0.8047520319620768 
mcc:0.823695918201104 WMCCLoss:0.8233272631963094  
mcc:0.8265932944507297 WMCCLoss:0.8262284723195162 
mcc:0.8239171220927226 WMCCLoss:0.8235982475858746 
mcc:0.8368393804998939 WMCCLoss:0.8366058414632623 
mcc:0.835746346298212 WMCCLoss:0.8354963840860309  
mcc:0.8397478092546828 WMCCLoss:0.8394777558066628 
mcc:0.8340098905377393 WMCCLoss:0.8337704864415255 
mcc:0.8382163530360991 WMCCLoss:0.8379891063227798 
mcc:0.8423776550563528 WMCCLoss:0.8421464634664131 
mcc:0.84451460422495 WMCCLoss:0.8443077278859688   
mcc:0.8421568202273718 WMCCLoss:0.8419472828055873 
mcc:0.8431891133941976 WMCCLoss:0.8429776592688127 
mcc:0.8351668650345627 WMCCLoss:0.8349867788228121 
mcc:0.8479772236773374 WMCCLoss:0.8477974515972715 
mcc:0.8452902461337576 WMCCLoss:0.8451430075096361 
mcc:0.8449820791288757 WMCCLoss:0.8448272040396025 
mcc:0.8469488079985643 WMCCLoss:0.8467872684652155 
mcc:0.8469957462846359 WMCCLoss:0.846819388143944  
mcc:0.8466896630751054 WMCCLoss:0.846524594408093  
mcc:0.8497674364454311 WMCCLoss:0.8496271353779417 
mcc:0.8509833368220874 WMCCLoss:0.8508423295888033 
mcc:0.8420099309530975 WMCCLoss:0.8418516426375418 
mcc:0.8384215330651769 WMCCLoss:0.8382761803540316 
mcc:0.8408745956863247 WMCCLoss:0.8407282413858356 
mcc:0.8535120992726162 WMCCLoss:0.8533815199678595 
mcc:0.8530902688555522 WMCCLoss:0.8529689998337717 
mcc:0.8497094546581363 WMCCLoss:0.849591795242194  
mcc:0.8518310621757802 WMCCLoss:0.8517187761537957 
mcc:0.8507880226134985 WMCCLoss:0.8506761518391696 
mcc:0.8541973497232096 WMCCLoss:0.85408313888492   
mcc:0.8548362332084927 WMCCLoss:0.8547218965761589 
mcc:0.8526280147890295 WMCCLoss:0.8525073799219999 
mcc:0.8562761368353743 WMCCLoss:0.8561727621338584 
mcc:0.8568617643975673 WMCCLoss:0.8567521120562698 
mcc:0.8585471350439196 WMCCLoss:0.8584484132853422 
mcc:0.853439387455077 WMCCLoss:0.8533255078575828  
mcc:0.856030384842509 WMCCLoss:0.8559133771694067  
mcc:0.8524307643939948 WMCCLoss:0.8523239240501866 
mcc:0.8600365694794491 WMCCLoss:0.8598839875423547 
mcc:0.8619135706197708 WMCCLoss:0.8618148294362155 
mcc:0.8571862438666221 WMCCLoss:0.8571093244986101 
mcc:0.8619619497838656 WMCCLoss:0.8618720498951998 
mcc:0.8578015946348441 WMCCLoss:0.8576851378787648 
mcc:0.8580781010003965 WMCCLoss:0.8579725243828513 
mcc:0.8616069760593774 WMCCLoss:0.8615188472198717 
mcc:0.8636693441072673 WMCCLoss:0.8635824882622921 
mcc:0.8608415079533258 WMCCLoss:0.8607531171856504 
mcc:0.8664733833016075 WMCCLoss:0.8663897333723126 
mcc:0.8560167854293355 WMCCLoss:0.8559448068792169 
mcc:0.8620966023865694 WMCCLoss:0.8620020115014279