Hi Hamilton, I modified the definition for you for 1D problems, BUT in 1hot representation. The class below returns the MCC itself, which is a score to maximize, so to get a loss function for training you need to subtract its result from 1 (edit: basically you only need to change the sign, even using -MCCLoss should work).
from mxnet.gluon.loss import Loss
from mxnet import nd

class MCCLoss(Loss):
    def __init__(self, smooth = 1.0e-5, sum_axis=[1,2], batch_axis=0, weight=None, **kwargs):
        super().__init__(batch_axis, weight, **kwargs)
        self.smooth = smooth # I am not actually using it anywhere at the moment
        self.sum_axis=sum_axis

    def inner_prod(self, F, prob, label):
        prod = F.broadcast_mul(prob,label)
        prod = F.sum(prod,axis=self.sum_axis)
        return prod

    def tp(self,F,prob,label):
        return self.inner_prod(F,prob,label)

    def tn(self,F,prob,label):
        return self.inner_prod(F, 1.-prob,1.-label)

    def fp(self,F,prob,label):
        return self.inner_prod(F, prob,1.-label)

    def fn(self,F,prob,label):
        return self.inner_prod(F, 1.-prob, label)

    def hybrid_forward(self, F, prob, label):
        wtp = self.tp(F, prob, label)
        wtn = self.tn(F, prob, label)
        wfp = self.fp(F, prob, label)
        wfn = self.fn(F, prob, label)
        num = F.broadcast_mul(wtp,wtn) - F.broadcast_mul(wfp,wfn)
        denum = wtp + wfp
        denum = F.broadcast_mul(denum, wtp + wfn)
        denum = F.broadcast_mul(denum, wtn + wfp)
        denum = F.broadcast_mul(denum, wtn + wfn)
        denum = F.sqrt(denum)
        return num/denum
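For anyone who wants to check the arithmetic without MXNet, here is a minimal NumPy sketch of the same computation. The function name `soft_mcc` and the `eps` argument are assumptions for illustration and are not part of the class above; `eps` only shows one possible place where a smoothing constant could go (added to the denominator), and it defaults to 0 so the result matches the formula as written.

```python
import numpy as np

def soft_mcc(prob, label, sum_axis=(1, 2), eps=0.0):
    """Fuzzy Matthews correlation coefficient, one value per batch element."""
    tp = np.sum(prob * label, axis=sum_axis)                  # fuzzy true positives
    tn = np.sum((1.0 - prob) * (1.0 - label), axis=sum_axis)  # fuzzy true negatives
    fp = np.sum(prob * (1.0 - label), axis=sum_axis)          # fuzzy false positives
    fn = np.sum((1.0 - prob) * label, axis=sum_axis)          # fuzzy false negatives
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / (den + eps)  # eps is a hypothetical guard, 0 by default

# Integer labels for 2 batch elements with 5 objects each, 3 classes
labels = np.array([[2, 2, 1, 2, 1], [2, 1, 2, 2, 1]])
labels_1h = np.eye(3)[labels]                    # shape (2, 5, 3)

perfect = soft_mcc(labels_1h, labels_1h)         # prediction equals the labels
inverted = soft_mcc(1.0 - labels_1h, labels_1h)  # prediction is the exact opposite
print(perfect, inverted)
```

With hard 0/1 inputs the fuzzy counts reduce to ordinary counts, so the two calls land on the extremes of the MCC range.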
Assume you have a problem with 3 classes. We will create a batch of 2 datums with 5 objects each (edit: I originally wrote 3, sorry), that is 2*5 = 10 objects in total, split in a batch of 2. Definitions and sanity checks:
In [60]: NBatch=2
...: NObjects_per_batch=5
...: NClasses = 3
...: probs_1h = nd.uniform(shape=[NBatch,NObjects_per_batch,NClasses])
...: probs_1h = nd.softmax(probs_1h,axis=-1) # make the random values probabilities
...: labels = nd.argmax(probs_1h,-1)
...: labels_1h = nd.eye(NClasses)[labels]
...: myloss = MCCLoss() # the instance used in the checks below
In [61]: print(myloss(probs_1h,labels_1h)) # correlation > 0
...: print(myloss(1.-probs_1h, labels_1h)) # anticorrelation <0
...: print(myloss(labels_1h,labels_1h)) # perfect correlation --> 1
...: print(myloss(1-labels_1h,labels_1h)) # should give perfect anticorrelation --> -1
...: print(myloss(labels_1h,1- labels_1h)) # Symmetric same as above
result:
[0.10435484 0.09912276]
<NDArray 2 @cpu(0)>
[-0.10435482 -0.09912276]
<NDArray 2 @cpu(0)>
[1. 1.]
<NDArray 2 @cpu(0)>
[-1. -1.]
<NDArray 2 @cpu(0)>
[-1. -1.]
<NDArray 2 @cpu(0)>
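The second result is the negative of the first (up to float rounding) for a reason: replacing prob with 1 - prob swaps tp with fn and tn with fp, which flips the sign of the numerator and leaves the denominator unchanged. A small NumPy sketch of that check follows; the helper `soft_mcc` and the random inputs are illustrative assumptions, not the values printed above.

```python
import numpy as np

def soft_mcc(prob, label, sum_axis=(1, 2)):
    # Same formula as the MXNet class, written with NumPy for a quick check
    tp = np.sum(prob * label, axis=sum_axis)
    tn = np.sum((1.0 - prob) * (1.0 - label), axis=sum_axis)
    fp = np.sum(prob * (1.0 - label), axis=sum_axis)
    fn = np.sum((1.0 - prob) * label, axis=sum_axis)
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den

rng = np.random.default_rng(0)
scores = rng.uniform(size=(2, 5, 3))
# Softmax over the class axis, so every row is a probability vector
probs = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
# Labels taken as the argmax of the probabilities, as in the session above
labels_1h = np.eye(3)[probs.argmax(axis=-1)]

forward = soft_mcc(probs, labels_1h)
flipped = soft_mcc(1.0 - probs, labels_1h)
print(forward, flipped)
```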
So if you look at the labels you will see something like:
In [66]: print (labels)
...: print (labels_1h)
[[2. 2. 1. 2. 1.]
[2. 1. 2. 2. 1.]]
<NDArray 2x5 @cpu(0)>
[[[0. 0. 1.]
[0. 0. 1.]
[0. 1. 0.]
[0. 0. 1.]
[0. 1. 0.]]
[[0. 0. 1.]
[0. 1. 0.]
[0. 0. 1.]
[0. 0. 1.]
[0. 1. 0.]]]
<NDArray 2x5x3 @cpu(0)>
and the corresponding probabilities:
In [80]: probs_1h
Out[80]:
[[[0.25142097 0.35158828 0.3969908 ]
[0.21707469 0.36763847 0.4152869 ]
[0.19482112 0.41429797 0.39088094]
[0.2548438 0.36045036 0.38470584]
[0.29028696 0.40323463 0.30647835]]
[[0.31078342 0.30440956 0.384807 ]
[0.29724038 0.37873575 0.32402387]
[0.33249545 0.33164975 0.3358548 ]
[0.23995323 0.24789381 0.512153 ]
[0.32526138 0.38552523 0.28921342]]]
<NDArray 2x5x3 @cpu(0)>
Note that the probabilities sum to 1 in the last axis:
In [81]: nd.sum(probs_1h,-1)
Out[81]:
[[1. 1. 1. 1. 0.99999994]
[1. 1. 1. 1. 1. ]]
<NDArray 2x5 @cpu(0)>
If you want to train a network you need to define:
tloss = MCCLoss()

def training_loss(probs, labels):
    #return 1. - tloss(probs,labels)
    return - tloss(probs,labels) # **edit**
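The two variants, 1 - MCC and -MCC, differ only by the constant 1, so they have the same gradients and the same minimizer; only the reported range changes ([0, 2] versus [-1, 1]). A tiny sketch with made-up MCC values (assumed for illustration, not outputs of the class above):

```python
import numpy as np

# Hypothetical MCC values for four batch elements, from perfect to fully inverted
mcc = np.array([1.0, 0.1, 0.0, -1.0])

loss_neg = -mcc              # ranges over [-1, 1], best value is -1
loss_one_minus = 1.0 - mcc   # ranges over [0, 2], best value is 0

# The offset between the two losses is the same constant for every element
offset = loss_one_minus - loss_neg
print(loss_neg, loss_one_minus, offset)
```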
Some comments on what you posted:
probs takes values above 1, therefore they cannot be probabilities; I assume you mixed up probs with labels. Also, in a problem where the output has more than one class, your network should predict a vector of values, with each entry representing the probability of that particular class.
In my definitions above, tp is the fuzzy definition of true positive, similarly tn the fuzzy definition of true negative and so on. Therefore it will ALWAYS be tp, tn, fp, fn >= 0, they cannot be negative, they are (fuzzy generalizations of) counts.
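A quick way to see the "fuzzy counts" property: each of the four quantities is a sum of products of numbers in [0, 1], so none can be negative, and together they add up to the total number of entries (objects times classes). The sketch below uses NumPy and made-up probabilities; all names and values are assumptions for illustration.

```python
import numpy as np

# One batch element, two objects, three classes (made-up values)
prob = np.array([[[0.05, 0.05, 0.9],
                  [0.2, 0.7, 0.1]]])
label = np.eye(3)[np.array([[2, 1]])]   # true classes 2 and 1, in 1hot form

axes = (1, 2)
tp = np.sum(prob * label, axis=axes)
tn = np.sum((1.0 - prob) * (1.0 - label), axis=axes)
fp = np.sum(prob * (1.0 - label), axis=axes)
fn = np.sum((1.0 - prob) * label, axis=axes)

# tp + fn counts the 1 entries of label, tn + fp counts the 0 entries
total = tp + tn + fp + fn
print(tp, tn, fp, fn, total)
```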
So, in 1hot representation, the class 2 (in a set of 3 classes {0,1,2}) is represented by a vector [0,0,1]. The class 0 is [1,0,0], the class 1 is [0,1,0] and so on.
A sample probability for class 2 in 1hot is [0.05,0.05,0.9]
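As a small illustration of the encoding, indexing an identity matrix with integer class ids gives the 1hot rows, and argmax over the last axis recovers the ids. This sketch uses NumPy so it runs without MXNet; the variable names are illustrative. MXNet also provides nd.one_hot(indices, depth) for the same purpose.

```python
import numpy as np

NClasses = 3
label = np.array([2, 0, 1])           # integer class ids, shape (NBatch,)

# Row i of the identity matrix is the 1hot vector for class i
label_1h = np.eye(NClasses)[label]    # shape (NBatch, NClasses)

# argmax over the class axis inverts the encoding
recovered = label_1h.argmax(axis=-1)
print(label_1h)
print(recovered)
```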
So if you have a network the LAST layer of the network should be gluon.nn.Dense(NClasses)
and the activation softmax:
class SomeNet(HybridBlock):
    def __init__(self, ...):
        super().__init__() # must run before name_scope() can be used
        with self.name_scope():
            self.last_layer = gluon.nn.Dense(NClasses)

    def hybrid_forward(self, F, input):
        x = some_layers(input)
        x = self.last_layer(x)
        x = F.softmax(x,-1) # translate to probabilities
        return x
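To make the "translate to probabilities" step concrete, here is a minimal NumPy sketch of a softmax over the last axis. The function below is an illustrative stand-in written for this example, not MXNet's implementation, and the scores are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum first; this avoids overflow in exp
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

# Raw outputs of a hypothetical last Dense layer: 2 samples, 3 classes
scores = np.array([[2.0, -1.0, 0.5],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)
print(probs)
print(probs.sum(axis=-1))
```

Every row is non-negative and sums to 1, and equal scores give equal probabilities, which is what the loss above expects as its prob input.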
And if your labels come as integer class ids with shape (NBatch), e.g. label = [2,0,1], you need to translate them to 1hot before you pass them into the loss function: label_1h = nd.eye(NClasses)[label]
...
with autograd.record():
    probs = net(input)
    loss = training_loss(probs, labels_1h)
loss.backward()
...
Hope this helps
Regards,
Foivos