I am working on a VQA (visual question answering) project and have two questions.
First, let me introduce the dataset: every training question has 3 reference answers, so I expand each sample into three pairs fed to the model: (question, ans1), (question, ans2), (question, ans3). Since the softmax prediction picks exactly one answer per question, the accuracy should be at most 0.33.
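To make that expansion concrete, here is a minimal sketch (plain Python; the row layout mirrors my CSV, but the helper name is just illustrative):

```python
# One dataset row: a question plus its 3 reference answers (same layout as my CSV).
row = ("what is the person doing in video",
       "cleaning up", "wiping mirror", "washing cup")

def expand(row):
    """Turn (question, ans1, ans2, ans3) into three (question, answer) pairs."""
    question, *answers = row
    return [(question, ans) for ans in answers]

pairs = expand(row)
# Each question now appears three times, once per reference answer.
```

So at prediction time the model can match at most one of the three answers per question, which is why I expect accuracy to top out around 1/3.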
I use loss = gluon.loss.SoftmaxCrossEntropyLoss() as the training loss and mx.metric.Accuracy() as the evaluation metric, updating it with metric.update([label], [output]), where label is the training answer and output is the softmax vector over all possible answers.
The training loop computes:
cross_entropy = loss(output, label)
cross_entropy.backward()
Here is where it gets really strange: I tested with just 3 samples, and after 10 epochs I got 73% accuracy (even though, as explained above, the accuracy on my dataset can be at most 0.33). To investigate, I ran the model on the training data itself, and it gives really strange answers.
Here is my training data (each line is question,ans1,ans2,ans3):
what is in front of the chair,mirror,pool,shelf,
what is the color of the person's clothes in video,blue,dark blue,black blue,
what is the person doing in video,cleaning up,wiping mirror,washing cup,
where is the person in video,indoor,washroom,residence,
is the person sitting or standing in the video,standing,standing,standing
And here are my prediction results (each training question has 3 answers, but I predict only the one with the maximum softmax value):
what is in front of the chair,shelf,
what is the color of the person's clothes in video,cleaning up,
what is the person doing in video,washroom,
where is the person in video,kissing,
is the person sitting or standing in the video,light white
I use np.argmax to get the answer from the softmax layer. When I print the softmax results, the first few lines are:
answer is shelf with softmax [15.491705] <NDArray 1 @cpu(0)>
answer is cleaning up with softmax [8.109538] <NDArray 1 @cpu(0)>
answer is washroom with softmax [8.194625] <NDArray 1 @cpu(0)>
answer is kissing with softmax [7.8190136] <NDArray 1 @cpu(0)>
answer is light white with softmax [6.411439] <NDArray 1 @cpu(0)>
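To sanity-check my expectation of what a softmax output should look like, I computed softmax by hand in numpy on some made-up scores (these numbers are invented, not from my model):

```python
import numpy as np

# Made-up raw scores for 4 candidate answers (not from my actual model).
scores = np.array([15.49, 8.11, 8.19, 7.82])

def softmax(x):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(x - x.max())
    return e / e.sum()

probs = softmax(scores)
assert np.isclose(probs.sum(), 1.0)           # a real softmax vector sums to 1
assert 0.0 <= probs.min() and probs.max() <= 1.0  # every entry lies in [0, 1]
assert np.argmax(probs) == np.argmax(scores)  # argmax is unchanged by softmax
```

So my expectation is that every softmax entry lies in [0, 1] and the vector sums to 1, which is exactly what the printed values above violate.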
So, my two questions:
1) The accuracy obviously cannot be as high as 73%, so how does metric.update() actually evaluate accuracy?
2) How can a softmax value be over 1, or negative? Isn't softmax normalized? The official Accuracy documentation (https://mxnet.apache.org/api/python/metric/metric.html) says: ''Prediction values for samples. Each prediction value can either be the class index, or a vector of likelihoods for all classes.'' It then just considers the class with the maximum likelihood. How can that work if the likelihoods are above 1???
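My reading of that doc is that Accuracy collapses the likelihood vector with an argmax and compares the resulting index against the label; here is a numpy sketch of what I assume it does (my guess, not MXNet's actual source):

```python
import numpy as np

def accuracy(labels, outputs):
    """What I *think* mx.metric.Accuracy does with a vector of likelihoods:
    take the argmax per sample and compare it with the label index."""
    preds = np.argmax(outputs, axis=1)
    return np.mean(preds == np.asarray(labels))

# Two samples, 3 candidate classes each; labels are class indices.
outputs = np.array([[0.1, 2.5, -0.3],   # argmax -> 1
                    [4.0, 0.2,  0.9]])  # argmax -> 0
print(accuracy([1, 2], outputs))  # first correct, second wrong -> 0.5
```

If that is right, the metric would only care about the argmax index, not whether the vector is actually normalized, but I am not sure that explains the 73% I am seeing.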
I know it is a lot to wade through, so if anyone could explain question 2 (the softmax one) first, maybe I can debug the code from there. Thank you!