Training speed in MXNet is nearly 2.5x slower than PyTorch

Hi, thank you very much for your response.
Honestly, I don't think "mixup" could have such a huge impact as to slow training down nearly 3x! But I may well be wrong.
My comparison of MXNet vs. PyTorch performance was based solely on training on ImageNet with the same procedure, except for mixup (I actually just noticed that!). And yes, with the same batch size, data augmentation, etc.
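For reference, here is a minimal sketch of the standard mixup augmentation (NumPy, following the usual formulation: blend each sample and its label with a shuffled partner using a Beta-distributed weight). The function name and defaults are my own illustration, not taken from either codebase:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=np.random):
    """Standard mixup: convexly combine each (sample, label) pair
    with a randomly chosen partner from the same batch."""
    lam = rng.beta(alpha, alpha)          # one mixing weight per batch
    idx = rng.permutation(len(x))         # shuffled partner indices
    x_mix = lam * x + (1 - lam) * x[idx]  # blended inputs
    y_mix = lam * y + (1 - lam) * y[idx]  # blended (one-hot) labels
    return x_mix, y_mix
```

Per batch this is one Beta draw and two elementwise blends, which is cheap relative to a forward/backward pass, so on its own it seems unlikely to explain a multi-x slowdown.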
As you can see, there is no considerable load on either my CPU or GPU; although the GPU reports 100% utilization, the GPU temperature and fan speed tell a different story! I'm genuinely puzzled here!
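One thing worth ruling out when timing the two frameworks: both execute GPU ops asynchronously, so wall-clock timings taken without a synchronization point (`mx.nd.waitall()` in MXNet, `torch.cuda.synchronize()` in PyTorch) can be misleading, as can the utilization number alone. A minimal framework-agnostic throughput harness, where `step` stands in for one real training iteration and `sync` is the framework's synchronization call (both are placeholders here, not code from either project):

```python
import time

def measure_throughput(step, batch_size, warmup=3, iters=20, sync=None):
    """Return training throughput in samples/sec.

    step: callable running one training iteration (placeholder here).
    sync: optional framework sync hook, e.g. mx.nd.waitall or
          torch.cuda.synchronize, so timing covers finished work.
    """
    for _ in range(warmup):       # warm-up: JIT/cudnn autotune, caches
        step()
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync:
        sync()                    # wait for queued GPU work to finish
    elapsed = time.perf_counter() - start
    return iters * batch_size / elapsed

# Stand-in step: a real benchmark would run forward/backward here.
def dummy_step():
    sum(i * i for i in range(10000))
```

Running the same harness with identical batch size against both training loops should make the comparison apples-to-apples; if throughput is low while utilization reads 100%, the bottleneck is often the input pipeline rather than compute.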
The simpnet definition is given here.