Training speed is weird!

When I train a face recognition model with ResNet100 as the base network on an NVIDIA P40, training can reach 400 samples per second. When I train with MobileFaceNet instead, it only reaches 100 samples per second. That is weird, because MobileFaceNet's network structure is much lighter than ResNet100's. Does anyone have any ideas? The code and training environment are the same in both cases.

Bear in mind that a network's theoretical FLOP count does not fully reflect its real throughput.
ResNet is one of the most heavily optimized networks, but I can't comment on MobileFaceNet.
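
Since FLOP counts can mislead, it may help to time the training step itself for each model. Here is a minimal, framework-agnostic sketch in Python. The names `measure_throughput`, `step_fn`, and `sync_fn` are hypothetical and not part of any library; `step_fn` is assumed to run one forward/backward pass on a fixed batch.

```python
import time


def measure_throughput(step_fn, batch_size, warmup=5, iters=20, sync_fn=None):
    """Return measured samples per second for one training step.

    step_fn  -- hypothetical callable that runs a single training step
    sync_fn  -- optional callable that blocks until queued GPU work is done
                (assumption: GPU frameworks run asynchronously, so timing
                without a sync can under-report the real step time)
    """
    # Warm-up iterations are excluded so one-time setup cost is not measured.
    for _ in range(warmup):
        step_fn()
    if sync_fn is not None:
        sync_fn()

    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    if sync_fn is not None:
        sync_fn()
    elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed
```

Running this once per model with the same batch size and input shape gives a like-for-like number. If the gap stays at roughly 4x with data loading taken out of the loop, the slowdown is probably in how the network's layers execute on the GPU rather than in the input pipeline.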