MXNet with cuDNN 7.1 is a little slower than with cuDNN 5.1

After updating my cuDNN version, I found that MXNet with cuDNN 7.1 is a little slower than with cuDNN 5.1 when predicting.
Has anyone else seen the same thing? If so, why does it happen?

P.S. My GPUs are a P40 and a GTX 1080 Ti.

Hi @sanyuan,

Would you be able to create a reproducible script for this to help with testing and diagnosis?
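As a starting point for such a script, here is a minimal timing harness. It is only a sketch: `benchmark` and `forward_fn` are hypothetical names, and `forward_fn` is assumed to be a zero-argument callable that runs one prediction and blocks until the result is ready (MXNet executes asynchronously, so the callable should end with something like `.asnumpy()` on the output, or `mx.nd.waitall()`).

```python
import statistics
import time


def benchmark(forward_fn, warmup=10, iters=100):
    """Time a zero-argument callable and return summary statistics in ms.

    forward_fn is a placeholder for one synchronous prediction call.
    """
    # Warm-up runs are discarded so one-off setup costs do not skew the numbers.
    for _ in range(warmup):
        forward_fn()

    timings_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        forward_fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "iters": iters,
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a function that runs your model once
    # and waits for the output.
    stats = benchmark(lambda: sum(range(10000)), warmup=5, iters=50)
    print(stats)
```

Running the same script, with the same model and batch size, once against each cuDNN build and posting both outputs would make the difference much easier to reason about. The warm-up matters here because the first few calls typically include one-off costs that are not representative of steady-state prediction speed.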

Are there any specific operators that account for the slowdown? You could try using the MXNet profiling tools to isolate the difference. Check out this tutorial for more information on how this can be done.
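For reference, a rough sketch of wrapping a few prediction calls in the profiler might look like the following. It assumes a reasonably recent MXNet release in which `mx.profiler` provides `set_config`, `set_state` and `dump`; older releases named these functions differently, so please check against your installed version. `profile_forward` and `forward_fn` are hypothetical names, and `forward_fn` is again assumed to block until the output is ready.

```python
def profile_forward(forward_fn, iters=10, filename="profile.json", profiler=None):
    """Run forward_fn under the MXNet profiler and write a trace file.

    profiler defaults to mx.profiler; it is a parameter only so the helper
    can be exercised without a GPU build of MXNet.
    """
    if profiler is None:
        # Imported lazily so this helper can be defined without MXNet installed.
        import mxnet as mx
        profiler = mx.profiler

    # Record all event types and choose where the trace is written.
    profiler.set_config(profile_all=True, filename=filename)

    profiler.set_state("run")
    try:
        for _ in range(iters):
            forward_fn()  # assumed to wait for the result before returning
    finally:
        profiler.set_state("stop")

    # Write the collected events to the configured file.
    profiler.dump()
    return filename
```

Collecting one trace per cuDNN version and comparing the per-operator times (convolution is a likely place to look first, though that is a guess) should show whether the slowdown comes from one operator or is spread across the network. The resulting JSON file can be opened in Chrome's tracing viewer.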