We’ve received a couple of NVIDIA Titan V (Volta) cards and have been experimenting with and without half precision. With half precision enabled (dtype = float16) we’re seeing only a marginal performance improvement over single precision. We also tried a Titan X (Pascal), although we didn’t expect half precision to work on the Pascal architecture.
This was tested with release 1.0.0
Running on a machine with CUDA 9.0 + CUDNN 7.0.5
To reproduce: run one epoch of the ResNet CIFAR10 script with and without dtype = float16.
I got a similar result, and found an explanation on the topic:
> To enable it, you need to set the datatype parameter to CUDNN_DATA_HALF when calling cudnnSetConvolutionNdDescriptor or cudnnSetConvolution2dDescriptor_v5. Of course, the input tensor and output tensor also need to be of datatype CUDNN_DATA_HALF. If you call cudnnSetConvolutionNdDescriptor with datatype CUDNN_DATA_FLOAT but the tensors are of type CUDNN_DATA_HALF, then the inputs are converted from FP16 to FP32, the math is done in FP32, and the output is converted back to FP16.
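If it helps anyone, here is a minimal standalone timing sketch of that knob. This is my own illustration, not code from the framework: it assumes cuDNN 7 on CUDA 9 and a Volta-class card, the layer shape and iteration count are arbitrary, and it compares FP32 data with FP32 math, FP16 data with FP32 math (the silent-conversion case described in the quote), and FP16 data with FP16 math.

```cpp
// Hypothetical single-layer benchmark (assumes cuDNN 7 / CUDA 9). Build e.g.:
//   nvcc conv_bench.cpp -lcudnn -o conv_bench
#include <cudnn.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK_CUDNN(call) do {                                        \
    cudnnStatus_t s_ = (call);                                        \
    if (s_ != CUDNN_STATUS_SUCCESS) {                                 \
        fprintf(stderr, "cuDNN error: %s (line %d)\n",                \
                cudnnGetErrorString(s_), __LINE__);                   \
        exit(1);                                                      \
    } } while (0)

// Time one 3x3 convolution forward with the given tensor and compute types.
static float BenchConv(cudnnHandle_t h,
                       cudnnDataType_t dataType, cudnnDataType_t computeType) {
    const int n = 64, c = 64, hw = 32, k = 64, r = 3;  // arbitrary ResNet-ish layer

    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
    CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
    CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));

    // Tensors carry `dataType`; the convolution descriptor's computeType is
    // the knob from the quote above: it decides whether the math itself runs
    // in FP16 or FP32.
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, dataType, n, c, hw, hw));
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, dataType, n, k, hw, hw));
    CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, dataType, CUDNN_TENSOR_NCHW, k, c, r, r));
    CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                                CUDNN_CROSS_CORRELATION, computeType));
    // Opt in to Tensor Core kernels where available (Volta, cuDNN 7+).
    CHECK_CUDNN(cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH));

    cudnnConvolutionFwdAlgo_t algo;
    CHECK_CUDNN(cudnnGetConvolutionForwardAlgorithm(
        h, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_PREFER_FASTEST, 0, &algo));
    size_t wsSize = 0;
    CHECK_CUDNN(cudnnGetConvolutionForwardWorkspaceSize(
        h, xDesc, wDesc, convDesc, yDesc, algo, &wsSize));

    const size_t elem = (dataType == CUDNN_DATA_HALF) ? 2 : 4;
    void *x, *w, *y, *ws = nullptr;
    cudaMalloc(&x, (size_t)n * c * hw * hw * elem);
    cudaMalloc(&w, (size_t)k * c * r * r * elem);
    cudaMalloc(&y, (size_t)n * k * hw * hw * elem);
    if (wsSize) cudaMalloc(&ws, wsSize);
    cudaMemset(x, 0, (size_t)n * c * hw * hw * elem);
    cudaMemset(w, 0, (size_t)k * c * r * r * elem);

    // Scaling factors stay float even when the tensor data is half.
    const float alpha = 1.f, beta = 0.f;
    const int iters = 100;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    CHECK_CUDNN(cudnnConvolutionForward(h, &alpha, xDesc, x, wDesc, w, convDesc,
                                        algo, ws, wsSize, &beta, yDesc, y));  // warm-up
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        CHECK_CUDNN(cudnnConvolutionForward(h, &alpha, xDesc, x, wDesc, w, convDesc,
                                            algo, ws, wsSize, &beta, yDesc, y));
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaFree(x); cudaFree(w); cudaFree(y); if (ws) cudaFree(ws);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyConvolutionDescriptor(convDesc);
    return ms / iters;
}

int main() {
    cudnnHandle_t h;
    CHECK_CUDNN(cudnnCreate(&h));
    printf("FP32 data, FP32 math: %.3f ms\n", BenchConv(h, CUDNN_DATA_FLOAT, CUDNN_DATA_FLOAT));
    printf("FP16 data, FP32 math: %.3f ms\n", BenchConv(h, CUDNN_DATA_HALF,  CUDNN_DATA_FLOAT));
    printf("FP16 data, FP16 math: %.3f ms\n", BenchConv(h, CUDNN_DATA_HALF,  CUDNN_DATA_HALF));
    cudnnDestroy(h);
    return 0;
}
```

If the explanation is right, only the last configuration (half tensors and a half computeType on the convolution descriptor) should pull clearly ahead on a Titan V; with the descriptor left at CUDNN_DATA_FLOAT you'd expect roughly the marginal difference reported above.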
Are you suggesting just doing a straight assignment of dtype to compute_type, without the check? Sounds logical. Good catch. I can give it a try and will report back.
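For anyone following along, here's roughly what I take the suggestion to mean, as a hypothetical sketch. The function name and the force_fp32_math flag are illustrative, not the actual source; only dtype and compute_type come from the comment above.

```cpp
#include <cudnn.h>

// Hypothetical sketch of the selection logic under discussion -- not the
// actual framework source. Given the tensor dtype, pick the cuDNN
// computeType for the convolution descriptor.
static cudnnDataType_t ChooseComputeType(cudnnDataType_t dtype, bool force_fp32_math) {
    // Old behavior (the "check"): route half data through FP32 math, which
    // triggers the fp16 -> fp32 -> fp16 conversion path described above.
    if (force_fp32_math && dtype == CUDNN_DATA_HALF)
        return CUDNN_DATA_FLOAT;
    // Suggested behavior: straight assignment, dtype -> compute_type, so
    // FP16 tensors also get FP16 math on hardware that supports it (Volta).
    return dtype;
}
```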