I know there have previously been some compile flags that were required to get FP16 acceleration on Pascal-generation chips. Is this still the case with Volta? I’ve recently been testing inference on an FP16 model and I’m not seeing any speedup relative to the same model with FP32 params.
I’ve set USE_CUDA / USE_CUDNN = 1. I haven’t modified the gpu archs / sm flags in the Makefile. I’m building from the tip of master, commit 9f97dac76e43b2ca0acb09a4ff96d416e9edea60.
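For reference, here is a rough sketch of what explicitly targeting Volta would look like, in case the default arch list turns out to matter. The `-gencode` syntax is standard nvcc, and Volta is compute capability 7.0. The helper name `gencode_flags` is made up for illustration, and I'm assuming the Makefile forwards a `CUDA_ARCH` variable to nvcc, which I haven't verified at this commit.

```python
# Compute capabilities: Pascal is 6.0 / 6.1, Volta (V100) is 7.0.
VOLTA_ARCHS = [70]


def gencode_flags(archs):
    """Build nvcc -gencode flags, one per compute capability (70 -> sm_70).

    Hypothetical helper for illustration; not part of any build system.
    """
    return " ".join(
        "-gencode arch=compute_{0},code=sm_{0}".format(arch) for arch in archs
    )


if __name__ == "__main__":
    # Assumed usage (CUDA_ARCH being honored by the Makefile is an assumption):
    #   make USE_CUDA=1 USE_CUDNN=1 CUDA_ARCH="<flags printed below>"
    print(gencode_flags(VOLTA_ARCHS))
```

If the build already emits sm_70 code by default, setting this by hand shouldn't change anything, and the missing speedup would have to come from somewhere else.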
In my case I’m training NMT models, and I believe the dtype flag applies to some of the computer vision samples. This brings up a good point, though: if we expose that flag in sockeye, we’ll try to keep it consistent with the compvis samples.