Accelerating FP16 Inference on Volta

I know there have previously been some compile flags required to get FP16 acceleration on the Pascal-generation chips. Is this still the case with Volta? I’ve recently been testing inference with an FP16 model and I’m not seeing any speedup relative to the same model with FP32 parameters.

I’ve set USE_CUDA / USE_CUDNN = 1. I haven’t modified the gpu archs / sm flags in the Makefile. I’m building from the tip of master, commit 9f97dac76e43b2ca0acb09a4ff96d416e9edea60.
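As a point of comparison, something like the following sketch (arbitrary shapes and repeat count, plain NDArray ops rather than the actual model) should show whether raw FP16 GEMM is faster than FP32 at all on a given build:

```python
import time
import mxnet as mx

ctx = mx.gpu(0)

for dtype in ('float32', 'float16'):
    # Large square GEMM; big enough that compute, not launch overhead, dominates.
    a = mx.nd.ones((4096, 4096), ctx=ctx, dtype=dtype)
    b = mx.nd.ones((4096, 4096), ctx=ctx, dtype=dtype)
    mx.nd.dot(a, b).wait_to_read()   # warm-up
    start = time.time()
    for _ in range(20):
        c = mx.nd.dot(a, b)
    mx.nd.waitall()                  # block until all queued GPU work is done
    print(dtype, time.time() - start)
```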


Check out http://docs.nvidia.com/deeplearning/sdk/pdf/Training-Mixed-Precision-User-Guide.pdf


No special flags are needed when compiling MXNet. However, there is a flag that is needed during training: --dtype float16
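Roughly speaking, that flag casts the network parameters and the input data to float16. A minimal sketch of the same idea using the Gluon API (the Dense layer, shapes, and context below are only placeholders for a real model):

```python
import mxnet as mx
from mxnet import gluon

ctx = mx.gpu(0)

# Placeholder network standing in for a real model.
net = gluon.nn.Dense(10)
net.initialize(ctx=ctx)
net.cast('float16')          # cast the parameters to FP16

# The input must be cast to FP16 as well so the dtypes match.
x = mx.nd.zeros((32, 512), ctx=ctx, dtype='float16')
y = net(x)
print(y.dtype)               # numpy.float16
```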

It would be helpful if you could give more information about:

  • how the model is trained
  • how the model is used

Specifically, see Section 5.3.1, Running FP16 Training on MXNet, for the compile-time flags and for how to verify that MXNet is training in FP16.

See also:
[1] http://on-demand.gputechconf.com/gtc/2017/presentation/s7218-training-with-mixed-precision-boris-ginsburg.pdf
[2] https://github.com/apache/incubator-mxnet/issues/7996

Exactly what I’m after, many thanks.

In my case I’m training NMT models, and I believe the --dtype flag applies to some of the computer vision examples. This brings up a good point, though: if we expose that flag in Sockeye, we’ll try to keep it consistent with the computer vision examples.

My build steps are documented here: https://github.com/awslabs/sockeye/tree/master/tutorials/wmt
but really I’m just after any compile flags that are required. Those compile flags are covered in the docs linked by dom.

But how do you run FP16 inference on a batch of data with the C++ API?
