I haven’t been able to find thorough documentation on exactly how the performance tests are run (controlled by MXNET_CUDNN_AUTOTUNE_DEFAULT=1).
If I pass 3 different shapes through a network, will the tests be run three times? And when running inference on thousands of inputs, will they run only once per input shape?
Here is the relevant part of the code:
src/operator/nn/cudnn/cudnn_algoreg-inl.h:89
    ParamKey key{param, in_shape[0], in_shape[1], out_shape[0], cudnn_data_type,
                 cudnn_forward_compute_type, cudnn_backward_compute_type, sm_arch, add_to_weight};
    auto i = reg_.find(key);
    if (i != reg_.end()) {
      *fwd = i->second.fwd;
      *bwd = i->second.bwd;
      *flt = i->second.flt;
    } else {
      ... (find best algo)
    }
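As you can see, the algorithm search is run once per unique key, and the result is looked up in reg_ on later calls. Since the key includes the input and output shapes (along with the operator parameters, data types and GPU architecture), three different input shapes should trigger three searches for a given convolution, and any further inputs with an already-seen shape should reuse the stored algorithms.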
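To make the caching behaviour concrete, here is a minimal sketch of the same lookup-or-search pattern. It is not the MXNet implementation: `AlgoCache`, `Algo`, `Search` and `CountSearches` are hypothetical names, the key is reduced to just the input shape, and the search is a placeholder that returns fixed values instead of timing cuDNN algorithms.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for ParamKey: only the input shape is used as the key.
using Shape = std::vector<int>;

// Hypothetical stand-in for the selected forward/backward/filter algorithms.
struct Algo {
  int fwd;
  int bwd;
  int flt;
};

class AlgoCache {
 public:
  // Returns the stored algorithms for `shape`; runs the search only on a miss.
  Algo Find(const Shape& shape) {
    auto it = reg_.find(shape);
    if (it != reg_.end()) {
      return it->second;  // cache hit: no search
    }
    ++searches_;  // cache miss: this is where the expensive search would run
    Algo best = Search(shape);
    reg_.emplace(shape, best);
    return best;
  }

  int searches() const { return searches_; }

 private:
  // Placeholder for the real search (assumption: the real code benchmarks cuDNN here).
  static Algo Search(const Shape&) { return Algo{0, 0, 0}; }

  std::map<Shape, Algo> reg_;
  int searches_ = 0;
};

// Feeds 3000 inputs that cycle over three distinct shapes and
// returns how many times the search was actually run.
int CountSearches() {
  AlgoCache cache;
  const std::vector<Shape> shapes = {
      {1, 3, 224, 224}, {1, 3, 256, 256}, {1, 3, 320, 320}};
  for (std::size_t i = 0; i < 3000; ++i) {
    cache.Find(shapes[i % shapes.size()]);
  }
  return cache.searches();
}
```

In this sketch `CountSearches()` counts one search per distinct shape, no matter how many inputs are fed through. In the real code the key has more fields, so the count would be per unique combination of those fields rather than per shape alone.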