What triggers the performance tests for the best convolution algorithm?

I haven’t been able to find thorough documentation on exactly how the performance tests are run (controlled by MXNET_CUDNN_AUTOTUNE_DEFAULT=1).

If I pass 3 different input shapes through a network, will the performance tests run three times? And when running inference on thousands of inputs, will they run only once per input shape?

Here is the relevant part of the code:

    ParamKey key{param, in_shape[0], in_shape[1], out_shape[0], cudnn_data_type,
                 cudnn_forward_compute_type, cudnn_backward_compute_type, sm_arch, add_to_weight};
    auto i = reg_.find(key);
    if (i != reg_.end()) {
      *fwd = i->second.fwd;
      *bwd = i->second.bwd;
      *flt = i->second.flt;
    } else {
      // ... (find best algo)
    }

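As you can see, the search for the best algorithm is run once per unique key. The key includes the input, weight, and output shapes, so three different input shapes trigger three searches for a given convolution, and any later call with a shape that has already been seen reuses the cached result instead of running the tests again.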
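To make the caching behaviour concrete, here is a minimal standalone sketch of the same lookup-or-search pattern. It is not MXNet code: `AlgoCache`, `FindOrSearch`, `Search`, and `CountSearches` are hypothetical names, the key is reduced to the three shapes only, and the search itself is a placeholder that just returns zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for the real MXNet types; names are illustrative only.
using Shape = std::vector<int>;
using Key = std::tuple<Shape, Shape, Shape>;  // data, weight, and output shapes

struct Algo { int fwd; int bwd; int flt; };

class AlgoCache {
 public:
  // Returns the cached entry for this key; runs the search only on a cache miss.
  Algo FindOrSearch(const Shape& data, const Shape& weight, const Shape& out) {
    Key key{data, weight, out};
    auto it = reg_.find(key);
    if (it != reg_.end()) return it->second;  // hit: no performance test
    ++num_searches_;                          // miss: count one search
    Algo best = Search();
    reg_.emplace(key, best);
    return best;
  }
  std::size_t num_searches() const { return num_searches_; }

 private:
  // Placeholder for the real (expensive) benchmark of candidate algorithms.
  Algo Search() const { return Algo{0, 0, 0}; }

  std::map<Key, Algo> reg_;
  std::size_t num_searches_ = 0;
};

// Simulates `repeats` inference passes over the given batch sizes and
// returns how many searches were actually triggered.
std::size_t CountSearches(const std::vector<int>& batch_sizes, int repeats) {
  assert(repeats >= 0);
  AlgoCache cache;
  const Shape weight{64, 3, 3, 3};
  for (int r = 0; r < repeats; ++r) {
    for (int n : batch_sizes) {
      cache.FindOrSearch(Shape{n, 3, 224, 224}, weight, Shape{n, 64, 222, 222});
    }
  }
  return cache.num_searches();
}
```

In this sketch, `CountSearches({1, 8, 32}, 1000)` triggers three searches and `CountSearches({8}, 1000)` triggers one, however many passes are made. The real key also carries the convolution parameters, data types, and GPU architecture, so a network with several distinct convolution layers would run one search per layer and shape combination.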