Performance regression in 1.4

Hi all,

I have two containers:

  1. One running Python 2 and MXNet 1.1
  2. An updated container running Python 3 and MXNet 1.4.1

I have observed significant performance regressions in the py3-MXNet 1.4.1 container, which is built with MKL-DNN enabled.

I am using code at this repo as a ‘minimal reproducible example’:

I used the profiler in each version to capture the second training batch in both containers, roughly like this:

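        import time
        import mxnet as mx

        # MXNet >= 1.2 profiler API (1.1 used profiler_set_config/profiler_set_state)
        mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                               filename='profile.json')
        for i, batch in enumerate(train_iter):
            start_time = time.time()
            if i == 1:  # profile the second batch only, skipping warm-up
                mx.profiler.set_state('run')
            module.forward(batch, is_train=True)
            module.backward()  # backward ops such as backward_Convolution
            module.update()
            if i == 1:
                mx.nd.waitall()  # ops run asynchronously; finish before stopping
                mx.profiler.set_state('stop')
                print(mx.profiler.dumps())  # aggregate per-op table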
This is the profiler output for the py2-1.1 container, sorted by total op time:

The same output for the py3-1.4.1 container (uploaded as a reply due to the new-user restriction):

Some ops, such as backward_Convolution, are significantly slower. My machine's CPU is a 6-core Intel i7.

Does anyone know whether this is operator-specific, or know a method to determine whether it is? Is this issue related to MKL-DNN somehow?
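One way to check whether the slowdown is operator-specific is to line up the two aggregate tables and rank operators by their slowdown ratio. A minimal sketch, assuming the per-op total times (in ms) have been copied out of the two profiler tables into dicts; `rank_slowdowns` is a hypothetical helper, not part of MXNet.

```python
def rank_slowdowns(old_ms, new_ms, min_ms=1.0):
    """Rank operators by how much slower they are in `new_ms` than in `old_ms`.

    Both arguments map operator name -> total time in ms for the same batch.
    Operators under `min_ms` in both runs are skipped as noise. Names present
    in only one run are returned separately, since operators may have been
    renamed or replaced between versions.
    """
    rows = []
    for name in set(old_ms) & set(new_ms):
        old, new = old_ms[name], new_ms[name]
        if max(old, new) < min_ms:
            continue
        ratio = new / float(old) if old > 0 else float("inf")
        rows.append((name, old, new, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)  # largest slowdown first
    only_in_one = sorted(set(old_ms) ^ set(new_ms))
    return rows, only_in_one
```

If a few operators such as backward_Convolution sit at the top with large ratios while most others stay near 1.0, the regression is probably operator-specific; a roughly uniform slowdown across operators would point more toward something global, such as thread settings.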

Other context:

  1. Due to how our code is currently structured in my org, it’s quite difficult to upgrade to 1.5+.
  2. When I run the same example with the same containers on a machine with an Intel Xeon CPU (c5 instance on AWS), the opposite occurs: the py3-1.4.1 container is much faster per batch than the py2-1.1 container.
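Since the same containers behave in opposite ways on the i7 and on the c5 Xeon, one hypothesis worth ruling out (not confirmed here) is that MKL-DNN selects different JIT kernels depending on the instruction-set extensions the CPU reports. The sketch below lists the relevant flags from `/proc/cpuinfo` so the two machines can be compared; `cpu_isa_flags` and `ISA_FLAGS` are hypothetical names. Separately, assuming the MKL-DNN build bundled with MXNet 1.4.1 honors the `MKLDNN_VERBOSE=1` environment variable, it should log which implementation each primitive uses.

```python
import os

# Instruction-set flags relevant to MKL-DNN kernel selection. The Xeon CPUs
# behind AWS c5 instances report avx512f; many desktop i7 parts do not.
ISA_FLAGS = ("sse4_2", "avx", "avx2", "avx512f", "avx512_vnni")


def cpu_isa_flags(cpuinfo_text, wanted=ISA_FLAGS):
    """Return the entries of `wanted` found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return [flag for flag in wanted if flag in present]
    return []  # no x86 'flags' line found


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    # Containers share the host kernel, so this shows the host CPU's flags.
    with open("/proc/cpuinfo") as fh:
        print(cpu_isa_flags(fh.read()))
```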

Py3 MXNet-1.4.1 run profiler results