Is anyone using dilated deconvolution?

I’m wondering if anyone is using dilated deconvolution and what’s the meaning and definition of it.

mxnet.symbol.Deconvolution(data=None, weight=None, bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, adj=_Null, target_shape=_Null, num_filter=_Null, num_group=_Null, workspace=_Null, no_bias=_Null, cudnn_tune=_Null, cudnn_off=_Null, layout=_Null, name=None, attr=None, out=None, **kwargs)

Hi @TaoLv,

Can’t say I’ve ever used a dilated deconvolution, but the idea is the same as with a dilated convolution.

Starting with an example of a dilated convolution with a kernel size of 3x3, same padding, a dilation factor of 2, and no stride (i.e. a stride of 1x1), we see that the dilation adds gaps to where the kernel is applied on the input matrix. Cells of the same color are multiplied together, and the totals are summed to give the value shown in black.
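To make that concrete, here is a minimal numpy sketch of a dilated convolution (my own illustration, not MXNet code; it uses ‘valid’ rather than ‘same’ padding to stay short). The kernel taps are spaced `dilation` apart, so a 3x3 kernel with dilation 2 covers a 5x5 region of the input:

```python
import numpy as np

def dilated_conv2d(x, k, dilation=2):
    """Dilated 2D convolution (cross-correlation), 'valid' padding.
    Effective kernel size: (k.shape - 1) * dilation + 1."""
    kh, kw = k.shape
    eff_h = (kh - 1) * dilation + 1
    eff_w = (kw - 1) * dilation + 1
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input with gaps of size `dilation` between kernel taps
            patch = x[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * k)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
y = dilated_conv2d(x, k, dilation=2)
print(y.shape)  # (3, 3): the effective 5x5 kernel slides over a 7x7 input
```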

One way of thinking about transpose convolutions is that they just change the order in which the weights of the kernel are applied. Instead of the top-left pixel being multiplied by the top-left weight in the kernel, it is multiplied by the bottom-right weight. See the dark blue cell for an example of this.
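This can be checked numerically with a small numpy sketch (my own illustration, not MXNet code). A stride-1 transposed convolution, computed by scatter-adding the kernel into the output, gives exactly the same result as a standard fully-padded convolution with the kernel rotated 180°:

```python
import numpy as np

def transposed_conv2d(x, k):
    """Stride-1 transposed convolution: each input value scatters a copy
    of the kernel into the output, scaled by that value."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(H):
        for j in range(W):
            out[i:i + kh, j:j + kw] += x[i, j] * k
    return out

def full_conv2d(x, k):
    """Standard 'full'-padded cross-correlation with the kernel rotated 180°."""
    kr = k[::-1, ::-1]
    kh, kw = k.shape
    xp = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    H, W = x.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kr)
    return out

x = np.array([[1., 2.], [3., 4.]])
k = np.array([[1., 2.], [3., 4.]])
print(np.allclose(transposed_conv2d(x, k), full_conv2d(x, k)))  # True
```

So without stride, a transposed convolution is just a convolution with the weights applied in the opposite order.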


And now when we add dilation to a transpose convolution, we do just the same as with the standard convolution: we add gaps to where the ‘transposed’ kernel is applied to the input matrix. I say ‘transposed’ kernel, but this isn’t technically a transpose operation being applied to the kernel weights!


All of the above would be a little pointless with transpose convolutions, though, because we could have used a standard convolution for the same effect: instead of applying the opposite kernel weights, we could have just learnt the same weights in the normal kernel positions! So transpose convolutions are useful when we use stride. With stride we get the upsampling effect, which enlarges the output matrix. So below I show an example of a transpose convolution with stride.
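A quick numpy sketch of the strided case (my own illustration, not MXNet code): with stride, each input value scatters the kernel into the output at positions `stride` apart, so a 2x2 input with a 3x3 kernel and stride 2 already produces a 5x5 output:

```python
import numpy as np

def strided_transposed_conv2d(x, k, stride=2):
    """Strided transposed convolution via scatter-add: input positions map
    to output positions `stride` apart, which upsamples the input."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((stride * (H - 1) + kh, stride * (W - 1) + kw))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * k
    return out

x = np.ones((2, 2))
k = np.ones((3, 3))
print(strided_transposed_conv2d(x, k, stride=2).shape)  # (5, 5): upsampled from 2x2
```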


And to answer your question, we can now see the effect of applying dilation to a transpose convolution with stride.
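Putting stride and dilation together, the expected spatial output size follows the standard formula. The parameter names below echo those of `mxnet.symbol.Deconvolution` (`stride`, `dilate`, `pad`, `adj`), but this is just the textbook formula, not a guarantee of any particular framework’s exact behaviour:

```python
def deconv_out_size(in_size, kernel, stride=1, dilate=1, pad=0, adj=0):
    """Spatial output size of a transposed convolution:
    stride*(in-1) + dilate*(kernel-1) + 1 - 2*pad + adj."""
    return stride * (in_size - 1) + dilate * (kernel - 1) + 1 - 2 * pad + adj

print(deconv_out_size(4, 3, stride=2, dilate=2))  # 11
```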

I hope that helps. Cheers,



Awesome!! Really appreciate your great answer. One minor question: is the transpose convolution here the same thing as the deconvolution in MXNet? It seems in PyTorch they are different. Below is the doc of ConvTranspose2d in PyTorch:

This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

Yes, it looks like they are the same. I can see “transposed convolution” mentioned in the docstring of mxnet.symbol.Deconvolution, which is different from the mathematical deconvolution found here. Some confusing terminology out there!

Computes 1D or 2D transposed convolution (aka fractionally strided convolution) of the input tensor.

And mxnet.gluon.nn.Conv2DTranspose does a “transposed convolution” too.
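The PyTorch note above (“the gradient of Conv2d with respect to its input”) can be made concrete with numpy (my own sketch, independent of either framework). Write the forward convolution as a matrix multiply y = C x; the transposed convolution is then literally multiplication by Cᵀ, which is also the backward pass of the convolution with respect to its input, hence both names:

```python
import numpy as np

def conv_matrix(k, H, W):
    """Dense matrix C such that C @ x.ravel() equals the 'valid'
    cross-correlation of an H x W input x with kernel k."""
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    C = np.zeros((oh * ow, H * W))
    for i in range(oh):
        for j in range(ow):
            for a in range(kh):
                for b in range(kw):
                    C[i * ow + j, (i + a) * W + (j + b)] = k[a, b]
    return C

k = np.array([[1., 2.], [3., 4.]])
C = conv_matrix(k, 3, 3)                # forward conv: 3x3 input -> 2x2 output
y = np.array([[1., 2.], [3., 4.]])      # pretend this is a 2x2 feature map
up = (C.T @ y.ravel()).reshape(3, 3)    # transposed conv: 2x2 -> 3x3
print(up)
```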

I see. Thanks for your explanation! @thomelane

What do you think about this?
When I set dilation to 2 and stride to 2, the MXNet result is different from the Caffe result.

@wxcstc I think this is a bug! Good catch. I spotted this too when running through the maths and comparing with the output of ConvTranspose2d. I raised the issue on GitHub here.

What’s your use case for using stride and dilation? Or were you just testing different combinations to learn?

I’ve just been learning about GANs recently and don’t clearly understand the behavior of deconvolution. I was just testing different combinations to understand the mathematical process. :thinking:

For upsampling, you can get an equivalent operation with a combination of resize + convolution operators. This usually does not suffer from the checkerboard artefact. It is a little more expensive than a transposed convolution, but in my (limited) experiments on semantic segmentation problems, it behaves better.
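For reference, here is a numpy sketch of that resize + convolution combination (my own illustration; in a framework you would pair an upsampling/resize operator with a normal convolution layer). It does a nearest-neighbour resize by the scale factor, then a ‘same’-padded convolution:

```python
import numpy as np

def upsample_resize_conv(x, k, scale=2):
    """Nearest-neighbour resize followed by a 'same'-padded convolution,
    a common alternative to transposed convolution that tends to avoid
    checkerboard artefacts."""
    # nearest-neighbour resize: repeat each pixel `scale` times in each axis
    big = np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)
    kh, kw = k.shape
    pad = ((kh // 2, (kh - 1) // 2), (kw // 2, (kw - 1) // 2))
    padded = np.pad(big, pad)
    H, W = big.shape
    out = np.zeros_like(big)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

x = np.array([[1., 2.], [3., 4.]])
k = np.full((3, 3), 1 / 9)  # simple averaging kernel for illustration
print(upsample_resize_conv(x, k).shape)  # (4, 4): same size as the resized input
```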