Tutorial Fine-tuning SOTA video models on your own dataset - Error

I tried to follow the tutorial using my own dataset (NTU RGB+D) with the pretrained vgg16_ucf101 model. I loaded the videos directly, but I am getting this error:

mxnet.base.MXNetError: [13:55:37] C:\Jenkins\workspace\mxnet-tag\mxnet\src\operator\nn\convolution.cc:152: Check failed: dshp.ndim() == 4U (5 vs. 4) : Input data should be 4D in batch-num_filter-y-x
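
The dimension mismatch in this message can be illustrated in isolation. This is a minimal sketch, not the tutorial's code. It assumes the video loader yields 5D clips laid out as (batch, channels, frames, height, width) and that the network is a 2D convolutional model expecting 4D input (batch, channels, height, width). The helper name `fold_frames_into_batch` and the shapes are hypothetical, and NumPy stands in for MXNet arrays.

```python
import numpy as np


def fold_frames_into_batch(clips):
    """Turn 5D clips (N, C, T, H, W) into 4D frames (N*T, C, H, W).

    The layout (N, C, T, H, W) is an assumption about the loader output.
    """
    if clips.ndim != 5:
        raise ValueError("expected 5D input, got %dD" % clips.ndim)
    n, c, t, h, w = clips.shape
    # Move the frame axis next to the batch axis, then merge the two.
    return clips.transpose(0, 2, 1, 3, 4).reshape(n * t, c, h, w)


# Hypothetical batch: 2 clips, 3 channels, 4 frames, 224x224 pixels.
clips = np.zeros((2, 3, 4, 224, 224), dtype=np.float32)
frames = fold_frames_into_batch(clips)
print(clips.ndim, frames.ndim, frames.shape)
```

If the layout assumption holds, a 2D convolution would accept `frames` but reject `clips`, which matches the "5 vs. 4" in the check above. Per-frame outputs would then still need to be grouped back per clip.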

Please advise.