Data format for learning on videos

olivcruche · February 21, 2019, 7:13pm

Hi, I see at https://github.com/jay1204/st-resnet/blob/master/model/video_iter.py an MXNet video classification model (st-resnet) where the read_train_frames function reads from folders of pictures (frames_name = [f for f in sorted(os.listdir(video_path)) if f.endswith('.jpg')]). I have the following question:

Is it correct practice for video classification tasks to pre-process the data by converting videos to downsampled jpg frames? Or is the more common approach to have the dataset handle reading, downsampling and extracting arrays directly from .mp4 or some other video format?

ThomasDelteil · February 21, 2019, 8:48pm

Anything that works for you @olivcruche
If you expand every frame of a video into a picture, that is going to end up using a lot of disk space. However, if you want to perform video classification by stacking the frames and applying, for example, 3D convolutions, then you’ll have very little overhead at training time.
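For the pre-extracted route, here is a minimal sketch of what reading a clip could look like. It reuses the sorted .jpg listing from the question; the helper names list_frames and sample_evenly and the even-spacing rule are my own assumptions for illustration, not what st-resnet actually does.

```python
import os


def list_frames(video_path):
    """Return the sorted .jpg frame paths found in one video's folder."""
    return [os.path.join(video_path, f)
            for f in sorted(os.listdir(video_path))
            if f.endswith('.jpg')]


def sample_evenly(frames, clip_len):
    """Pick clip_len frames spread evenly over the whole video.

    Assumed sampling rule: frame i of the clip is taken at position
    int(i * len(frames) / clip_len) of the full frame list.
    """
    if clip_len <= 0 or len(frames) < clip_len:
        raise ValueError('need at least clip_len frames')
    step = len(frames) / clip_len
    return [frames[int(i * step)] for i in range(clip_len)]
```

The selected paths would then be loaded as images and stacked along a time axis before being fed to the 3D convolutions. Since the frames are already decoded on disk, the only work left at training time is listing and loading.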

If you choose to create your own Dataset that handles file reading, decoding and sampling, you are likely to end up using more CPU cycles for that, but disk space will be preserved. So if you can afford it CPU-wise, I would suggest decoding on the fly, so that you can play around with your decoding parameters without having to re-run the entire offline extraction.
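A rough sketch of the on-the-fly option, under stated assumptions: the class name LazyVideoDataset, its clip_len and stride parameters, and the decode_fn callable are all hypothetical. decode_fn stands in for whatever video decoder you use (it takes a path and returns a list of frames). The class only implements __len__ and __getitem__; to use it with Gluon you would presumably subclass mxnet.gluon.data.Dataset, which I have left out so the sketch has no dependencies.

```python
class LazyVideoDataset:
    """Decode and sample each video only when it is requested."""

    def __init__(self, items, decode_fn, clip_len=16, stride=2):
        # items: list of (video_path, label) pairs
        # decode_fn: hypothetical callable, video_path -> list of frames
        self.items = list(items)
        self.decode_fn = decode_fn
        self.clip_len = clip_len
        self.stride = stride

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        path, label = self.items[idx]
        # The decoding cost is paid here, on every access (CPU time),
        # instead of once offline (disk space).
        frames = self.decode_fn(path)
        # Temporal downsampling: keep every stride-th frame, then
        # truncate to the first clip_len of those.
        clip = frames[::self.stride][:self.clip_len]
        if len(clip) < self.clip_len:
            raise ValueError('video too short for clip_len and stride')
        return clip, label
```

Changing stride or clip_len only means constructing the dataset with different arguments; nothing on disk has to be regenerated, which is the flexibility mentioned above.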
