# TSM

TSM: Temporal Shift Module for Efficient Video Understanding

## Abstract

The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D-CNN-based methods can achieve good performance but are computationally intensive, making them expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of a 3D CNN while maintaining a 2D CNN's complexity. TSM shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extend TSM to the online setting, which enables real-time, low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranked first on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves low latencies of 13 ms and 35 ms, respectively, for online video recognition.
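The core operation is simple: a fraction of the feature channels is shifted one frame backward in time, another fraction one frame forward, so the 2D convolution that follows mixes information across neighboring frames at no extra cost. The sketch below is a minimal offline PyTorch version of this bidirectional shift, not MMAction2's exact implementation; `shift_div=8` mirrors the paper's default of shifting 1/8 of the channels in each direction.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the temporal dimension.

    x has shape (N, T, C, H, W); 1/shift_div of the channels move one frame
    backward in time, another 1/shift_div move one frame forward, and the
    rest stay in place. Vacated positions are zero-padded.
    """
    n, t, c, h, w = x.size()
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift left: frame t sees t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift right: frame t sees t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unshifted
    return out

frames = torch.randn(2, 8, 64, 56, 56)  # (batch, 8 frames, 64 channels, 56x56)
print(temporal_shift(frames).shape)     # torch.Size([2, 8, 64, 56, 56])
```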

## Results and Models

### Kinetics-400

| frame sampling strategy | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | testing protocol | FLOPs | params | config | ckpt | log |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 1x1x8 | 224x224 | 8 | ResNet50 | ImageNet | 73.18 | 90.56 | 8 clips x 10 crop | 32.88G | 23.87M | config | ckpt | log |
| 1x1x8 | 224x224 | 8 | ResNet50 | ImageNet | 73.22 | 90.22 | 8 clips x 10 crop | 32.88G | 23.87M | config | ckpt | log |
| 1x1x16 | 224x224 | 8 | ResNet50 | ImageNet | 75.12 | 91.55 | 16 clips x 10 crop | 65.75G | 23.87M | config | ckpt | log |
| 1x1x8 (dense) | 224x224 | 8 | ResNet50 | ImageNet | 73.38 | 90.78 | 8 clips x 10 crop | 32.88G | 23.87M | config | ckpt | log |
| 1x1x8 | 224x224 | 8 | ResNet50 (NonLocalDotProduct) | ImageNet | 74.49 | 91.15 | 8 clips x 10 crop | 61.30G | 31.68M | config | ckpt | log |
| 1x1x8 | 224x224 | 8 | ResNet50 (NonLocalGauss) | ImageNet | 73.66 | 90.99 | 8 clips x 10 crop | 59.06G | 28.00M | config | ckpt | log |
| 1x1x8 | 224x224 | 8 | ResNet50 (NonLocalEmbedGauss) | ImageNet | 74.34 | 91.23 | 8 clips x 10 crop | 61.30G | 31.68M | config | ckpt | log |
| 1x1x8 | 224x224 | 8 | MobileNetV2 | ImageNet | 68.71 | 88.32 | 8 clips x 3 crop | 3.269G | 2.736M | config | ckpt | log |
| 1x1x16 | 224x224 | 8 | MobileOne-S4 | ImageNet | 74.38 | 91.71 | 16 clips x 10 crop | 48.65G | 13.72M | config | ckpt | log |

### Something-something V2

| frame sampling strategy | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | testing protocol | FLOPs | params | config | ckpt | log |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 1x1x8 | 224x224 | 8 | ResNet50 | ImageNet | 62.72 | 87.70 | 8 clips x 3 crop | 32.88G | 23.87M | config | ckpt | log |
| 1x1x16 | 224x224 | 8 | ResNet50 | ImageNet | 64.16 | 88.61 | 16 clips x 3 crop | 65.75G | 23.87M | config | ckpt | log |
| 1x1x8 | 224x224 | 8 | ResNet101 | ImageNet | 63.70 | 88.28 | 8 clips x 3 crop | 62.66G | 42.86M | config | ckpt | log |
1. The **gpus** column indicates the number of GPUs we used to obtain the checkpoint. If you want to use a different number of GPUs or videos per GPU, the best way is to set `--auto-scale-lr` when calling `tools/train.py`; this parameter auto-scales the learning rate according to the ratio between the actual batch size and the original batch size (see the sketch after this list).
2. The validation set of Kinetics400 we used consists of 19796 videos. These videos are available at Kinetics400-Validation. The corresponding data list (each line is of the format `video_id, num_frames, label_index`) and the label map are also available.
3. The MobileOne backbone supports reparameterization during inference. You can use the provided reparameterize tool to convert the checkpoint and switch to the deploy config file.
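For intuition, `--auto-scale-lr` applies the linear scaling rule. A minimal sketch; the base values below are illustrative, and the real ones come from the `auto_scale_lr` field of the config, not from this snippet:

```python
def scale_lr(base_lr: float, base_batch_size: int, actual_batch_size: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the
    total batch size (number of GPUs x videos per GPU)."""
    return base_lr * actual_batch_size / base_batch_size

# The 8xb16 config trains with a total batch of 8 GPUs x 16 videos = 128.
# Running on 4 GPUs with 16 videos each halves the effective batch and lr:
print(scale_lr(0.02, base_batch_size=128, actual_batch_size=64))  # 0.01
```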

For more details on data preparation, you can refer to Kinetics400.

## Train

You can use the following command to train a model.

```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

Example: train the TSM model on the Kinetics-400 dataset in a deterministic manner.

```shell
python tools/train.py configs/recognition/tsm/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb.py \
    --seed=0 --deterministic
```
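The same config can also be launched from Python through MMEngine's `Runner`. A sketch, assuming you want checkpoints and logs under the work directory chosen below (the directory name is an arbitrary choice, not a project default; the config path is the one used above):

```python
from mmengine.config import Config
from mmengine.runner import Runner

# Load the training config shipped with MMAction2 and set a work dir.
cfg = Config.fromfile(
    'configs/recognition/tsm/'
    'tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb.py')
cfg.work_dir = './work_dirs/tsm_k400'  # hypothetical output directory

# Build the runner from the config and start training.
Runner.from_cfg(cfg).train()
```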

For more details, you can refer to the Training part in the Training and Test Tutorial.

## Test

You can use the following command to test a model.

```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```

Example: test the TSM model on the Kinetics-400 dataset and dump the result to a pkl file.

```shell
python tools/test.py configs/recognition/tsm/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb.py \
    checkpoints/SOME_CHECKPOINT.pth --dump result.pkl
```
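The dumped file can be inspected with the standard `pickle` module. The exact record structure depends on your MMAction2 version, so the field layout assumed below should be verified before relying on it:

```python
import pickle

# Load the test results dumped by tools/test.py --dump result.pkl.
with open('result.pkl', 'rb') as f:
    results = pickle.load(f)

print(f'{len(results)} test samples')
first = results[0]
print(type(first))
# Recent MMAction2 versions dump one dict per video; inspect the keys
# (e.g. prediction scores and ground-truth labels) before using them.
if isinstance(first, dict):
    print(sorted(first.keys()))
```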

For more details, you can refer to the Test part in the Training and Test Tutorial.

## Citation

```BibTeX
@inproceedings{lin2019tsm,
  title={TSM: Temporal Shift Module for Efficient Video Understanding},
  author={Lin, Ji and Gan, Chuang and Han, Song},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2019}
}

@inproceedings{Nonlocal2018,
  title={Non-local Neural Networks},
  author={Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2018}
}
```