UMT Project
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Abstract
Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain. Although VideoMAE has trained a robust ViT from limited data, its low-level reconstruction poses convergence difficulties and conflicts with high-level cross-modal alignment. This paper proposes a training-efficient method for temporal-sensitive VFMs that integrates the benefits of existing methods. To increase data efficiency, we mask out most of the low-semantics video tokens, but selectively align the unmasked tokens with the IFM, which serves as the UnMasked Teacher (UMT). By providing semantic guidance, our method enables faster convergence and multimodal friendliness. With a progressive pre-training framework, our model can handle various tasks, including scene-related, temporal-related, and complex video-language understanding. Using only public sources for pre-training in 6 days on 32 A100 GPUs, our scratch-built ViT-L/16 achieves state-of-the-art performance on various video tasks.
Usage
Setup Environment
Please refer to Installation to install MMAction2.
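For reference, a minimal installation sketch assuming a from-source setup of MMAction2 via OpenMIM; the Installation guide is authoritative and the exact package versions below are assumptions:

```shell
# Install OpenMIM and the core OpenMMLab dependencies (check the Installation guide for pinned versions).
pip install -U openmim
mim install mmengine
mim install mmcv

# Install MMAction2 from source so that $MMACTION2/projects/umt exists locally.
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip install -v -e .
```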
Assume that you are located at `$MMACTION2/projects/umt`. Add the current folder to `PYTHONPATH`, so that Python can find your code. Run the following command in the current directory to add it.
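One common way to do this, assuming a POSIX shell, is the export below; adjust the syntax for your shell if needed:

```shell
# Prepend the current project folder to PYTHONPATH so its modules are importable.
export PYTHONPATH=`pwd`:$PYTHONPATH
```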
Data Preparation
Prepare the Kinetics dataset according to the instruction.
Create a symbolic link from `$MMACTION2/data` to `./data` in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.
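A minimal sketch of the symlink command, assuming the layout described above (the data folder lives at `$MMACTION2/data`, two levels above this project folder):

```shell
# Link $MMACTION2/data (two directories up) to ./data in the current project folder.
ln -s ../../data ./data
```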
Testing commands
To test with single GPU:
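With MMAction2 installed through OpenMIM, single-GPU testing typically looks like the sketch below; `$CONFIG` stands for a config file from this project's configs folder and `$CHECKPOINT` for the downloaded weights (both are placeholders, not fixed file names):

```shell
# Run evaluation on a single GPU.
mim test mmaction $CONFIG --checkpoint $CHECKPOINT
```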
To test with multiple GPUs:
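For multi-GPU testing on a single machine, the PyTorch launcher can be used; the GPU count of 8 here is only an example:

```shell
# Run evaluation across 8 GPUs with the PyTorch launcher.
mim test mmaction $CONFIG --checkpoint $CHECKPOINT --launcher pytorch --gpus 8
```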
To test with multiple GPUs by slurm:
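On a Slurm cluster, the same test can be dispatched through the slurm launcher; `$PARTITION` is a placeholder for your cluster partition name:

```shell
# Run evaluation across 8 GPUs on one node via Slurm.
mim test mmaction $CONFIG --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```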
Results
Kinetics400
Kinetics700
Citation