
[Feature] Support UMT #2657

Merged (12 commits into open-mmlab:dev-1.x, Sep 6, 2023)
Conversation

Dai-Wenxun
Collaborator

UMT Project

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Abstract

Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain. Although VideoMAE has trained a robust ViT from limited data, its low-level reconstruction poses convergence difficulties and conflicts with high-level cross-modal alignment. This paper proposes a training-efficient method for temporal-sensitive VFMs that integrates the benefits of existing methods. To increase data efficiency, we mask out most of the low-semantics video tokens, but selectively align the unmasked tokens with IFM, which serves as the UnMasked Teacher (UMT). By providing semantic guidance, our method enables faster convergence and multimodal friendliness. With a progressive pre-training framework, our model can handle various tasks including scene-related, temporal-related, and complex video-language understanding. Using only public sources for pre-training in 6 days on 32 A100 GPUs, our scratch-built ViT-L/16 achieves state-of-the-art performances on various video tasks.

Usage

Setup Environment

Please refer to Installation to install MMAction2.

Assume that you are located at $MMACTION2/projects/umt.

Add the current folder to PYTHONPATH so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics dataset according to the instructions.

Create a symbolic link from $MMACTION2/data to ./data so that Python can locate your data. Run the following command in the current directory to create it.

ln -s ../../data ./data
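
As an optional sanity check, you can verify that the symlinked data/ folder is visible from the current directory. The paths in the sketch below assume the standard MMAction2 Kinetics-400 layout and may differ depending on how you prepared the dataset.

```python
# Optional sanity check: confirm the symlinked data/ folder contains the
# Kinetics-400 annotation list and validation videos the configs expect.
# These paths assume the standard MMAction2 Kinetics-400 layout.
import os

expected = [
    'data/kinetics400/kinetics400_val_list_videos.txt',
    'data/kinetics400/videos_val',
]
for path in expected:
    status = 'found' if os.path.exists(path) else 'missing'
    print(f'{path}: {status}')
```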

Testing commands

To test with single GPU:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs using Slurm:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
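
If you prefer to run inference from Python rather than through `mim`, a minimal sketch using MMAction2's high-level API is shown below. The checkpoint and video paths are placeholders, and the exact layout of the returned result may vary slightly between MMAction2 versions.

```python
# Minimal inference sketch using MMAction2's Python API (an alternative to
# the `mim test` commands above). Paths are placeholders; replace them.
from mmaction.apis import inference_recognizer, init_recognizer

config = 'configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py'
checkpoint = 'PATH/TO/CHECKPOINT.pth'   # the downloaded UMT checkpoint
video = 'PATH/TO/VIDEO.mp4'             # any local video file

model = init_recognizer(config, checkpoint, device='cuda:0')
result = inference_recognizer(model, video)

# `result` is an ActionDataSample; in recent MMAction2 versions the per-class
# scores are stored in `result.pred_score`.
print('predicted label index:', result.pred_score.argmax().item())
```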

Results

Kinetics400

| frame sampling strategy | resolution | backbone | pretrain | top1 acc | testing protocol | config | ckpt |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| uniform 8 | 224x224 | UMT-B | Kinetics710 | 87.33 | 4 clips x 3 crop | config | ckpt |
| uniform 8 | 224x224 | UMT-L | Kinetics710 | 90.21 | 4 clips x 3 crop | config | ckpt |

Kinetics700

| frame sampling strategy | resolution | backbone | pretrain | top1 acc | testing protocol | config | ckpt |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| uniform 8 | 224x224 | UMT-B | Kinetics710 | 77.95 | 4 clips x 3 crop | config | ckpt |
| uniform 8 | 224x224 | UMT-L | Kinetics710 | 82.79 | 4 clips x 3 crop | config | ckpt |
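
The "uniform 8, 224x224, 4 clips x 3 crop" testing protocol in the tables above corresponds roughly to a test pipeline of the following form. This is only an illustrative sketch; the authoritative pipeline is defined in the linked config files.

```python
# Illustrative test pipeline matching the protocol in the tables above
# (uniform sampling of 8 frames, 4 clips, 3 crops at 224x224).
test_pipeline = [
    dict(type='DecordInit'),
    dict(type='UniformSample', clip_len=8, num_clips=4, test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs'),
]
```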

Citation

@article{li2023unmasked,
  title={Unmasked teacher: Towards training-efficient video foundation models},
  author={Li, Kunchang and Wang, Yali and Li, Yizhuo and Wang, Yi and He, Yinan and Wang, Limin and Qiao, Yu},
  journal={arXiv preprint arXiv:2303.16058},
  year={2023}
}

@CLAassistant commented Aug 29, 2023

CLA assistant check
All committers have signed the CLA.

@Dai-Wenxun changed the base branch from main to dev-1.x on August 29, 2023 07:16
@codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage is 30.23% of modified lines.

❗ Current head 7317ea1 differs from the pull request's most recent head d5d6e00. Consider uploading reports for commit d5d6e00 to get more accurate results.

| Files Changed | Coverage |
| :-- | :--: |
| mmaction/apis/__init__.py | ø |
| mmaction/apis/inference.py | 14.70% |
| mmaction/models/losses/hvu_loss.py | 50.00% |
| mmaction/datasets/transforms/formatting.py | 100.00% |
| mmaction/datasets/transforms/loading.py | 100.00% |
| mmaction/models/heads/tpn_head.py | 100.00% |


@cir7 merged commit ed1270c into open-mmlab:dev-1.x on Sep 6, 2023
8 of 12 checks passed