
Attention as Activation

MXNet/Gluon code for "Attention as Activation" (https://arxiv.org/abs/2007.07729)

What's in this repo so far:

  • Code for the CIFAR-10 and CIFAR-100 experiments at varying network depths
  • Code, trained model, and training log for ImageNet experiments

Requirements

Install MXNet and Gluon-CV:

pip install --upgrade mxnet-cu100 gluoncv
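
Note: mxnet-cu100 targets CUDA 10.0; if your machine runs a different CUDA version, install the matching wheel (e.g. mxnet-cu101), or plain mxnet for CPU-only use. A quick sanity check that MXNet can see your GPUs:

python -c "import mxnet as mx; print(mx.__version__, mx.context.num_gpus())"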

Experiments

ImageNet

Training script:

python train_imagenet.py --mode hybrid --lr 0.075 --lr-mode cosine --num-epochs 160 --batch-size 128 --num-gpus 2 -j 48 --warmup-epochs 5 --dtype float16 --use-rec --last-gamma --no-wd --label-smoothing --save-dir params_resnet50_v1b_ChaATAC_2 --logging-file resnet50_v1b_ChaATAC_2.log --r 2 --act-layers 2

The trained model parameters and the training log are in ./params.
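
To run inference with the released parameters, load them into the network first. A minimal sketch (the get_atac_resnet50 builder and the parameter filename are placeholders, not names from this repo; substitute the actual model constructor and the file under ./params):

import mxnet as mx

ctx = mx.gpu(0)
net = get_atac_resnet50(r=2, act_layers=2)   # hypothetical builder; use the repo's model code
net.load_parameters('./params/<params-file>', ctx=ctx)
x = mx.nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)
print(net(x).softmax().topk(k=5))            # indices of the five most probable classes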

Architecture             GFLOPs   Params   top-1 err. (%)   top-5 err. (%)
ResNet-50 [1]            3.86     25.6M    23.30            6.55
SE-ResNet-50 [2]         3.87     28.1M    22.12            5.99
AA-ResNet-50 [3]         8.3      25.8M    22.30            6.20
FA-ResNet-50 [4]         7.2      18.0M    22.40            /
GE-θ⁺-ResNet-50 [5]      3.87     33.7M    21.88            5.80
ATAC-ResNet-50 (ours)    4.4      28.0M    21.41            6.02

CIFAR-10 and CIFAR-100

Training script:

python train_cifar.py --gpus 0 --num-epochs 400 --mode hybrid -j 32 --batch-size 128 --wd 0.0001 --lr 0.2 --lr-decay 0.1 --lr-decay-epoch 300,350 --dataset cifar100 --model atac --act-type ChaATAC --useReLU --r 2 --blocks 3
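
The --act-type ChaATAC flag replaces ReLU with a channel-wise attentional activation. As a sketch of the idea (one plausible form, not necessarily the exact module in this repo), such a unit gates its input with a sigmoid attention map produced by a point-wise convolution bottleneck:

from mxnet.gluon import nn

class ChaATACSketch(nn.HybridBlock):
    """Attention-as-activation sketch: two point-wise convolutions produce a
    per-channel, per-position gate in (0, 1) that multiplies the input,
    replacing ReLU; r is the channel reduction ratio (the --r flag above)."""
    def __init__(self, channels, r=2, useReLU=True, **kwargs):
        super(ChaATACSketch, self).__init__(**kwargs)
        with self.name_scope():
            self.body = nn.HybridSequential()
            self.body.add(nn.Conv2D(channels // r, kernel_size=1))
            self.body.add(nn.BatchNorm())
            if useReLU:
                self.body.add(nn.Activation('relu'))
            self.body.add(nn.Conv2D(channels, kernel_size=1))
            self.body.add(nn.BatchNorm())

    def hybrid_forward(self, F, x):
        return x * F.sigmoid(self.body(x))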

References

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: Deep Residual Learning for Image Recognition. CVPR 2016: 770-778

[2] Jie Hu, Li Shen, Gang Sun: Squeeze-and-Excitation Networks. CVPR 2018: 7132-7141

[3] Irwan Bello, Barret Zoph, Quoc Le, Ashish Vaswani, Jonathon Shlens: Attention Augmented Convolutional Networks. ICCV 2019: 3285-3294

[4] Niki Parmar, Prajit Ramachandran, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens: Stand-Alone Self-Attention in Vision Models. NeurIPS 2019: 68-80

[5] Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi: Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks. NeurIPS 2018: 9423-9433
