I studied how to compress the BERT model via structured pruning. I proposed a neural slimming technique to assess the importance of each neuron, and designed a cost function and pruning strategy to remove neurons that contribute little or nothing to the prediction. After fine-tuning on a downstream task, the model learns a more compact structure; we call the resulting model SlimBERT. My thesis is available here.
To estimate the contribution of each neuron, we introduce an importance factor α, a learnable parameter. A slim layer is a group of independent importance factors that are optimized individually. Whenever we want to prune a given layer, we attach a slim layer to it.
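To make this concrete, here is a minimal NumPy sketch of the idea (illustrative only; the class and variable names are my own, and the actual method trains these factors inside a fine-tuned BERT). Each neuron's activation is scaled by its own factor α, and an L1 term on α is the sparsity part of the cost function that pushes unimportant factors toward zero:

```python
import numpy as np

class SlimLayer:
    """Sketch of a slim layer: one learnable importance factor per
    neuron, applied as an element-wise scale on the layer's output."""

    def __init__(self, num_neurons):
        # alpha starts at 1 so the scaled network initially
        # behaves exactly like the original network
        self.alpha = np.ones(num_neurons)

    def forward(self, hidden):
        # hidden: (batch, num_neurons); scale each neuron's activation
        # by its importance factor
        return hidden * self.alpha

    def l1_penalty(self):
        # sparsity term added to the task loss during fine-tuning,
        # driving factors of unimportant neurons toward zero
        return np.abs(self.alpha).sum()

slim = SlimLayer(4)
h = np.array([[1.0, -2.0, 0.5, 3.0]])
out = slim.forward(h)   # identical to h while alpha is all ones
```

During training, the gradient of the combined loss (task loss plus the L1 penalty) updates α alongside the model weights, so each factor ends up reflecting how much its neuron matters for the task.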
Because the slim layer is flexible, we can easily apply it to any part of the model we want to prune. For BERT, we apply it to all layers: the embedding layer, the multi-head self-attention layers, and the fully connected layers.
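After fine-tuning, neurons whose factors have been driven to (near) zero can be removed outright. The following hypothetical helper sketches this for a single linear layer, assuming the slim layer scales the layer's output: the surviving factors are folded into the weights and bias, so the pruned layer needs no extra multiply at inference time:

```python
import numpy as np

def prune_linear(W, b, alpha, threshold=1e-3):
    """Remove output neurons whose importance factor is ~zero.

    W: (out, in) weight matrix, b: (out,) bias, alpha: (out,) factors.
    Since alpha * (W @ x + b) == (alpha * W) @ x + alpha * b, the
    kept factors can be absorbed into the pruned weights and bias.
    """
    keep = np.abs(alpha) > threshold          # boolean mask of survivors
    W_pruned = W[keep] * alpha[keep, None]    # drop rows, fold in alpha
    b_pruned = b[keep] * alpha[keep]
    return W_pruned, b_pruned, keep

# toy example: 4 output neurons, 3 inputs; two factors pruned to zero
W = np.arange(12.0).reshape(4, 3)
b = np.zeros(4)
alpha = np.array([1.0, 0.0, 0.5, 0.0])
W_p, b_p, keep = prune_linear(W, b, alpha)   # 2 of 4 neurons survive
```

Pruning whole neurons (rows of the weight matrix) rather than individual weights is what makes the result a genuinely smaller dense model, which is why it reduces run-time memory and speeds up inference on standard hardware.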
We evaluated our method on 7 GLUE tasks: using only 10% of the original parameters, it recovered 94% of the original accuracy, while also reducing run-time memory and increasing inference speed. At the same compression ratio, the proposed approach outperformed knowledge distillation methods and other structured pruning methods across different metrics. Moreover, our method improves the interpretability of BERT: by analyzing the neurons with significant contributions, we can observe that BERT utilizes different components and subnetworks for different tasks.