UNILMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Hangbo Bao et al.

2020 [arXiv]

What's Unique

It introduces a unified language model pre-trained on both autoencoding (BERT-like) and partially autoregressive (XLNet-like, over spans) language modeling tasks, using a novel training procedure referred to as pseudo-masked language modeling.

How It Works

  • Conventional masks learn inter-relations between corrupted tokens and context via autoencoding.
  • Pseudo masks learn intra-relations between masked spans via partially autoregressive modeling.

Source: Author

  • The autoencoding objective remains conventional.
  • The partially autoregressive objective lets a pseudo-masked span attend to the tokens predicted in earlier factorization steps and to the corresponding masked tokens.

The following table gives an overview of how the autoencoding, autoregressive, and partially autoregressive objectives factorize the prediction of masked tokens; an illustrative factorization is sketched after the table.

Source: Author
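As an illustration (the masked positions and factorization order here are chosen for the example, not prescribed), consider a sequence x_1 … x_6 with masked positions M = {2, 4, 5}, split into the spans {4, 5} and {2}. The three objectives then factorize the prediction of x_M roughly as:

\begin{aligned}
\text{AE:}\quad & p(x_2 \mid x_{\backslash M})\, p(x_4 \mid x_{\backslash M})\, p(x_5 \mid x_{\backslash M}) \\
\text{AR (order } \langle\{4\},\{5\},\{2\}\rangle\text{):}\quad & p(x_4 \mid x_{\backslash\{2,4,5\}})\, p(x_5 \mid x_{\backslash\{2,5\}})\, p(x_2 \mid x_{\backslash\{2\}}) \\
\text{PAR (order } \langle\{4,5\},\{2\}\rangle\text{):}\quad & p(x_4 \mid x_{\backslash\{2,4,5\}})\, p(x_5 \mid x_{\backslash\{2,4,5\}})\, p(x_2 \mid x_{\backslash\{2\}})
\end{aligned}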

The following figure shows pseudo-masked language modeling for unified pre-training. The input sequence is appended with pseudo-mask tokens, as well as the original tokens, for the masked positions. With an appropriate attention mask, the model is trained on both objectives in the same forward pass; a minimal construction sketch follows the figure.

Source: Author
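A minimal sketch (not the authors' implementation) of how such an input could be assembled. The helper name, the 0-indexed span format, and the "[MASK]"/"[P]" token strings are illustrative assumptions; the appended tokens reuse the position ids of the original masked positions, as described in the paper.

```python
# Minimal sketch: build the pseudo-masked input for one example.
# tokens is a list of strings; masked_spans is a list of 0-indexed position
# lists given in the chosen factorization order, e.g. [[3, 4], [1]].

def build_pseudo_masked_input(tokens, masked_spans):
    masked_positions = {p for span in masked_spans for p in span}

    # 1) Replace each masked position in the main sequence with a [MASK] token
    #    (the conventional mask used by the autoencoding objective).
    input_tokens = ["[MASK]" if i in masked_positions else t
                    for i, t in enumerate(tokens)]
    position_ids = list(range(len(tokens)))

    # 2) For every masked span, append pseudo-mask [P] tokens (used to predict
    #    the span in the partially autoregressive objective) and the original
    #    tokens (so later factorization steps can attend to the ground truth).
    #    Both copies reuse the position ids of the original positions.
    for span in masked_spans:
        for p in span:                      # pseudo masks for this span
            input_tokens.append("[P]")
            position_ids.append(p)
        for p in span:                      # original tokens of this span
            input_tokens.append(tokens[p])
            position_ids.append(p)

    return input_tokens, position_ids


tokens = ["x1", "x2", "x3", "x4", "x5", "x6"]
# Spans {x4, x5} then {x2}, matching the illustration above (0-indexed here).
print(build_pseudo_masked_input(tokens, [[3, 4], [1]]))
```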

Model

  • Autoencoding loss

\mathcal{L}_{\mathrm{AE}}=-\sum_{x \in \mathcal{D}} \log \prod_{m \in M} p\left(x_{m} \mid x_{\backslash M}\right)

  • Partially autoregressive modeling

    • In each factorization step, the model can predict a single token or a span of multiple tokens.

    • Let M = ⟨M_1, M_2, …, M_{|M|}⟩ be the factorization order, where M_i is the set of token positions (a single token or a span) masked and predicted in factorization step i.

    • \begin{aligned}
p\left(x_{M} \mid x_{\backslash M}\right) &=\prod_{i=1}^{|M|} p\left(x_{M_{i}} \mid x_{\backslash M_{\geq i}}\right) \\
&=\prod_{i=1}^{|M|} \prod_{m \in M_{i}} p\left(x_{m} \mid x_{\backslash M_{\geq i}}\right)
\end{aligned}

    • \mathcal{L}_{\mathrm{PAR}}=-\sum_{x \in \mathcal{D}} \mathbb{E}_{M} \log p\left(x_{M} \mid x_{\backslash M}\right)
  • The following figure illustrates the implementation details at the attention-mask level.

Source: Author
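A simplified sketch (not the released implementation) of a boolean attention mask realising both objectives in one pass, assuming the input layout from the construction sketch above: context and [MASK] positions attend only to the main sequence (autoencoding), while the appended blocks of factorization step i additionally attend to the original-token blocks of earlier steps and to themselves (partially autoregressive). Fine-grained details, such as exactly how tokens within one span attend to each other, are simplified here.

```python
import numpy as np

def build_attention_mask(seq_len, masked_spans):
    """allow[i, j] is True if query position i may attend to key position j."""
    # Index ranges of the appended ([P] block, original-token block) per step,
    # laid out after the main sequence in factorization order.
    blocks = []
    cursor = seq_len
    for span in masked_spans:
        p_block = (cursor, cursor + len(span)); cursor += len(span)
        o_block = (cursor, cursor + len(span)); cursor += len(span)
        blocks.append((p_block, o_block))
    total = cursor

    allow = np.zeros((total, total), dtype=bool)

    # Autoencoding part: context and [MASK] tokens attend to the main sequence
    # only, never to the appended pseudo-mask or original-token blocks.
    allow[:seq_len, :seq_len] = True

    # Partially autoregressive part: the blocks of step i attend to the main
    # sequence, to the original-token blocks of earlier steps (< i), and to
    # themselves, but not to the original tokens of their own span.
    for i, (p_block, o_block) in enumerate(blocks):
        for (start, end) in (p_block, o_block):
            allow[start:end, :seq_len] = True
            for _, (o_start, o_end) in blocks[:i]:
                allow[start:end, o_start:o_end] = True
            allow[start:end, start:end] = True
    return allow


# Example matching the construction sketch: 6 tokens, spans {x4, x5} then {x2}.
print(build_attention_mask(6, [[3, 4], [1]]).astype(int))
```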