2020 [arXiv]
What's Unique It proposes a unified language model for both autoencoding (BERT-like) and partially autoregressive (XLNet-like span) language modeling tasks, trained with a novel procedure referred to as pseudo-masked language modeling (PMLM).
How It Works
- Conventional masks ([M]) learn inter-relations between corrupted tokens and the context via autoencoding.
- Pseudo masks ([P]) learn intra-relations between masked spans via partially autoregressive modeling.
- The autoencoding objective remains conventional (BERT-style masked language modeling).
- The partially autoregressive objective lets a pseudo-masked span attend to previously predicted tokens and the corresponding [M] mask tokens.
The following table gives an overview of the autoencoding (AE), autoregressive (AR), and partially autoregressive (PAR) objectives.
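As a hedged worked example of the difference (my illustration; the specific indices are assumed): take $x = x_1 \ldots x_5$ with masked positions $M = \{2, 4, 5\}$, spans $M_1 = \{2\}$ and $M_2 = \{4, 5\}$, and factorization order $M_1 \to M_2$.

| Objective | Factorization of the masked-token likelihood |
| --- | --- |
| Autoencoding (AE) | $p(x_2 \mid x_{\setminus\{2,4,5\}}) \, p(x_4 \mid x_{\setminus\{2,4,5\}}) \, p(x_5 \mid x_{\setminus\{2,4,5\}})$ |
| Autoregressive (AR) | $p(x_2 \mid x_{\setminus\{2,4,5\}}) \, p(x_4 \mid x_{\setminus\{4,5\}}) \, p(x_5 \mid x_{\setminus\{5\}})$ |
| Partially autoregressive (PAR) | $p(x_2 \mid x_{\setminus\{2,4,5\}}) \, p(x_4, x_5 \mid x_{\setminus\{4,5\}})$ |

AE predicts every masked token independently; AR predicts masked tokens one at a time; PAR predicts them one span at a time.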
The following figure shows pseudo-masked language modeling for unified model pre-training. The input sequence is appended with pseudo-masked ([P]) tokens as well as the original tokens for the masked positions; with an appropriate attention mask, the model is trained on both objectives at the same time.
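A minimal sketch of that input construction (the helper and the [M]/[P] placeholder names are illustrative assumptions, not the authors' code):

```python
def build_pmlm_input(tokens, masked_spans):
    """Assemble one PMLM training sequence.
    tokens: list[str]; masked_spans: list of (start, end) half-open index pairs."""
    masked_positions = [i for s, e in masked_spans for i in range(s, e)]
    # Context copy: masked slots are replaced with [M] so the AE objective
    # predicts them from the uncorrupted remainder.
    context = [t if i not in masked_positions else "[M]"
               for i, t in enumerate(tokens)]
    # Appended tokens: the original tokens (visible to later spans once
    # "predicted") plus one [P] pseudo mask per masked position.
    originals = [tokens[i] for i in masked_positions]
    pseudo = ["[P]"] * len(masked_positions)
    # Appended tokens reuse the position ids of the slots they stand for.
    position_ids = list(range(len(tokens))) + masked_positions + masked_positions
    return context + originals + pseudo, position_ids

seq, pos = build_pmlm_input(["x1", "x2", "x3", "x4", "x5"], [(1, 2), (3, 5)])
# seq == ['x1', '[M]', 'x3', '[M]', '[M]', 'x2', 'x4', 'x5', '[P]', '[P]', '[P]']
```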
Model
- Autoencoding loss: every masked token is predicted independently given the corrupted context,

  $\mathcal{L}_{\text{AE}} = -\sum_{x \in \mathcal{D}} \log \prod_{m \in M} p(x_m \mid x_{\setminus M})$
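A minimal PyTorch-style sketch of this term (tensor names and shapes are my assumptions, not the paper's code), which reduces to cross-entropy at the [M] positions:

```python
import torch
import torch.nn.functional as F

def ae_loss(logits: torch.Tensor, labels: torch.Tensor,
            mask_positions: torch.Tensor) -> torch.Tensor:
    """Autoencoding term: each [M] token is predicted independently from the
    corrupted context, so the loss is cross-entropy at those positions.
    logits: (seq_len, vocab_size); labels: (seq_len,);
    mask_positions: 1-D LongTensor of [M] indices."""
    return F.cross_entropy(logits[mask_positions], labels[mask_positions])
```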
- Partially autoregressive modeling: masked spans are predicted one factorization step at a time, each span conditioned on the context and all previously predicted spans. With $M = \langle M_1, \ldots, M_{|M|} \rangle$ a factorization order over the masked spans and $M_{\geq i} = \bigcup_{j \geq i} M_j$,

  $p(x_M \mid x_{\setminus M}) = \prod_{i=1}^{|M|} p(x_{M_i} \mid x_{\setminus M_{\geq i}})$

  $\mathcal{L}_{\text{PAR}} = -\sum_{x \in \mathcal{D}} \mathbb{E}_M \log p(x_M \mid x_{\setminus M})$
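Under the same assumptions, the PAR term can be sketched as cross-entropy at the [P] positions, span by span; the conditioning on earlier spans is enforced by the attention mask (see the figure below), not by the loss itself:

```python
import torch
import torch.nn.functional as F

def par_loss(logits: torch.Tensor, labels: torch.Tensor,
             span_pseudo_positions: list[torch.Tensor]) -> torch.Tensor:
    """Partially autoregressive term: span_pseudo_positions holds the
    [P]-token indices of each masked span, in the sampled factorization
    order. Summing per-span cross-entropies realizes the sum over spans of
    log p(span i | context and previously predicted spans)."""
    return torch.stack([F.cross_entropy(logits[pos], labels[pos])
                        for pos in span_pseudo_positions]).sum()
```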
The following figure illustrates the implementation details at the attention-mask level.
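As a rough sketch of how such a mask could be built (my reading of the mechanism; the index layout is assumed, not the paper's exact implementation): context and [M] tokens attend only to the context; the appended originals and [P] tokens of a span additionally attend to the originals of spans earlier in the factorization order; each [P] token also attends to itself.

```python
import numpy as np

def pmlm_attention_mask(n_ctx: int, spans) -> np.ndarray:
    """Boolean allow-matrix for one PMLM training sequence.
    n_ctx: length of the context copy (original tokens with [M] at masked slots).
    spans: per span, a pair (orig_idx, pseudo_idx) of appended-token indices
    into the full sequence, listed in factorization order."""
    n = n_ctx + sum(len(o) + len(p) for o, p in spans)
    allow = np.zeros((n, n), dtype=bool)
    allow[:, :n_ctx] = True                  # every token sees the context/[M] copy
    visible = []                             # originals of already-predicted spans
    for orig_idx, pseudo_idx in spans:
        for i in orig_idx + pseudo_idx:
            allow[i, visible] = True         # see previously predicted spans
        for i in pseudo_idx:
            allow[i, i] = True               # each [P] token sees itself
        for i in orig_idx:
            allow[i, orig_idx] = True        # span originals see one another
        visible = visible + orig_idx
    return allow

# With the example sequence above (context of length 5; span 1 = x2 at
# appended indices orig=[5], pseudo=[8]; span 2 = x4,x5 at orig=[6,7],
# pseudo=[9,10]), the [P] tokens of span 2 see the context plus x2 but
# never x4/x5 themselves.
mask = pmlm_attention_mask(5, [([5], [8]), ([6, 7], [9, 10])])
```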